In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters ‘a‘, ‘x‘, ‘u‘ and ‘z‘ are 4, 2, 1 and 1, respectively. We may either encode the symbols as {‘a‘=0, ‘x‘=10, ‘u‘=110, ‘z‘=111}, or in another way as {‘a‘=1, ‘x‘=01, ‘u‘=001, ‘z‘=000}, both compress the string into 14 bits. Another set of code can be given as {‘a‘=0, ‘x‘=11, ‘u‘=100, ‘z‘=101}, but {‘a‘=0, ‘x‘=01, ‘u‘=011, ‘z‘=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.
Each input file contains one test case. For each case, the first line gives an integer N (2), then followed by a line that contains all the N distinct characters and their frequencies in the following format:
c[1] f[1] c[2] f[2] ... c[N] f[N]
where c[i]
is a character chosen from {‘0‘ - ‘9‘, ‘a‘ - ‘z‘, ‘A‘ - ‘Z‘, ‘_‘}, and f[i]
is the frequency of c[i]
and is an integer no more than 1000. The next line gives a positive integer M (≤), then followed by M student submissions. Each student submission consists of N lines, each in the format:
c[i] code[i]
where c[i]
is the i
-th character and code[i]
is an non-empty string of no more than 63 ‘0‘s and ‘1‘s.
For each test case, print in each line either "Yes" if the student‘s submission is correct, or "No" if not.
Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.
7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11
Yes Yes No No
虽然题目说是哈夫曼编码,实际只要权值和是最小,编码中不存在某个编码是另一个编码的前缀,且编码长度符合条件(比n小)就可以。
首先用优先队列根据频率计算出最小的权值和,接着对于每个学上提交的编码,用字典树判断是否存在前缀重合的问题,同时根据编码长度和频率计算权值和然后比较,而且判断每个编码长度要合格。
代码:
#include <cstdio> #include <cstring> #include <queue> using namespace std; int n,rc,c,d,num[300],k,flag; char s[2],t[100]; int trie[10000][2],ind,en[10000]; void build(char *str) { int r = 0,i = 0; while(str[i]) { int e = str[i ++] - ‘0‘; if(!trie[r][e]) { trie[r][e] = ++ ind; } r = trie[r][e]; if(en[r] == 1) flag = 1; if(str[i]) en[r] = 2; } if(en[r]) flag = 1; en[r] = 1; } int main() { scanf("%d",&n); priority_queue<int,vector<int>,greater<int> > q; for(int i = 0;i < n;i ++) { scanf("%s %d",s,&d); num[(int)s[0]] = d; q.push(d); } while(q.size() > 1) { d = q.top(); q.pop(); d += q.top(); q.pop(); q.push(d); rc += d; } scanf("%d",&k); for(int i = 0;i < k;i ++) { c = ind = flag = 0; memset(trie,0,sizeof(trie)); memset(en,0,sizeof(en)); for(int j = 0;j < n;j ++) { scanf("%s %s",s,t); int len = strlen(t); if(len >= n) flag = 1; if(flag) continue; build(t); c += num[int(s[0])] * len; } printf("%s\n",c == rc && !flag ? "Yes" : "No"); } }
原文:https://www.cnblogs.com/8023spz/p/12260547.html