The Institute of Computational Linguistics (ICL) at Peking University is an interdisciplinary institute spanning science and the liberal arts. It focuses primarily on fundamental research and applications in language information processing. ICL's research covers a wide range of areas, including Chinese syntax, language parsing, computational lexicography, semantic dictionaries, computational semantics, and application systems.
Professor X works at ICL. His little daughter Jane is 9 years old and has learned something about programming. She has always been very interested in her daddy's research. During this summer vacation, she took a free programming and algorithm course for kids provided by the School of EECS, Peking University. When the course was finished, she said to Professor X: "Daddy, I just learned a lot of fancy algorithms. Now I can help you! Please give me something to research!" Professor X laughed and said: "OK, let's start with a simple job. I will give you a lot of text, and you should tell me which phrase is used most frequently in the text."
Please help Jane to write a program to do the job.
There are no more than 20 test cases.
In each case, there are one or more lines of text, terminated by a line containing "####". The text consists of words, spaces, commas (',') and periods ('.'). A word consists only of lowercase letters. Two adjacent words make a "phrase". Two words are considered adjacent if there is nothing but one or more spaces between them. No word is split across two lines, and two words on different lines can't form a phrase. Two phrases that differ only in the number of spaces are considered the same.
Please note that the maximum length of a line is 500 characters, and there are at most 50 lines in a test case. It's guaranteed that there is at least one phrase in each test case.
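The adjacency rules above can be sketched as a small scanner. This is only an illustration, not the code used in the solution further down: `phrasesInLine` is a hypothetical helper name, and it assumes the line has already been stripped of its line terminator.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns every phrase in
// one line, each normalized to "first second" with exactly one space.
std::vector<std::string> phrasesInLine(const std::string& line) {
    std::vector<std::string> phrases;
    std::string prev;  // previous word; empty if punctuation broke the chain
    std::size_t i = 0;
    while (i < line.size()) {
        char c = line[i];
        if (c >= 'a' && c <= 'z') {
            // Read one whole word.
            std::size_t start = i;
            while (i < line.size() && line[i] >= 'a' && line[i] <= 'z') ++i;
            std::string word = line.substr(start, i - start);
            // Only spaces lay between prev and word, so they form a phrase.
            if (!prev.empty()) phrases.push_back(prev + " " + word);
            prev = word;
        } else {
            // Spaces keep adjacency; ',' or '.' breaks it.
            if (c != ' ') prev.clear();
            ++i;
        }
    }
    return phrases;
}
```

Calling it once per input line keeps words on different lines from being paired, since `prev` starts out empty on every call.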
For each test case, print the most frequently used phrase and the number of times it appears, separated by a ':'. If there is more than one candidate, print the one that is smallest in dictionary order. Please note that if there is more than one space between the two words of a phrase, keep just one space.
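The tie-breaking rule comes almost for free if the counts live in a `std::map`, because it iterates keys in ascending dictionary order. A minimal sketch, assuming the counts have already been collected (`formatAnswer` is a hypothetical name, not part of the original solution):

```cpp
#include <map>
#include <string>

// Hypothetical helper: picks the most frequent phrase and formats it as
// "phrase:count". std::map visits keys in ascending order, so updating only
// on a strictly larger count keeps the smallest phrase among ties.
std::string formatAnswer(const std::map<std::string, int>& counts) {
    std::string best;
    int bestCount = 0;
    for (const auto& entry : counts) {
        if (entry.second > bestCount) {
            best = entry.first;
            bestCount = entry.second;
        }
    }
    return best + ":" + std::to_string(bestCount);
}
```

With the counts from the first sample case, "at good" and "good at" both occur 3 times, and the strict comparison keeps "at good".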
Sample Input
above,all ,above all good at good at good
at good at above all me this is
####
world hello ok
####
Sample Output
at good:3
hello ok:1
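【Solution】
Right after reading the problem I got a little excited, and my first instinct was to throw an Aho-Corasick automaton at it. Then I looked at the number of submissions and it seemed... suspicious for a problem that would need one.
Then... uh, it looks like storing and maintaining the counts directly in a map is enough.
WA pitfall: if the buffers are not strictly re-initialized, make sure the end marker of every string you read is handled properly, otherwise leftover data from earlier input may affect the result, as with the test data below.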
b b c c b ####
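One way to sidestep the stale-buffer problem entirely is to read into a `std::string`, which `std::getline` overwrites on every call. The following is only a sketch under that assumption, not the approach used in the code below; `readCase` is a hypothetical helper, and stripping a trailing '\r' is an extra precaution for Windows-style input.

```cpp
#include <istream>
#include <sstream>  // only needed when feeding the helper from a string
#include <string>
#include <vector>

// Hypothetical helper: collects the text lines of one test case, stopping at
// the "####" line. Returns false when no complete test case is left.
bool readCase(std::istream& in, std::vector<std::string>& lines) {
    lines.clear();
    std::string line;  // replaced on each getline, so nothing is left over
    bool sawTerminator = false;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.compare(0, 4, "####") == 0) { sawTerminator = true; break; }
        lines.push_back(line);
    }
    return sawTerminator;
}
```

A caller would loop `while (readCase(std::cin, lines))` and process each line of the case separately.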
【Code C++】
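#include <cstdio>
#include <map>
#include <string>

std::map<std::string, int> freq;  // phrase -> number of occurrences
std::string opt;                  // best phrase found so far
int maxn;                         // occurrence count of opt

// Reads one test case; returns false once the input is exhausted.
bool solve() {
    opt.clear(); freq.clear(); maxn = 0;
    char rd[512], *i, *j;
    bool isRD = false;
    std::string temp;  // current word, plus one space once the word has ended
    // fgets replaces gets, which is no longer part of standard C++.
    while (fgets(rd, sizeof rd, stdin)) {
        isRD = true; temp.clear();
        if (rd[0] == '#') break;
        for (i = rd; *i && *i != '\n' && *i != '\r'; ++i);
        *i = '\0';                     // cut the line terminator fgets keeps
        for (i = rd; *i == ' '; ++i);  // skip leading spaces
        for (; *i; ++i) {
            if (*i == ',' || *i == '.') {
                // Punctuation breaks adjacency: drop the pending word.
                for (++i; *i == ' '; ++i);
                temp.clear(); --i;
            } else {
                temp += *i;
                if (*i != ' ') continue;
                // A word just ended: collapse the spaces, read the next word.
                for (++i; *i == ' '; ++i);
                for (j = i; 'a' <= *j && *j <= 'z'; ++j) temp += *j;
                if (i == j) { temp.clear(); --i; continue; }  // no word follows
                int n = ++freq[temp];
                if (n > maxn || (n == maxn && temp < opt)) opt = temp, maxn = n;
                // The second word becomes the first word of the next phrase.
                temp.assign(i, j);
                i = j - 1;
            }
        }
    }
    return isRD;
}

int main() {
    while (solve()) {
        // Guard against trailing blank lines after the last "####".
        if (maxn > 0) printf("%s:%d\n", opt.c_str(), maxn);
    }
    return 0;
}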
hihoCoder 1385: A Simple Job
Original post: http://www.cnblogs.com/Simon-X/p/5926727.html