题目要求:
Download the text file here. (Right click and save link as).
The goal of this problem is to implement a variant of the 2-SUM algorithm (covered in the Week 6 lecture on hash table applications).
The file contains 1 million integers, both positive and negative (there might be some repetitions!).This is your array of integers, with the ith row of the file specifying the ith entry of the array.
Your task is to compute the number of target values t in the interval [-10000,10000] (inclusive) such that there are distinct numbers x,y in the input file that satisfy x+y=t. (NOTE: ensuring distinctness requires a one-line addition to the algorithm from lecture.)
Write your numeric answer (an integer between 0 and 20001) in the space provided.
OPTIONAL CHALLENGE: If this problem is too easy for you, try implementing your own hash table for it. For example, you could compare performance under the chaining and open addressing approaches to resolving collisions.
大致意思是说,有一百万个数字(正负都有,且可能重复)。目标是要寻找到唯一存在的数对(x,y),使得x+y属于[-10000, 10000]的范围中,最后输出这些数对的个数。
文件中的数据差不多是这样子的:
... -92205919974 -65150999169 -96247986144 70191728682 -59670982771 -10202710755 -72167080349 -19221613985 80341418823 -79433686277 87351713416 6084932343 40801752610 44028247959 84203784346 -28120386474 59064431376 39436465566 -89458683408 -64200776240 -42582579303 -62656888535 55650079602 62954175548 33713639217 70259349593 -97472959477 41890148139 44216212976 26864335558 ...
解题思路:
起初采用了最暴力的方法,直接把这些数字塞进一个hash table中,然后遍历着去查询(这个方法很不好,所以就不详述了),结果运行了很久都没有跑出结果,遂直接放弃这种方法。
这里的问题在于如下两点:
因此针对问题的实际情况进行了些许优化,大致思路如下:
利用上述思路,可以避免大量的不必要的运算,另外我们还可以对其进行进一步的优化:
代码实现:
有了上述的思路,利用Python对其进行了实现,代码如下:
result = {} dic = {} input_file = open("algo1-programming_prob-2sum.txt") # 读入数据 for line in input_file: num = long(line) key = num/10000 if key not in dic: dic[key] = {} # 这里记录数据出现的次数,用来判断该数是否是唯一的 if num in dic[key]: dic[key][num] += 1 else: dic[key][num] = 1 for key1 in dic: # 忽略key1 > 0的情况,减少不必要的重复计算 if key1 > 0: continue # 根据给定的key1,可以推测出key2可取的值 for key2 in range(-key1-2, -key1 + 1): if key2 in dic: # 对key1和key2对应集合中的数进行遍历 for value2 in dic[key2]: for value1 in dic[key1]: # 判断这两个数是否是唯一的 if dic[key1][value1] != 1 or dic[key2][value2] != 1: continue sum_tmp = value1+value2 # 判断两数之和是否在题目要求的范围内 if abs(sum_tmp) <= 10000 and sum_tmp not in result: result[sum] = True # 输出结果 print len(result)
input_file.close()
这段代码在我的电脑上运行仅需要大约3-4秒的时间就能跑出结果(包含读取数据的I/O操作的时间),可见该方法有效地避免了大量不必要的计算。
另外,题目中要求找到唯一存在的数对(x, y),但是如果把代码中判断唯一性的那个条件判断去除,得到的结果也是一样的,这也许和给出的数据有关。
原文:http://www.cnblogs.com/jdneo/p/4729266.html