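I've been running experiments for several days in a row, and it feels like one problem pops up right after another. Today was no exception, so I figured I'd jot a few of them down.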
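First up: when running GPS (Graph Processing System), a pipeline that worked fine yesterday broke because the input file got bigger, with an error about the heap size~~~~. So the culprit was the input size! In other words, the so-called scalability issue. There is very little material on GPS, practically none. I searched around a bit; some people suggested increasing the heap space, blah blah, and none of it worked (http://stackoverflow.com/questions/1596009/java-lang-outofmemoryerror-java-heap-space). Later I went to the GPS discussion group, where the father of GPS told us what to do: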
See: https://groups.google.com/forum/#!topic/stanfordgpsusers/62FeHpZijU0
Hi, Semih.
I am using GPS to process the Twitter graph and have run into a critical problem. Twitter has a very skewed graph: some users have more than 100k followers, and such a vertex triggers a huge number of messages.
GPS throws "java heap overflow" exceptions. I think that is because the messages are buffered in memory before being sent out.
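To see why a single celebrity vertex is enough to blow the heap, here is a toy Python model. It is not GPS code: the function names, the assumption that every vertex sends one message per out-edge in a superstep, and the 16-byte per-message cost are all made up for illustration. It counts messages per superstep and the memory they would take if all of them were buffered before sending, as the post above suspects.

```python
from collections import Counter

# Hypothetical cost of one buffered message in bytes (an assumption, not a GPS figure)
BYTES_PER_MESSAGE = 16


def messages_per_superstep(edges):
    """Count outgoing messages per vertex, assuming each vertex sends one
    message along every out-edge (e.g. a user pushing an update to followers)."""
    return Counter(src for src, _dst in edges)


def buffered_bytes(edges, bytes_per_message=BYTES_PER_MESSAGE):
    """Memory needed if every message of a superstep sits in memory before sending."""
    return sum(messages_per_superstep(edges).values()) * bytes_per_message


def hot_vertices(edges, threshold):
    """Vertices whose fan-out alone exceeds `threshold` messages."""
    return {v: n for v, n in messages_per_superstep(edges).items() if n > threshold}


if __name__ == "__main__":
    # Skewed toy graph: one "celebrity" with 100,000 followers,
    # plus 1,000 ordinary users who each reach 10 others.
    edges = [("celebrity", f"f{i}") for i in range(100_000)]
    edges += [(f"u{j}", f"u{(j + k) % 1000}") for j in range(1000) for k in range(1, 11)]
    print(hot_vertices(edges, threshold=1_000))  # {'celebrity': 100000}
    print(buffered_bytes(edges))  # (100000 + 10000) * 16 = 1760000
```

In this toy graph the one celebrity vertex accounts for roughly 91% of all buffered messages, which is exactly the skew problem described in the post.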
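When I tried installing rdflib with easy_install rdflib, it kept throwing errors (maybe the network has been flaky these past few days and my VPN "ladder" isn't long enough, heh). Eventually I got impatient, downloaded the source, and decided to install it locally by hand! But after searching forever, I still couldn't find out how to do a manual install!!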
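Facepalm~~~ Actually, all you need to do is cd into the source directory and run python setup.py install.

Next up, this message: No handlers could be found for logger "rdflib.term". The following gets rid of it: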
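    import logging
    import rdflib

    # Give the root logger a handler so rdflib's log records actually get printed
    logging.basicConfig()

    # now load your graph (parse() is the documented loader; newer rdflib no longer has load())
    g = rdflib.Graph()
    g.parse("life_the_universe_everything.rdf")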
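Why does basicConfig() help? The message names the logger "rdflib.term", i.e. rdflib logs through a module-named logger that has no handler of its own. Under Python 2, when no handler is configured anywhere, the logging module prints "No handlers could be found for logger ..." instead of the actual record. basicConfig() attaches a StreamHandler to the root logger, and records from "rdflib.term" propagate up to it. The sketch below reproduces that with the standard library only; the function name, the StringIO stream, and the warning text are stand-ins so the output can be inspected, not rdflib's own behaviour.

```python
import io
import logging


def capture_library_warning(message):
    """Emit a warning on a library-style logger and return what the root
    handler installed by logging.basicConfig() wrote out."""
    stream = io.StringIO()
    # force=True (Python 3.8+) replaces any handlers already on the root logger;
    # the format string is basicConfig's default, spelled out for clarity
    logging.basicConfig(stream=stream, format="%(levelname)s:%(name)s:%(message)s", force=True)

    # Child logger named like rdflib's; it has no handler of its own,
    # so the record propagates up to the root logger's handler.
    logger = logging.getLogger("rdflib.term")
    logger.warning(message)
    return stream.getvalue()


if __name__ == "__main__":
    print(capture_library_warning("example does not look like a valid URI"), end="")
    # WARNING:rdflib.term:example does not look like a valid URI
```

That LEVEL:logger:message layout is basicConfig's default format, which is why the rdflib warning further down shows up as "WARNING:rdflib.term:..." once logging is configured.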
Then, while running rdflib, I ran into this:
WARNING:rdflib.term:http://www.w3.org/1999/02/22-rdf-syntax-ns# first does not look like a valid URI, trying to serialize this will break.
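and after that the program simply wouldn't run. So annoying! Besides, how can my code be allowed to have a Warning! So I set out to fix it. After a bit of searching, I found someone saying you just replace the spaces in the URL with anything else (note the space in "ns# first" above). That instantly put a smile on my face, haha, and it really works!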
Of course, I can get away with this only because I don't care what the actual URL is; I'm just treating it as a string of characters!
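A minimal sketch of that workaround, assuming (as above) the URI is only an opaque label. replace_spaces and percent_encode_spaces are hypothetical helper names; the second one is an alternative I'm adding for cases where the URI has to stay meaningful, using the standard library's urllib.parse.quote to turn a space into %20 while leaving URI delimiters alone.

```python
from urllib.parse import quote

# RFC 3986 reserved characters plus '%', so delimiters and existing escapes survive
URI_SAFE_CHARS = ":/?#[]@!$&'()*+,;=%"


def replace_spaces(uri, replacement="_"):
    """Swap every space for `replacement`. Fine when the URI is just an identifier."""
    return uri.replace(" ", replacement)


def percent_encode_spaces(uri):
    """Keep the URI meaningful: percent-encode unsafe characters (space -> %20)."""
    return quote(uri, safe=URI_SAFE_CHARS)


if __name__ == "__main__":
    raw = "http://www.w3.org/1999/02/22-rdf-syntax-ns# first"
    print(replace_spaces(raw))         # http://www.w3.org/1999/02/22-rdf-syntax-ns#_first
    print(percent_encode_spaces(raw))  # http://www.w3.org/1999/02/22-rdf-syntax-ns#%20first
```

You would then build the node with rdflib.URIRef(replace_spaces(raw)); that step is left out of the block so it runs without rdflib installed.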
Original post: http://www.cnblogs.com/xubenben/p/3766343.html