首页 > 编程语言 > 详细

java程序启动时cpu和负载高探索

时间:2015-02-12 20:11:10      阅读:390      评论:0      收藏:0      [点我收藏+]

这两天协助运维定位1个监控程序CPU占用率达到150%的问题,过程曲折,结论简单,很有意思:)

首先我们来看一下cpu高时候截图:

技术分享

可以看到红色框中的监控程序CPU占用率都很高,但其实这些监控程序的实现很简单:发送1个http请求,收到响应后简单判断一下响应码,然后打印监控结果,打印完成就退出了。每次监控都会重新由daemon程序拉起运行。

这么简单的业务占用这么高的cpu,怎么感觉都不太可能,于是拿到监控程序的源码开始定位。


第一个想到的是VisualVm、JConsole等工具,但由于程序很快就运行完成了,这两个工具都还没有连接上就结束了,而且拿到的数据也没法看出具体是什么原因导致cpu高,尝试了一下就放弃了。


第二个尝试是用strace去跟踪程序的调用,结果摘录如下(省略很多)

17572 14:08:43.485878 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.486132 clock_gettime(CLOCK_REALTIME, {1423721323, 486165000}) = 0
17572 14:08:43.486204 futex(0x580497a4, FUTEX_WAIT_PRIVATE, 1, {0, 9949000}) = -1 ETIMEDOUT (Connection timed out)
17572 14:08:43.496964 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.497142 clock_gettime(CLOCK_REALTIME, {1423721323, 497169000}) = 0
17572 14:08:43.497236 futex(0x580497a4, FUTEX_WAIT_PRIVATE, 1, {0, 9935000}) = -1 ETIMEDOUT (Connection timed out)
17572 14:08:43.507896 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.508164 clock_gettime(CLOCK_REALTIME, {1423721323, 508201000}) = 0
17572 14:08:43.508240 futex(0x580497a4, FUTEX_WAIT_PRIVATE, 1, {0, 9945000}) = -1 ETIMEDOUT (Connection timed out)
17572 14:08:43.518960 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.519140 clock_gettime(CLOCK_REALTIME, {1423721323, 519168000}) = 0
17572 14:08:43.519236 futex(0x580497a4, FUTEX_WAIT_PRIVATE, 1, {0, 9934000}) = -1 ETIMEDOUT (Connection timed out)
17572 14:08:43.529882 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.530144 clock_gettime(CLOCK_REALTIME, {1423721323, 530177000}) = 0
17572 14:08:43.530216 futex(0x580497a4, FUTEX_WAIT_PRIVATE, 1, {0, 9946000}) = -1 ETIMEDOUT (Connection timed out)
17572 14:08:43.540951 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.541129 clock_gettime(CLOCK_REALTIME, {1423721323, 541157000}) = 0
17572 14:08:43.541222 futex(0x580497a4, FUTEX_WAIT_PRIVATE, 1, {0, 9935000}) = -1 ETIMEDOUT (Connection timed out)
17572 14:08:43.551872 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.552127 clock_gettime(CLOCK_REALTIME, {1423721323, 552159000}) = 0
17572 14:08:43.552199 futex(0x580497a4, FUTEX_WAIT_PRIVATE, 1, {0, 9949000}) = -1 ETIMEDOUT (Connection timed out)
17572 14:08:43.562857 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.563029 clock_gettime(CLOCK_REALTIME, {1423721323, 563056000}) = 0
17572 14:08:43.563119 futex(0x580497a4, FUTEX_WAIT_PRIVATE, 1, {0, 9936000}) = -1 ETIMEDOUT (Connection timed out)
17572 14:08:43.573913 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.574159 clock_gettime(CLOCK_REALTIME, {1423721323, 574214000}) = 0
17572 14:08:43.574253 futex(0x580497a4, FUTEX_WAIT_PRIVATE, 1, {0, 9925000}) = -1 ETIMEDOUT (Connection timed out)
17572 14:08:43.584885 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.585055 clock_gettime(CLOCK_REALTIME, {1423721323, 585081000}) = 0
17572 14:08:43.585147 futex(0x580497a4, FUTEX_WAIT_PRIVATE, 1, {0, 9936000}) = -1 ETIMEDOUT (Connection timed out)
17572 14:08:43.595900 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.596170 clock_gettime(CLOCK_REALTIME, {1423721323, 596206000}) = 0
17572 14:08:43.596245 futex(0x580497a4, FUTEX_WAIT_PRIVATE, 1, {0, 9947000}) = -1 ETIMEDOUT (Connection timed out)
17572 14:08:43.606960 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.607139 clock_gettime(CLOCK_REALTIME, {1423721323, 607167000}) = 0
17572 14:08:43.607232 futex(0x580497a4, FUTEX_WAIT_PRIVATE, 1, {0, 9935000}) = -1 ETIMEDOUT (Connection timed out)
17572 14:08:43.617875 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.618119 clock_gettime(CLOCK_REALTIME, {1423721323, 618209000}) = 0
17572 14:08:43.618249 futex(0x580497a4, FUTEX_WAIT_PRIVATE, 1, {0, 9890000}) = -1 ETIMEDOUT (Connection timed out)
17572 14:08:43.628960 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.629140 clock_gettime(CLOCK_REALTIME, {1423721323, 629168000}) = 0
17572 14:08:43.629231 futex(0x580497a4, FUTEX_WAIT_PRIVATE, 1, {0, 9935000}) = -1 ETIMEDOUT (Connection timed out)
17572 14:08:43.639865 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.640134 clock_gettime(CLOCK_REALTIME, {1423721323, 640167000}) = 0
17572 14:08:43.640206 futex(0x580497a4, FUTEX_WAIT_PRIVATE, 1, {0, 9946000}) = -1 ETIMEDOUT (Connection timed out)
17572 14:08:43.650868 futex(0x58049028, FUTEX_WAKE_PRIVATE, 1) = 0
17572 14:08:43.651139 unlink("/tmp/hsperfdata_gamedata/17559") = 0
17572 14:08:43.651324 exit_group(0)     = ?   


怎么那么多futex,google一查“futex connection time out”,嘿,还真有很多结果,最典型的就是leap second,中文翻译为“闰秒”,但仔细一看,上一次闰秒发生是2012年06月份,现在都2015年了,而且天天cpu都高,应该不是这个问题


工具指望不上,只好看代码,代码看了后怀疑几个地方:

1)CountDownLatch:因为这个最有可能用到futex了,但实际想了想和验证了一下,这个东东不可能导致cpu这么高,真有这个问题,这东东完全就没法用了

2)多线程:看了代码,没有几个线程,而且业务就是一发一收,不可能多线程导致的


问题陷入僵局,怎么办呢? 只好用终极大招了:分段注释!

1)注释响应处理的代码 —— 不行,cpu占用100%

2)注释发送请求的代码 —— 还是不行,,cpu占用100%

      这不坑爹嘛,请求都不发,响应也没有了,你咋还这么慢?

3)干脆注释所有代码, 只在main里面打印Hello world —— 还是不行,,cpu占用100%

      这下就蛋疼了,没有任何业务,你cpu还高,这不坑我嘛

4)自己写个Hello world —— 咦,这次可以了,cpu占用2%

     咋回事,同样都是打印Hello world,为啥cpu差别那么大?


左思右想,突然灵光一闪:难道是和jvm加载类文件有关 ?

jvm在启动的时候会装载并连接所有除反射以外的类,而class文件是二进制的文件,需要从磁盘加载到内存然后解析,这种解析是很耗费cpu的,那么class文件越多,cpu耗费就越高,这正好解释了为什么同样输出Hello world,不同程序cpu占用率差别很大。


这个推论正好也解释了之前遇到的另外一个线上项目的现象:每次启动后有大约1分钟左右系统的cpu和负载很高,过了1分钟后就恢复正常了。


【建议】

需要重复运行的监控类程序,如果用java写,最好不要做成每次都要重新启动,而是在程序里面循环或者定时运行;

如果一定要每次都要重新启动,频率又很频繁的话,建议用shell、python之类的来写,否则一台机器运行太多java类的工具程序,会导致cpu和负载飙升


转载请注明出处:http://blog.csdn.net/yunhua_lee/article/details/43765371



java程序启动时cpu和负载高探索

原文:http://blog.csdn.net/yunhua_lee/article/details/43765371

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!