?
首先看到dubbo-claim应用突然大面积报错,基本反馈是冻结预算出问题了,看了看冻结预算的代码,发现写的非常复杂,果断放弃看代码来排查问题。
Could not open jdbc connection for transaction: Pool empty. Unable to fetch a connection in 10 seconds, none available[size:20,busy:20; idle:0, lastwait:10000],
报错中能获得的信息是数据库的连接池满了。因为是突然报错,所以排查的主要两个方向:
其他可能的小概率事件:
接下来就需要验证猜测,第一反应想看看jvm里的线程都干嘛呢,到底是谁在拖后腿,用 jstack 来打印出线程的活动状态,我是用jvisualVM来直接 dump 线程。
接下来就是看日志中的线程都干嘛呢,首先看状态为 BLOCK,和 WAITING 状态的线程,看他们是因为什么导致不执行。
"DubboServerHandler-10.3.0.87:20888-thread-152" - Thread t@229
java.lang.Thread.State: BLOCKED
at com.nfsq.xs.claim.biz.budget.BudgetService.freezeBudget(BudgetService.java:85)
- waiting to lock <7153f861> (a com.nfsq.xs.claim.biz.budget.BudgetService) owned by "DubboServerHandler-10.3.0.87:20888-thread-176" t@253
at com.nfsq.xs.claim.biz.budget.BudgetService$$FastClassBySpringCGLIB$$8dee689c.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
接下来发现日志里一个非常重要信息线程152在等线程176的一个锁(7153f861),且其他很多线程也是在等待这把锁 。
接下来看线程176这家伙拿着这把锁干嘛呢
"DubboServerHandler-10.3.0.87:20888-thread-176" - Thread t@253
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
- locked <776bc7e0> (a com.mysql.jdbc.util.ReadAheadInputStream)
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3014)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
- locked <54dd815d> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
- locked <54dd815d> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1379)
- locked <54dd815d> (a com.mysql.jdbc.JDBC4Connection)
at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:56)
at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:70)
at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:57)
at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:267)
at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:141)
at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:105)
at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:81)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:101)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:95)
at sun.reflect.GeneratedMethodAccessor215.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:355)
at com.sun.proxy.$Proxy21.selectList(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.selectList(SqlSessionTemplate.java:195)
at org.apache.ibatis.binding.MapperMethod.executeForMany(MapperMethod.java:124)
at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:90)
at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:40)
at com.sun.proxy.$Proxy32.listActivityWalletDTO(Unknown Source)
at com.nfsq.xs.claim.service.budget.ActivityWalletOperationService.listActivityWalletDTO(ActivityWalletOperationService.java:363)
at com.nfsq.xs.claim.biz.budget.BudgetService.signContract(BudgetService.java:610)
at com.nfsq.xs.claim.biz.budget.BudgetService.freezeBudget(BudgetService.java:95)
- locked <7153f861> (a com.nfsq.xs.claim.biz.budget.BudgetService)
看到了线程176拿了锁后最后一步执行的操作
ActivityWalletOperationService.listActivityWalletDTO
去看了代码发现对应的 SQL:
select * from ACTIVITY_WALLET
where
source_id = ‘240‘
and source = ‘1‘;
这么简单的 SQL ,去数据看了数据量28万条,一看索引竟然只有 id 索引,source_id 查询走的全表扫描。找少炳来加了索引后,应用报错全部都没了,持续观察了两天问题解决。
最后在这里感谢@张少炳,@赵统国 @凡小梦 @马瑾 提供的帮助。
原文:http://370798490.iteye.com/blog/2288955