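Because of a requirement on our production system, the fetchurl client in Jetty, which is used to fetch pages, had to stop using HttpClient. That left only two options: 1. use a raw socket and build the HTTP protocol by hand; 2. use Java's built-in HttpURLConnection to build the request.
Sockets are too much trouble, so let's go with option 2.
The fetchurl service is an external-link fetching service provided by the SAE Java runtime. Under Jetty there is a client that wraps its API. The client's main job is to apply some special processing to the user's fetchurl request and send it to the fetchurl server; the server carries out the fetch and returns the result to the client, which completes the whole fetch.
The fetchurl client supports GET and POST (the two most common methods), among others.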
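As a rough illustration of what the client-side "special processing" might look like, the sketch below opens an HttpURLConnection to the fetchurl server and tags it with the target URL. The header names (FetchUrl, AccessKey, TimeStamp, Signature) are the ones visible in the packet capture later in this post. The class name, the method name and the idea of passing in a precomputed signature are assumptions made for illustration, since how the signature is calculated is not covered here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not the real SAE client.
public class FetchUrlHeaders {

    /**
     * Opens (but does not yet connect) a connection to the fetchurl server
     * and attaches the headers that describe the page to be fetched.
     */
    public static HttpURLConnection open(String serviceUrl, String targetUrl,
                                         String accessKey, String signature) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(serviceUrl).openConnection();
            conn.setRequestMethod("GET");
            // Header names as seen in the capture; the values are supplied by the caller.
            conn.setRequestProperty("FetchUrl", targetUrl);
            conn.setRequestProperty("AccessKey", accessKey);
            // Seconds since the epoch, matching the magnitude seen in the capture.
            conn.setRequestProperty("TimeStamp", String.valueOf(System.currentTimeMillis() / 1000));
            // Assumption: the signature is computed elsewhere and passed in.
            conn.setRequestProperty("Signature", signature);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Nothing is sent until the caller connects or reads the response, so the headers can still be inspected or extended after this method returns.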
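GET requests are straightforward: just add the agreed special headers and redirect the request to the server. The hard part is POST, mainly the handling of the POST data, which comes in two kinds.
1. Plain key-value data
Here is how HttpClient handles it, as shown in its debug log: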
[main] DEBUG org.apache.http.impl.conn.DefaultClientConnection - Sending request: POST / HTTP/1.1
[main] DEBUG org.apache.http.wire - >> "POST / HTTP/1.1[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Length: 7[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Type: application/x-www-form-urlencoded; charset=UTF-8[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Host: www.baidu.com[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Connection: Keep-Alive[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "User-Agent: Apache-HttpClient/4.1.2 (java 1.5)[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.headers - >> POST / HTTP/1.1
[main] DEBUG org.apache.http.headers - >> Content-Length: 7
[main] DEBUG org.apache.http.headers - >> Content-Type: application/x-www-form-urlencoded; charset=UTF-8
[main] DEBUG org.apache.http.headers - >> Host: www.baidu.com
[main] DEBUG org.apache.http.headers - >> Connection: Keep-Alive
[main] DEBUG org.apache.http.headers - >> User-Agent: Apache-HttpClient/4.1.2 (java 1.5)
[main] DEBUG org.apache.http.wire - >> "c=d&a=b" <--------------------------- this is the data actually sent
[main] DEBUG org.apache.http.wire - << "HTTP/1.1 302 Moved Temporarily[\r][\n]"
[main] DEBUG org.apache.http.wire - << "Date: Wed, 28 Oct 2015 08:53:54 GMT[\r][\n]"
[main] DEBUG org.apache.http.wire - << "Content-Type: text/html[\r][\n]"
[main] DEBUG org.apache.http.wire - << "Content-Length: 215[\r][\n]"
[main] DEBUG org.apache.http.wire - << "Connection: Keep-Alive[\r][\n]"
[main] DEBUG org.apache.http.wire - << "Location: http://www.baidu.com/search/error.html[\r][\n]"
[main] DEBUG org.apache.http.wire - << "Server: BWS/1.1[\r][\n]"
[main] DEBUG org.apache.http.wire - << "X-UA-Compatible: IE=Edge,chrome=1[\r][\n]"
[main] DEBUG org.apache.http.wire - << "BDPAGETYPE: 3[\r][\n]"
[main] DEBUG org.apache.http.wire - << "Set-Cookie: BDSVRTM=0; path=/[\r][\n]"
[main] DEBUG org.apache.http.wire - << "[\r][\n]"
[main] DEBUG org.apache.http.impl.conn.DefaultClientConnection - Receiving response: HTTP/1.1 302 Moved Temporarily
[main] DEBUG org.apache.http.headers - << HTTP/1.1 302 Moved Temporarily
[main] DEBUG org.apache.http.headers - << Date: Wed, 28 Oct 2015 08:53:54 GMT
[main] DEBUG org.apache.http.headers - << Content-Type: text/html
[main] DEBUG org.apache.http.headers - << Content-Length: 215
[main] DEBUG org.apache.http.headers - << Connection: Keep-Alive
[main] DEBUG org.apache.http.headers - << Location: http://www.baidu.com/search/error.html
[main] DEBUG org.apache.http.headers - << Server: BWS/1.1
[main] DEBUG org.apache.http.headers - << X-UA-Compatible: IE=Edge,chrome=1
[main] DEBUG org.apache.http.headers - << BDPAGETYPE: 3
[main] DEBUG org.apache.http.headers - << Set-Cookie: BDSVRTM=0; path=/
[main] DEBUG org.apache.http.client.protocol.ResponseProcessCookies - Cookie accepted: "[version: 0][name: BDSVRTM][value: 0][domain: www.baidu.com][path: /][expiry: null]".
[main] DEBUG org.apache.http.impl.client.DefaultHttpClient - Connection can be kept alive indefinitely
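The log above shows that no multipart header is needed in this case: just write the key-value pairs out in the format shown in the log. The handling code is as follows: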
String boundary = FetchurlUtil.getBoundary(20);
// Tell plain key-value posts apart from multipart ones:
// any binary part means the body is multipart, whether or not text fields are also present
if (binaryList.size() != 0)
    conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary); // multipart header
conn.connect();
// Get the output stream
ds = new DataOutputStream(conn.getOutputStream());
if (postMap != null && !postMap.isEmpty() && binaryList.size() == 0) {
    StringBuilder sb = new StringBuilder();
    for (String key : postMap.keySet()) {
        if (sb.length() > 0) sb.append("&");
        // URL-encode keys and values so that '&', '=' or non-ASCII characters cannot corrupt the body
        sb.append(URLEncoder.encode(key, "UTF-8")).append("=").append(URLEncoder.encode(postMap.get(key), "UTF-8"));
    }
    // Write the body directly
    ds.write(sb.toString().getBytes("UTF-8"));
}
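To make the body format concrete, here is a minimal, self-contained sketch of building an application/x-www-form-urlencoded body like the "c=d&a=b" seen in the log. The class name FormBody and its encode method are hypothetical and are not part of the fetchurl client; the sketch also URL-encodes keys and values, which the two-letter sample data in the log does not exercise.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only.
public class FormBody {

    /** Joins the entries as key=value pairs separated by '&', URL-encoding both sides. */
    public static String encode(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("c", "d");
        params.put("a", "b");
        System.out.println(encode(params)); // prints c=d&a=b
    }
}
```

A LinkedHashMap is used in the example only so that the output order is predictable; the order of pairs does not matter to the server.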
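2. Mixed (multipart) data
Besides adding the multipart header, mixed data also requires you to lay out the body yourself, using a boundary to separate the parts. Here is how HttpClient does it: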
[main] DEBUG org.apache.http.impl.conn.DefaultClientConnection - Sending request: POST / HTTP/1.1
[main] DEBUG org.apache.http.wire - >> "POST / HTTP/1.1[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Length: 1213[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Type: multipart/form-data; boundary=HRan9S8Z2UndYlgb29r4aPgRjw2yj2rnrpY[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Host: www.baidu.com[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Connection: Keep-Alive[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "User-Agent: Apache-HttpClient/4.1.2 (java 1.5)[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.headers - >> POST / HTTP/1.1
[main] DEBUG org.apache.http.headers - >> Content-Length: 1213
[main] DEBUG org.apache.http.headers - >> Content-Type: multipart/form-data; boundary=HRan9S8Z2UndYlgb29r4aPgRjw2yj2rnrpY
[main] DEBUG org.apache.http.headers - >> Host: www.baidu.com
[main] DEBUG org.apache.http.headers - >> Connection: Keep-Alive
[main] DEBUG org.apache.http.headers - >> User-Agent: Apache-HttpClient/4.1.2 (java 1.5)
[main] DEBUG org.apache.http.wire - >> "--"
[main] DEBUG org.apache.http.wire - >> "HRan9S8Z2UndYlgb29r4aPgRjw2yj2rnrpY" //boundary
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Disposition"
[main] DEBUG org.apache.http.wire - >> ": "
[main] DEBUG org.apache.http.wire - >> "form-data; name="aaa"; filename="error_ak.log""
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Type"
[main] DEBUG org.apache.http.wire - >> ": "
[main] DEBUG org.apache.http.wire - >> "application/octet-stream"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Transfer-Encoding"
[main] DEBUG org.apache.http.wire - >> ": "
[main] DEBUG org.apache.http.wire - >> "binary"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "cacacacacacacacacacacacaca"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "--"
[main] DEBUG org.apache.http.wire - >> "HRan9S8Z2UndYlgb29r4aPgRjw2yj2rnrpY"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Disposition"
[main] DEBUG org.apache.http.wire - >> ": "
[main] DEBUG org.apache.http.wire - >> "form-data; name="bbb"; filename="adminlog.txt""
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Type"
[main] DEBUG org.apache.http.wire - >> ": "
[main] DEBUG org.apache.http.wire - >> "application/octet-stream"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Transfer-Encoding"
[main] DEBUG org.apache.http.wire - >> ": "
[main] DEBUG org.apache.http.wire - >> "binary"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "| 93075 | 0 | system | javavdisk | STARTED | | 2012-08-07 14:33:18 |[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "| 93076 | 0 | system | javavdisk | STARTED | | 2012-08-07 14:33:21 |[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "| 93077 | 0 | system | javasession | RECYCLED | | 2012-08-07 14:42:59 |[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "| 93078 | 0 | system | javasession | STARTED | | 2012-08-07 14:44:55 |"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "--"
[main] DEBUG org.apache.http.wire - >> "HRan9S8Z2UndYlgb29r4aPgRjw2yj2rnrpY"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Disposition"
[main] DEBUG org.apache.http.wire - >> ": "
[main] DEBUG org.apache.http.wire - >> "form-data; name="sss""
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Type"
[main] DEBUG org.apache.http.wire - >> ": "
[main] DEBUG org.apache.http.wire - >> "text/plain; charset=UTF-8"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Transfer-Encoding"
[main] DEBUG org.apache.http.wire - >> ": "
[main] DEBUG org.apache.http.wire - >> "8bit"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "xxxx"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "--"
[main] DEBUG org.apache.http.wire - >> "HRan9S8Z2UndYlgb29r4aPgRjw2yj2rnrpY"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Disposition"
[main] DEBUG org.apache.http.wire - >> ": "
[main] DEBUG org.apache.http.wire - >> "form-data; name="jjj""
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Type"
[main] DEBUG org.apache.http.wire - >> ": "
[main] DEBUG org.apache.http.wire - >> "text/plain; charset=UTF-8"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "Content-Transfer-Encoding"
[main] DEBUG org.apache.http.wire - >> ": "
[main] DEBUG org.apache.http.wire - >> "8bit"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "nnnn"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - >> "--"
[main] DEBUG org.apache.http.wire - >> "HRan9S8Z2UndYlgb29r4aPgRjw2yj2rnrpY"
[main] DEBUG org.apache.http.wire - >> "--"
[main] DEBUG org.apache.http.wire - >> "[\r][\n]"
[main] DEBUG org.apache.http.wire - << "HTTP/1.1 302 Moved Temporarily[\r][\n]"
[main] DEBUG org.apache.http.wire - << "Date: Wed, 28 Oct 2015 09:01:19 GMT[\r][\n]"
[main] DEBUG org.apache.http.wire - << "Content-Type: text/html[\r][\n]"
[main] DEBUG org.apache.http.wire - << "Content-Length: 215[\r][\n]"
[main] DEBUG org.apache.http.wire - << "Connection: Keep-Alive[\r][\n]"
[main] DEBUG org.apache.http.wire - << "Location: http://www.baidu.com/search/error.html[\r][\n]"
[main] DEBUG org.apache.http.wire - << "Server: BWS/1.1[\r][\n]"
[main] DEBUG org.apache.http.wire - << "X-UA-Compatible: IE=Edge,chrome=1[\r][\n]"
[main] DEBUG org.apache.http.wire - << "BDPAGETYPE: 3[\r][\n]"
[main] DEBUG org.apache.http.wire - << "Set-Cookie: BDSVRTM=0; path=/[\r][\n]"
[main] DEBUG org.apache.http.wire - << "[\r][\n]"
[main] DEBUG org.apache.http.impl.conn.DefaultClientConnection - Receiving response: HTTP/1.1 302 Moved Temporarily
[main] DEBUG org.apache.http.headers - << HTTP/1.1 302 Moved Temporarily
[main] DEBUG org.apache.http.headers - << Date: Wed, 28 Oct 2015 09:01:19 GMT
[main] DEBUG org.apache.http.headers - << Content-Type: text/html
[main] DEBUG org.apache.http.headers - << Content-Length: 215
[main] DEBUG org.apache.http.headers - << Connection: Keep-Alive
[main] DEBUG org.apache.http.headers - << Location: http://www.baidu.com/search/error.html
[main] DEBUG org.apache.http.headers - << Server: BWS/1.1
[main] DEBUG org.apache.http.headers - << X-UA-Compatible: IE=Edge,chrome=1
[main] DEBUG org.apache.http.headers - << BDPAGETYPE: 3
[main] DEBUG org.apache.http.headers - << Set-Cookie: BDSVRTM=0; path=/
[main] DEBUG org.apache.http.client.protocol.ResponseProcessCookies - Cookie accepted: "[version: 0][name: BDSVRTM][value: 0][domain: www.baidu.com][path: /][expiry: null]".
[main] DEBUG org.apache.http.impl.client.DefaultHttpClient - Connection can be kept alive indefinitely
The log above shows exactly how the body is laid out. The handling code is as follows:
// Build the multipart POST body
if (binaryList.size() > 0) {
    for (BinaryData bd : binaryList) {
        ds.writeBytes("--" + boundary + "\r\n");
        ds.writeBytes("Content-Disposition: form-data; name=\"" + bd.getInputParameterName()
                + "\"; filename=\"" + bd.getFileName() + "\"\r\n");
        ds.writeBytes("Content-Type: application/octet-stream\r\n");
        ds.writeBytes("Content-Transfer-Encoding: binary\r\n");
        ds.writeBytes("\r\n"); // the blank line ends the part headers, so it must come after all of them
        ds.write(bd.getPostData());
        ds.writeBytes("\r\n");
    }
    // If there are text parameters as well
    if (postMap != null) {
        for (String key : postMap.keySet()) {
            ds.writeBytes("--" + boundary + "\r\n");
            ds.writeBytes("Content-Disposition: form-data; name=\"" + key + "\"\r\n");
            ds.writeBytes("Content-Type: text/plain; charset=UTF-8\r\n");
            ds.writeBytes("Content-Transfer-Encoding: 8bit\r\n");
            ds.writeBytes("\r\n");
            // writeBytes() keeps only the low byte of each char, so encode the value explicitly
            ds.write(postMap.get(key).getBytes("UTF-8"));
            ds.writeBytes("\r\n");
        }
    }
    // Closing delimiter, written once after the last part: "--" + boundary + "--"
    ds.writeBytes("--" + boundary + "--\r\n");
}
That completes the body of the POST request.
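The layout seen in the HttpClient wire log can also be reproduced in isolation. The sketch below builds a multipart/form-data body in memory so that it can be inspected before being sent. The class MultipartBody and its methods are hypothetical names used for illustration; the part headers simply mirror those in the log above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical in-memory builder, for illustration only.
public class MultipartBody {
    private static final String CRLF = "\r\n";
    private final String boundary;
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    public MultipartBody(String boundary) {
        this.boundary = boundary;
    }

    private void write(byte[] bytes) {
        out.write(bytes, 0, bytes.length);
    }

    private void writeText(String s) {
        write(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Adds a text field: delimiter, part headers, blank line, value, CRLF. */
    public MultipartBody addText(String name, String value) {
        writeText("--" + boundary + CRLF);
        writeText("Content-Disposition: form-data; name=\"" + name + "\"" + CRLF);
        writeText("Content-Type: text/plain; charset=UTF-8" + CRLF);
        writeText("Content-Transfer-Encoding: 8bit" + CRLF);
        writeText(CRLF);
        writeText(value);
        writeText(CRLF);
        return this;
    }

    /** Adds a binary part with a file name. */
    public MultipartBody addFile(String name, String fileName, byte[] data) {
        writeText("--" + boundary + CRLF);
        writeText("Content-Disposition: form-data; name=\"" + name
                + "\"; filename=\"" + fileName + "\"" + CRLF);
        writeText("Content-Type: application/octet-stream" + CRLF);
        writeText("Content-Transfer-Encoding: binary" + CRLF);
        writeText(CRLF);
        write(data);
        writeText(CRLF);
        return this;
    }

    /** Writes the closing delimiter and returns the complete body. */
    public byte[] finish() {
        writeText("--" + boundary + "--" + CRLF);
        return out.toByteArray();
    }
}
```

The boundary passed to the constructor must be the same one announced in the request's Content-Type header, and the length of the returned array is what Content-Length would report.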
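As the logs show, the response code returned by the target address is 302, which means a redirect. This problem also had me stuck for a long time.
See the code: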
conn = (HttpURLConnection) _url.openConnection();
conn.setRequestMethod(method);
genHttpHeader(conn, url);
genAddHttpHeader(conn);
if (isPost) {
    conn.setDoOutput(true); // enable output so that a request body can be written
    conn.setUseCaches(false);
    // conn.setConnectTimeout(100000);
    genPostRequest(conn);
}
if (logger.isDebugEnabled()) {
    debugRequestHeader(conn);
}
InputStream is = conn.getInputStream(); // read the final response data here <-----------------------
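Every time I debugged, the whole thread hung at the line marked with the arrow above until it timed out and finally threw a timeout exception. I went back and forth without finding the cause, and eventually captured the packets: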
// This is a normal fetchurl request; it shows that the hand-built headers are complete
0x0000: 4500 01ca 31d2 4000 4006 d4a2 0a43 0f24 E...1.@.@....C.$
0x0010: 0a43 0f10 9fbb 0050 d061 3f96 ee54 acb8 .C.....P.a?..T..
0x0020: 5018 0073 3476 0000 504f 5354 202f 2048 P..s4v..POST./.H // protocol
0x0030: 5454 502f 312e 310d 0a46 6574 6368 5572 TTP/1.1..FetchUr // special headers
0x0040: 6c3a 2077 7777 2e62 6169 6475 2e63 6f6d l:.www.baidu.com
0x0050: 0d0a 4163 6365 7373 4b65 793a 206e 3434 ..AccessKey:.n44
0x0060: 6d6b 6f6f 7878 6a0d 0a54 696d 6553 7461 mkooxxj..TimeSta
0x0070: 6d70 3a20 3134 3436 3030 3035 3334 0d0a mp:.1446000534..
0x0080: 5369 676e 6174 7572 653a 2034 3576 3836 Signature:.45v86
0x0090: 6568 3073 637a 635a 5678 6b4e 2b2f 7462 eh0sczcZVxkN+/tb
0x00a0: 4f51 7777 5673 4937 4b74 4133 496f 757a OQwwVsI7KtA3Iouz
0x00b0: 3544 5474 7873 3d0d 0a55 7365 722d 4167 5DTtxs=..User-Ag
0x00c0: 656e 743a 2053 4145 204f 6e6c 696e 6520 ent:.SAE.Online.
0x00d0: 506c 6174 666f 726d 0d0a 436f 6e74 656e Platform..Conten
0x00e0: 742d 5479 7065 3a20 6d75 6c74 6970 6172 t-Type:.multipar
0x00f0: 742f 666f 726d 2d64 6174 613b 2062 6f75 t/form-data;.bou
0x0100: 6e64 6172 793d 3975 554f 3659 7470 3964 ndary=9uUO6Ytp9d
0x0110: 0d0a 4361 6368 652d 436f 6e74 726f 6c3a ..Cache-Control:
0x0120: 206e 6f2d 6361 6368 650d 0a50 7261 676d .no-cache..Pragm
0x0130: 613a 206e 6f2d 6361 6368 650d 0a48 6f73 a:.no-cache..Hos
0x0140: 743a 2066 6574 6368 7572 6c2e 7361 652e t:.fetchurl.sae.
0x0150: 7369 6e61 2e63 6f6d 2e63 6e0d 0a41 6363 sina.com.cn..Acc
0x0160: 6570 743a 2074 6578 742f 6874 6d6c 2c20 ept:.text/html,.
0x0170: 696d 6167 652f 6769 662c 2069 6d61 6765 image/gif,.image
0x0180: 2f6a 7065 672c 202a 3b20 713d 2e32 2c20 /jpeg,.*;.q=.2,.
0x0190: 2a2f 2a3b 2071 3d2e 320d 0a43 6f6e 6e65 */*;.q=.2..Conne
0x01a0: 6374 696f 6e3a 206b 6565 702d 616c 6976 ction:.keep-aliv
0x01b0: 650d 0a43 6f6e 7465 6e74 2d4c 656e 6774 e..Content-Lengt
0x01c0: 683a 2031 3438 0d0a 0d0a h:.148....
// Next comes the normal response; the fetched address returns 302
0x0000: 4500 021f eb35 4000 4006 1aea 0a43 0f10 E....5@.@....C..
0x0010: 0a43 0f24 0050 9fbb ee54 acb8 d061 41cc .C.$.P...T...aA.
0x0020: 5011 0083 db70 0000 4854 5450 2f31 2e31 P....p..HTTP/1.1
0x0030: 2033 3032 204d 6f76 6564 2054 656d 706f .302.Moved.Tempo // the response code returned here is 302
0x0040: 7261 7269 6c79 0d0a 5365 7276 6572 3a20 rarily..Server:.
0x0050: 6e67 696e 782f 312e 342e 310d 0a44 6174 nginx/1.4.1..Dat
0x0060: 653a 2057 6564 2c20 3238 204f 6374 2032 e:.Wed,.28.Oct.2
0x0070: 3031 3520 3032 3a34 383a 3534 2047 4d54 015.02:48:54.GMT
0x0080: 0d0a 436f 6e74 656e 742d 5479 7065 3a20 ..Content-Type:.
0x0090: 7465 7874 2f68 746d 6c0d 0a43 6f6e 7465 text/html..Conte
0x00a0: 6e74 2d4c 656e 6774 683a 2032 3135 0d0a nt-Length:.215..
0x00b0: 436f 6e6e 6563 7469 6f6e 3a20 636c 6f73 Connection:.clos
0x00c0: 650d 0a4c 6f63 6174 696f 6e3a 2068 7474 e..Location:.htt // this Location gives the redirect target
0x00d0: 703a 2f2f 7777 772e 6261 6964 752e 636f p://www.baidu.co
0x00e0: 6d2f 7365 6172 6368 2f65 7272 6f72 2e68 m/search/error.h
0x00f0: 746d 6c0d 0a58 2d55 412d 436f 6d70 6174 tml..X-UA-Compat
0x0100: 6962 6c65 3a20 4945 3d45 6467 652c 6368 ible:.IE=Edge,ch
0x0110: 726f 6d65 3d31 0d0a 4244 5041 4745 5459 rome=1..BDPAGETY
0x0120: 5045 3a20 330d 0a53 6574 2d43 6f6f 6b69 PE:.3..Set-Cooki
0x0130: 653a 2042 4453 5652 544d 3d30 3b20 7061 e:.BDSVRTM=0;.pa
0x0140: 7468 3d2f 0d0a 0d0a 3c68 746d 6c3e 0d0a th=/....<html>..
0x0150: 3c68 6561 643e 3c74 6974 6c65 3e33 3032 <head><title>302
0x0160: 2046 6f75 6e64 3c2f 7469 746c 653e 3c2f .Found</title></
0x0170: 6865 6164 3e0d 0a3c 626f 6479 2062 6763 head>..<body.bgc
0x0180: 6f6c 6f72 3d22 7768 6974 6522 3e0d 0a3c olor="white">..<
0x0190: 6365 6e74 6572 3e3c 6831 3e33 3032 2046 center><h1>302.F
0x01a0: 6f75 6e64 3c2f 6831 3e3c 2f63 656e 7465 ound</h1></cente
0x01b0: 723e 0d0a 3c68 723e 3c63 656e 7465 723e r>..<hr><center>
0x01c0: 7072 2d6e 6769 6e78 5f31 2d30 2d32 3531 pr-nginx_1-0-251
0x01d0: 5f42 5241 4e43 4820 4272 616e 6368 0a54 _BRANCH.Branch.T
0x01e0: 696d 6520 3a20 4d6f 6e20 4f63 7420 3139 ime.:.Mon.Oct.19
0x01f0: 2031 343a 3137 3a35 3320 4353 5420 3230 .14:17:53.CST.20
0x0200: 3135 3c2f 6365 6e74 6572 3e0d 0a3c 2f62 15</center>..</b
0x0210: 6f64 793e 0d0a 3c2f 6874 6d6c 3e0d 0a ody>..</html>..
// At this point it should be over, right? One complete HTTP exchange is done. Yet right then another request was sent
0x0000: 4500 0156 f89f 4000 4006 0e49 0a43 0f24 E..V..@.@..I.C.$
0x0010: 0a43 0f10 9fbf 0050 82cd d7fb 4541 d7e7 .C.....P....EA..
0x0020: 5018 0073 3402 0000 4745 5420 2f20 4854 P..s4...GET./.HT // why is another request sent here, and not even to the redirect address?
0x0030: 5450 2f31 2e31 0d0a 4665 7463 6855 726c TP/1.1..FetchUrl // what on earth is this? It may be an environment issue; to be investigated
0x0040: 3a20 6874 7470 3a2f 2f31 302e 3637 2e31 :.http://10.67.1
0x0050: 352e 3131 2f0d 0a41 6363 6573 734b 6579 5.11/..AccessKey
0x0060: 3a20 7978 776f 6e77 7835 3235 0d0a 5469 :.yxwonwx525..Ti
0x0070: 6d65 5374 616d 703a 2031 3434 3630 3030 meStamp:.1446000
0x0080: 3534 320d 0a53 6967 6e61 7475 7265 3a20 542..Signature:.
0x0090: 5667 7764 5756 5866 6956 512f 7531 3731 VgwdWVXfiVQ/u171
0x00a0: 6562 714c 3557 6270 4b70 6c43 3673 5262 ebqL5WbpKplC6sRb
0x00b0: 7652 3736 7a6b 5035 2f71 383d 0d0a 5573 vR76zkP5/q8=..Us
0x00c0: 6572 2d41 6765 6e74 3a20 5341 4520 4f6e er-Agent:.SAE.On
0x00d0: 6c69 6e65 2050 6c61 7466 6f72 6d0d 0a53 line.Platform..S
0x00e0: 6165 486f 7374 3a20 6665 7463 6875 726c aeHost:.fetchurl
0x00f0: 2e73 6165 2e73 696e 612e 636f 6d2e 636e .sae.sina.com.cn
0x0100: 0d0a 5361 6552 656d 6f74 6549 503a 2031 ..SaeRemoteIP:.1
0x0110: 302e 3637 2e31 352e 3936 0d0a 486f 7374 0.67.15.96..Host
0x0120: 3a20 6665 7463 6875 726c 2e73 6165 2e73 :.fetchurl.sae.s
0x0130: 696e 612e 636f 6d2e 636e 0d0a 436f 6e6e ina.com.cn..Conn
0x0140: 6563 7469 6f6e 3a20 4b65 6570 2d41 6c69 ection:.Keep-Ali
0x0150: 7665 0d0a 0d0a ve....
What finally came back was an error page.
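From the above, some mysterious force seems to have triggered a second connection, which made me wonder whether HttpURLConnection has a follow-redirects mechanism.
Looking at the JDK source, I found: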
// When a 3xx response code comes back, the redirect is followed automatically, and the default is true.
// The flag is static, so setting it to false turns redirect following off globally.
/**
 * Sets whether HTTP redirects (requests with response code 3xx) should
 * be automatically followed by this class. True by default. Applets
 * cannot change this variable.
 * <p>
 * If there is a security manager, this method first calls
 * the security manager's <code>checkSetFactory</code> method
 * to ensure the operation is allowed.
 * This could result in a SecurityException.
 *
 * @param set a <code>boolean</code> indicating whether or not
 * to follow HTTP redirects.
 * @exception SecurityException if a security manager exists and its
 * <code>checkSetFactory</code> method doesn't
 * allow the operation.
 * @see SecurityManager#checkSetFactory
 * @see #getFollowRedirects()
 */
public static void setFollowRedirects(boolean set) {
    SecurityManager sec = System.getSecurityManager();
    if (sec != null) {
        // seems to be the best check here...
        sec.checkSetFactory();
    }
    followRedirects = set;
}
So I set it to false here, and the 302 problem was solved.
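Since the static setter affects every HttpURLConnection in the JVM, a narrower alternative is the per-connection setter setInstanceFollowRedirects, which HttpURLConnection also provides. The sketch below is not the change described in this post; the class name NoRedirect and the five-second timeouts are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, for illustration only.
public class NoRedirect {

    /** Opens a connection that reports 3xx responses to the caller instead of following them. */
    public static HttpURLConnection open(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            // Only this connection is affected; the JVM-wide default stays untouched.
            conn.setInstanceFollowRedirects(false);
            // Explicit timeouts (assumed values) so a stalled read fails fast instead of hanging.
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With redirects disabled, getResponseCode() on such a connection returns the 302 itself and getHeaderField("Location") returns the target, so the caller can decide what to do with it.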
Original: http://my.oschina.net/u/268957/blog/523249