
Shell scraping

Date: 2014-04-15 23:03:08
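#!/bin/bash
# Scrape the HTTP proxy list from youdaili.cn and save unique entries to config/ip_port.
# Needs bash (for (( )) and [[ ]]), gawk (for gensub), curl and piconv.
# The page is converted to GBK before matching, so the Chinese pattern used for
# the page count only matches if this script file is itself saved as GBK.

dir=$(dirname "$0")
configDir="$dir/config"
mkdir -p "$configDir"   # make sure the output directory exists

ipport="$configDir/ip_port"

url="http://www.youdaili.cn/Daili/http/"

# ID of the newest article: cut out of the index line that carries the "hot" icon.
indexs=$(curl -s --max-time 200 "$url" | piconv -f utf8 -t gbk | awk '$0~/http:\/\/www.youdaili.cn\/static\/images\/hot.gif/{print substr($2,41,length($2)-46)}')

# Total number of pages, taken from the pager text "共N页".
pages=$(curl -s --max-time 200 "${url}${indexs}.html" | piconv -f utf8 -t gbk | awk '$0~/共.*页/{page=gensub(/.*共([^页]+).*/,"\\1","1",$0);print page}')

for ((page = 1; page <= pages; page++))
do
        # Page 1 has no numeric suffix; later pages are named <id>_<n>.html.
        if [[ $page -eq 1 ]]
        then
                link="${url}${indexs}.html"
        else
                link="${url}${indexs}_$page.html"
        fi
        # Keep only lines of the form ...ip:port@HTTP#...<br /> and strip everything around ip:port.
        curl -s --max-time 200 "$link" | piconv -f utf8 -t gbk | awk '$0~/.*@HTTP#.*<br \/>/{gsub(".*<p>","",$0);gsub(".*<span>","",$0);gsub("@HTTP#.*","",$0);print}'
done | sort -u > "$ipport"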
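
The filtering step of the script can be tried offline, without fetching anything. The following is a minimal sketch, not the original author's code: the function name `extract_proxies` and the sample HTML lines are made up for illustration (the addresses come from the reserved documentation ranges), and the input is assumed to have the same `ip:port@HTTP#...<br />` shape that the script matches.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): reads HTML lines on
# stdin and prints the unique ip:port entries, one per line.
extract_proxies() {
    awk '/@HTTP#.*<br \/>/ {
        sub(/.*<p>/, "")      # drop everything up to a leading <p>
        sub(/.*<span>/, "")   # or up to a leading <span>
        sub(/@HTTP#.*/, "")   # drop the protocol tag and the rest of the line
        print
    }' | sort -u
}

# Made-up sample input shaped like the lines the script expects.
sample='<p>192.0.2.1:8080@HTTP#Example A<br />
<span>198.51.100.2:3128@HTTP#Example B<br />
<div>unrelated line</div>
<p>192.0.2.1:8080@HTTP#Example A<br />'

printf '%s\n' "$sample" | extract_proxies
```

Running it on the sample should leave two entries, since the unrelated line is skipped and the duplicate is collapsed by `sort -u`. Keeping the filter in a function like this makes it easy to check the pattern against a saved copy of a page before pointing the loop at the live site.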

Original: http://www.cnblogs.com/code-style/p/3664964.html
