首页 > Web开发 > 详细

IOS 用正则表达式解析HTML等文件,得到所有文本

时间:2014-01-14 21:25:02      阅读:784      评论:0      收藏:0      [点我收藏+]

 获得网页内容

NSURL *url=[NSURL URLWithString:@"http://121.199.34.52/wordpress/?json=core.get_post_content&post_id=8764&post_type=post"];
     NSDictionary * dic=[NSJSONSerialization JSONObjectWithData:[NSData dataWithContentsOfURL:url] options:0 error:Nil];
 
  NSString *content=[dic objectForKey:@"content"];

正则表达式

   NSRegularExpression *regularExpretion=[NSRegularExpression regularExpressionWithPattern:@"<[^>]*>|\n"
                                                                                    options:0
                                                                                      error:nil];
    
    content=[regularExpretion stringByReplacingMatchesInString:content options:NSMatchingReportProgress range:NSMakeRange(0, content.length) withTemplate:@"-"];//替换所有html和换行匹配元素为"-"
    
    regularExpretion=[NSRegularExpression regularExpressionWithPattern:@"-{1,}" options:0 error:nil] ;
     content=[regularExpretion stringByReplacingMatchesInString:content options:NSMatchingReportProgress range:NSMakeRange(0, content.length) withTemplate:@"-"];//把多个"-"匹配为一个"-"
    
    //根据"-"分割到数组
     NSArray *arr=[NSArray array];
    content=[NSString stringWithString:content];
     arr =  [content componentsSeparatedByString:@"-"];
    NSMutableArray *marr=[NSMutableArray arrayWithArray:arr];
    [marr removeObject:@""];
    for (NSString *str in marr) {
           NSLog(@"呵呵-------------%@",str);
        
    }

 

去除字符串中所有得空格及控制字符:

str = [str stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet ]];

IOS 用正则表达式解析HTML等文件,得到所有文本

原文:http://www.cnblogs.com/huntaiji/p/3513172.html

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!