
Extracting the text from HTML (1)

Posted: 2014-10-27 15:31:08

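using System.Text.RegularExpressions;

public static string StripHTML(string strHtml)
{
    // Every match of aryReg[i] is replaced by aryRep[i], so the two arrays must stay in step.
    string[] aryReg =
    {
        @"(?s)<script[^>]*?>.*?</script>",  // <script> blocks; (?s) lets '.' span several lines
        @"<(\/\s*)?!?((\w+:)?\w+)(\w+(\s*=?\s*(([""'])(\\[""'tbnr]|[^\7])*?\7|\w+)|.{0})|\s)*?(\/\s*)?>",  // tags
        @"([\r\n])[\s]+",                   // a line break plus the whitespace after it
        @"&(quot|#34);", @"&(amp|#38);", @"&(lt|#60);", @"&(gt|#62);", @"&(nbsp|#160);",
        @"&(iexcl|#161);", @"&(cent|#162);", @"&(pound|#163);", @"&(copy|#169);",
        @"&#(\d+);",                        // any other numeric entity is dropped
        @"-->", @"<!--.*\n"                 // together these remove single-line <!-- ... --> comments
    };

    string[] aryRep =
    {
        "", "", "",
        "\"", "&", "<", ">", " ",
        "\xa1", "\xa2", "\xa3", "\xa9",     // ¡ ¢ £ ©
        "",
        "\r\n", ""
    };

    string strOutput = strHtml;
    for (int i = 0; i < aryReg.Length; i++)
    {
        Regex regex = new Regex(aryReg[i], RegexOptions.IgnoreCase);
        strOutput = regex.Replace(strOutput, aryRep[i]);
    }

    // Strings are immutable: Replace returns a new string, so the result must be assigned.
    // Note that this also strips any '<' or '>' produced by decoding &lt; / &gt; above.
    strOutput = strOutput.Replace("<", "").Replace(">", "").Replace("\r\n", "");

    return strOutput;
}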


Original post: http://www.cnblogs.com/pengdc/p/4054289.html
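StripHTML decodes only the nine entities listed in its table, leaves other named entities (such as &eacute;) untouched, and deletes every other numeric entity. The following is a hedged alternative sketch, not the original author's code. It assumes .NET Framework 4.0+ or .NET Core, keeps the regex approach for removing script/style blocks, comments and tags, and hands entity decoding to System.Net.WebUtility.HtmlDecode, which handles numeric and standard named entities. The names HtmlTextSketch and ToPlainText are hypothetical.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: a sketch of regex-based HTML-to-text conversion
// that delegates entity decoding to WebUtility.HtmlDecode.
public static class HtmlTextSketch
{
    // Whole <script> and <style> elements; Singleline lets '.' cross newlines.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<script\b[^>]*>.*?</script\s*>|<style\b[^>]*>.*?</style\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // HTML comments, including ones that span several lines.
    private static readonly Regex Comment = new Regex(@"<!--.*?-->", RegexOptions.Singleline);

    // Any remaining tag: '<' followed by a letter, '/' or '!', up to the next '>'.
    private static readonly Regex Tag = new Regex(@"<[a-zA-Z/!][^>]*>");

    // Runs of whitespace; in .NET, \s also matches U+00A0 (decoded &nbsp;).
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string ToPlainText(string html)
    {
        string text = ScriptOrStyle.Replace(html, "");
        text = Comment.Replace(text, "");
        // Replace tags with a space so text in adjacent block elements stays separated.
        text = Tag.Replace(text, " ");
        // Decode entities last, so a decoded "&lt;" is never mistaken for a tag.
        text = WebUtility.HtmlDecode(text);
        return Whitespace.Replace(text, " ").Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<html><head><script type=\"text/javascript\">\nvar x = 1 < 2;\n</script></head>"
                    + "<body><!-- note --><p>Fish&nbsp;&amp; Chips</p><p>1 &lt; 2</p></body></html>";
        Console.WriteLine(HtmlTextSketch.ToPlainText(html)); // prints "Fish & Chips 1 < 2"
    }
}
```

Replacing each tag with a space rather than with nothing keeps the text of adjacent elements such as <p>a</p><p>b</p> apart, at the cost of a stray space inside a word split by an inline tag like <b>. Like the original, this remains a regex heuristic; a real HTML parser copes better with malformed markup.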
