Two ways to complete and filter HTML tags in web page content

Date: 2014-07-10 19:52:20

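If the HTML in your page content is incomplete, for example a table tag is left unclosed and breaks the page layout, or a fragment of HTML from outside your content has been pulled in, you can write a function that closes the open tags and filters out the unwanted ones.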

PHP function for completing, closing, and filtering HTML tags, method 1:

Code:

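function closetags($html) {
  // Collect opening tags, skipping void elements and self-closed tags such as <br />.
  // The tag name may contain digits so that <h1>..<h6> are matched as well.
  preg_match_all('#<(?!(?:meta|img|br|hr|input)\b)([a-z][a-z0-9]*)(?: .*)?(?<![/ ])>#iU', $html, $result);
  $openedtags = array_map('strtolower', $result[1]);
  // Collect closing tags.
  preg_match_all('#</([a-z][a-z0-9]*)>#iU', $html, $result);
  $closedtags = array_map('strtolower', $result[1]);
  $len_opened = count($openedtags);
  // Heuristic: equal counts are taken to mean that nothing is left open.
  if (count($closedtags) == $len_opened) {
       return $html;
  }
  // Walk the opening tags from last to first and append a closing tag
  // for each one that has no closing tag left to pair with.
  $openedtags = array_reverse($openedtags);
  for ($i = 0; $i < $len_opened; $i++) {
       $key = array_search($openedtags[$i], $closedtags);
       if ($key === false) {
         $html .= '</'.$openedtags[$i].'>';
       } else {
         unset($closedtags[$key]);
       }
  }
  return $html;
}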
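
closetags() only compares tag names and counts, so every missing closing tag is appended at the very end of the string, and the order of nested tags is not tracked. As a hedged alternative, the sketch below keeps a stack of open tags instead. The name closetags_stack() and the void-element list are assumptions made for this illustration, not part of the original article, and a regex-based approach like this is still no substitute for a real HTML parser.

```php
<?php
// Hypothetical helper (not from the original article): a stack-based variant
// that appends missing closing tags in nesting order.
function closetags_stack($html) {
    $void  = array('meta', 'img', 'br', 'hr', 'input');
    $stack = array();
    // Match opening and closing tags in document order.
    preg_match_all('#<(/?)([a-z][a-z0-9]*)\b[^>]*>#i', $html, $tags, PREG_SET_ORDER);
    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if (in_array($name, $void, true) || substr($tag[0], -2) === '/>') {
            continue; // void or self-closed element: nothing to close
        }
        if ($tag[1] === '') {
            $stack[] = $name; // opening tag: push
        } else {
            // Closing tag: find the most recent matching opener (original index kept).
            $pos = array_search($name, array_reverse($stack, true), true);
            if ($pos !== false) {
                // Drop the opener and everything opened after it.
                $stack = array_slice($stack, 0, $pos);
            }
        }
    }
    // Whatever is left on the stack was never closed; close innermost first.
    foreach (array_reverse($stack) as $name) {
        $html .= '</' . $name . '>';
    }
    return $html;
}

echo closetags_stack('<div><p>text'), "\n"; // prints: <div><p>text</p></div>
```

Like closetags(), this sketch only appends at the end of the input; it does not repair a tag that was closed in the wrong place.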

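Notes on closetags():
array_reverse(): returns a new array with the elements in reverse order. String keys are always preserved; integer keys are renumbered unless the second argument is set to true.
array_search(): array_search(value, array, strict) looks a value up in an array much as in_array() does, but returns the key of the first matching element, or false if nothing matches. With strict set to true, an element matches only if both its type and its value are identical. Because a valid key can be 0, compare the result with === false rather than testing it as a boolean.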
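
The behaviour of the two array functions described above can be checked with a few lines. This is a minimal sketch; the $tags array is made up for the illustration.

```php
<?php
$tags = array('div', 'p', 'b');

// Integer keys are renumbered by default ...
$renumbered = array_reverse($tags);       // 0 => 'b', 1 => 'p', 2 => 'div'
// ... and kept when the second argument is true.
$preserved  = array_reverse($tags, true); // 2 => 'b', 1 => 'p', 0 => 'div'

// array_search() returns the key, which can be 0, so test with === false.
$found   = array_search('div', $tags);    // int(0)
$missing = array_search('span', $tags);   // bool(false)

// With strict = true the types must match as well.
$strict = array_search('1', array(1), true); // bool(false): string vs. int

var_dump($found === false);   // bool(false), i.e. 'div' was found
var_dump($missing === false); // bool(true)
```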

PHP function for completing, closing, and filtering HTML tags, method 2:

function checkhtml($html) {
	// Undo addslashes()-style escaping; drop this line if the input was never escaped.
	$html = stripslashes($html);
	preg_match_all("/\<([^\<]+)\>/is", $html, $ms);
	// Pass 1: escape every angle bracket.
	$searchs = array('<', '>');
	$replaces = array('&lt;', '&gt;');

	if($ms[1]) {
		$allowtags = 'img|font|div|table|tbody|tr|td|th|br|p|b|strong|i|u|em|span|ol|ul|li'; // allowed tags
		$ms[1] = array_unique($ms[1]);
		foreach ($ms[1] as $value) {
			// Pass 2: map the escaped form of each tag back to a sanitized tag, or to nothing.
			$searchs[] = "&lt;".$value."&gt;";
			$value = shtmlspecialchars($value);
			$value = str_replace(array('\\','/*'), array('.','/.'), $value);
			// Neutralize script-related keywords and on* event attributes.
			$value = preg_replace(array("/(javascript|script|eval|behaviour|expression)/i", "/(\s+|&quot;|')on/i"), array('.', ' .'), $value);
			// Drop any tag that is not in the allow list.
			if(!preg_match("/^[\/|\s]?($allowtags)(\s+|$)/is", $value)) {
				$value = '';
			}
			$replaces[] = empty($value)?'':"<".str_replace('&quot;', '"', $value).">";
		}
	}
	// str_replace() applies the pairs in order, so pass 1 runs before pass 2.
	$html = str_replace($searchs, $replaces, $html);

	return $html;
}
// Escape HTML special characters, leaving entities that are already present intact
function shtmlspecialchars($string) {
	if(is_array($string)) {
		foreach($string as $key => $val) {
			$string[$key] = shtmlspecialchars($val);
		}
	} else {
		$string = preg_replace('/&amp;((#(\d{3,5}|x[a-fA-F0-9]{4})|[a-zA-Z][a-z0-9]{2,5});)/', '&\\1',
			str_replace(array('&', '"', '<', '>'), array('&amp;', '&quot;', '&lt;', '&gt;'), $string));
	}
	return $string;
}
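
checkhtml() appears to rely on the fact that str_replace() with array arguments applies each search/replace pair in order, each on the result of the previous one: first every angle bracket is escaped, then only the escaped forms of allowed tags are turned back into real tags. The sketch below isolates that idea with a hand-written allow list of two tags; the input string and variable names are assumptions for the illustration.

```php
<?php
// Illustrative input, not taken from the original article.
$html = '<b>bold</b> <script>alert(1)</script>';

// Pairs 1 and 2 escape every angle bracket; pairs 3 and 4 then restore
// only the exact escaped forms of the tags we decided to allow.
$search  = array('<', '>', '&lt;b&gt;', '&lt;/b&gt;');
$replace = array('&lt;', '&gt;', '<b>', '</b>');

// The pairs are applied in order, each on the output of the previous one.
$out = str_replace($search, $replace, $html);

echo $out, "\n";
// prints: <b>bold</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

The order matters: if the restore pairs came first, the escape pairs would undo them again.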

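Notes on checkhtml($html):

stripslashes(): removes the backslashes added by addslashes(). It is used to clean up data retrieved from a database or an HTML form that was escaped this way.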
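
A short round trip shows what stripslashes() undoes, and why it should only be applied to data that really was escaped. This is a minimal sketch with made-up sample strings.

```php
<?php
$raw      = "It's a \"test\"";
$escaped  = addslashes($raw);       // It\'s a \"test\"
$restored = stripslashes($escaped); // back to the original string

var_dump($restored === $raw);       // bool(true)

// Caveat: on input that was never escaped, legitimate backslashes are lost.
$path = 'C:\\path';                 // the 7-character string C:\path
echo stripslashes($path), "\n";     // prints: C:path
```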

Thanks for following the websites blog!

Original: http://blog.csdn.net/websites/article/details/37594395
