PHP strip tags except '<>' (book name)

How can I remove all HTML tags in PHP except text enclosed in <> (a book title)?

// There are other HTML tags too, like h1, div, etc.
echo strip_tags('<gone with the wind> <p>a hotest book</p>');

This returns a hotest book, but I need to keep the book title. I need a function that returns <gone with the wind> a hotest book.

You should consider using &lt; (<) and &gt; (>).

The following uses DOM to find any elements that are not valid HTML4 elements and treats them as book titles. These are then whitelisted in strip_tags.

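$html = '<gone with the wind> <p>a hotest book</p>';

// Collect parser errors instead of emitting warnings
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);

// libxml reports unknown elements as "Tag xyz invalid"; whitelist those names
echo strip_tags($html, implode('',
    array_map(
        function ($error) {
            return '<' . sscanf($error->message, 'Tag %s invalid')[0] . '>';
        },
        libxml_get_errors()
    )
));
libxml_clear_errors();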


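Note that any book title that starts with a valid HTML tag name will be treated as valid HTML and therefore stripped (e.g. "Body of Evidence" or "Head First PHP"). Also note that <gone with the wind> is interpreted as a "gone" element with the attributes "with", "the" and "wind". For valid elements, you could check whether they have only empty attributes and strip them if not, but that is still not 100% accurate when a title consists only of a valid element name. You could also check for closing tags, but I don't know how to do that with DOM (XMLParser can detect them, though).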
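The "only empty attributes" check described above can be approximated without DOM. The sketch below swaps in a regular expression instead of DOMDocument: any tag whose "attributes" are all bare words, with no =value and no quotes, is entity-encoded before strip_tags runs and decoded afterwards. The function name protect_bare_word_tags is hypothetical. It still fails for one-word titles such as <Head>, and it would misread real HTML that uses bare boolean attributes (for example <input disabled>).

```php
<?php
// Hypothetical helper: treat "<word word ...>" as a book title when every
// "attribute" is a bare word (no "=value"), as suggested above.
function protect_bare_word_tags(string $html): string
{
    // A tag name followed by one or more bare words, then ">"
    return preg_replace_callback(
        '/<([A-Za-z][\w-]*(?:\s+[\w-]+)+)>/',
        function (array $m): string {
            // Encode the brackets so strip_tags() leaves the title alone
            return '&lt;' . $m[1] . '&gt;';
        },
        $html
    );
}

$html = '<head first php> <p>a hotest book</p>';
$result = html_entity_decode(strip_tags(protect_bare_word_tags($html)));
echo $result, PHP_EOL; // prints: <head first php> a hotest book
```

Plain tags such as <p> and tags with real attribute values such as <img src="url"> do not match the pattern, so strip_tags still removes them.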

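In any case, finding a better format for these book titles, for example using a namespace or a delimiter other than angle brackets, will greatly improve your chances of getting this right.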

Here is a simple but not foolproof solution.


$data = "<gone with the wind> <p>a hotest book</p>";
$out = preg_replace('/<\w+>|<\/\w+>/im', '', $data);
var_dump($out);

Output

string '<gone with the wind> a hotest book' (length=34)

Will match:

<p>text</p>
<anything>text</anything>

Will not match:

<img src="url">

As mentioned before, code has no way of knowing what a book title looks like.

However, if you expect your data to contain only simple tags like <p>, this will work.
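To make both limitations concrete, the sketch below runs the same pattern on two made-up inputs: a tag with an attribute, which the pattern leaves alone, and a one-word title, which it cannot tell apart from a tag.

```php
<?php
// The same pattern as above, applied to inputs that expose its limits
$pattern = '/<\w+>|<\/\w+>/im';

// Tags with attributes are not matched, so <img ...> survives
$withImg = preg_replace($pattern, '', '<img src="url"> <p>caption</p>');

// A one-word title looks exactly like a tag, so <Head> is removed
$oneWord = preg_replace($pattern, '', '<Head> <p>a hotest book</p>');

var_dump($withImg, $oneWord);
```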

A crazy solution, but I thought I'd throw it out there.

You can also do it more simply:

   <?php
   // Escape the title so strip_tags() no longer sees it as a tag
   $string = htmlspecialchars("<gone with the wind>");
   echo strip_tags("$string <p>a hotest book</p>");
   ?>

This outputs &lt;gone with the wind&gt; a hotest book, which a browser displays as:

   <gone with the wind> a hotest book
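Note that echo sends the entities themselves; only the browser turns them back into brackets. If the result is meant for plain text (a CLI, a log, a text email), a minimal sketch is to decode after stripping. Only do this when the output will not be rendered as HTML, since decoding reintroduces literal angle brackets.

```php
<?php
// Encode only the known title, then strip the remaining HTML tags
$title = htmlspecialchars('<gone with the wind>');
$html = strip_tags("$title <p>a hotest book</p>");

// For plain-text output, turn &lt; and &gt; back into literal brackets
$plain = htmlspecialchars_decode($html);
echo $plain, PHP_EOL; // prints: <gone with the wind> a hotest book
```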


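$string = '<gone with the wind> <p>a hotest book</p>';
// Entity-encode any "tag" of 6+ word/space characters so strip_tags() skips it
$string = strip_tags(preg_replace('/<([\w\s\d]{6,})>/', '&lt;$1&gt;', $string));
// Turn &lt; and &gt; back into literal angle brackets
$string = html_entity_decode($string);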

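The above converts any "tag" with six or more word characters or spaces between < and > into the &lt; and &gt; entities, which allows you to use strip_tags.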

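Depending on your incoming data, you may need to experiment with the value six. If you get a tag like <article> (seven characters), you may need to push it higher.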
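To experiment with that threshold, the approach above can be wrapped in a function. The name strip_tags_keep_long and its $minLength parameter are hypothetical, and the sample input is made up to show the <article> case.

```php
<?php
// Hypothetical wrapper around the approach above; $minLength is the "six"
function strip_tags_keep_long(string $html, int $minLength = 6): string
{
    // Same character class as above, with a configurable minimum length
    $pattern = '/<([\w\s\d]{' . $minLength . ',})>/';
    $protected = preg_replace($pattern, '&lt;$1&gt;', $html);
    return html_entity_decode(strip_tags($protected));
}

$html = '<gone with the wind> <article>a hotest book</article>';

// With 6, "<article>" (7 characters) is wrongly kept as if it were a title
$six = strip_tags_keep_long($html, 6);

// With 8, "<article>" is stripped while the 18-character title survives
$eight = strip_tags_keep_long($html, 8);

var_dump($six, $eight);
```

Raising the threshold only moves the problem: longer tag names such as <blockquote> need an even higher value, and short titles then stop being protected.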

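The best I can think of is the following. Since I don't know which tags will be used, I simply assume all of them. This should remove any valid HTML tag, not just anything that merely looks like a tag.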

<?php
$tags = array("!DOCTYPE","a","abbr","acronym","address","applet","area","article","aside","audio","b","base","basefont","bdi","bdo","big","blockquote","body","br","button","canvas","caption","center","cite","code","col","colgroup","command","datalist","dd","del","details","dfn","dir","div","dl","dt","em","embed","fieldset","figcaption","figure","font","footer","form","frame","frameset","h1","h2","h3","h4","h5","h6","head","header","hgroup","hr","html","i","iframe","img","input","ins","kbd","keygen","label","legend","li","link","map","mark","menu","meta","meter","nav","noframes","noscript","object","ol","optgroup","option","output","p","param","pre","progress","q","rp","rt","ruby","s","samp","script","section","select","small","source","span","strike","strong","style","sub","summary","sup","table","tbody","td","textarea","tfoot","th","thead","time","title","tr","track","tt","u","ul","var","video","wbr");
$string = "<gone with the wind> <p>a hotest book</p>";

// \b stops short names like "a" from also matching titles such as <alice ...>
echo preg_replace('/<\/?(' . implode('|', $tags) . ')\b.*>/iU', '', $string);

The final output is:

<gone with the wind> a hotest book

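You're going to have a hard time with this, because there's no way to know which things inside <> are HTML tags and which are book titles. You can't even write something that looks for things that look like tags but aren't valid HTML tags, because you might get a record for the Monkees' 1968 film "Head", which would come through as <Head>, and that is of course a valid HTML tag.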

You need to sort this out with your data supplier before you can use PHP's strip_tags function.