Stymied by PHP preg_replace



The following preg_replace is giving me trouble:

$subject = '<div class="main"> <div class="block_bc"> <a href="index.php?x_param=11" class="BC-1"> Gallery</a> / <a href="path/Title_Item/?x_param=17" class="BC-2"> Title Item</a> / <span class="BC-3"> Bridge</span> </div> </div>';
$regex = '/(<div\sclass=\"block_bc\"[^>]*>)([^<\/div>]*>)(<\/div>)/is';
$replacement = '<div class="block_bc"></div>';
preg_replace($regex, $replacement, $subject);

Basically, I want to end up with <div class="main"> <div class="block_bc"></div> </div>, but nothing gets matched.

Can anyone point out the "obvious" mistake?

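You are misusing a character class ([]). The [^<\/div>]* part means "any number of characters other than one of <, /, d, i, v, >". It does not mean "anything up to the string </div>", which is probably what you intended.

What you can use instead is non-greedy repetition: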
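To make the character-class point concrete, here is a small sketch. The shortened $subject and the variable names ($class, $prefix, $broken, $matched) are only for illustration and are not from the question.

```php
<?php
// [^<\/div>] is a negated set of single characters, not "anything but </div>".
$class = '/^[^<\/div>]*/';

// Matching stops at the first 'i', because 'i' is one of the excluded characters.
preg_match($class, 'Bridge</span>', $m);
$prefix = $m[0]; // "Br"

// Shortened version of the markup from the question (illustrative only).
$subject = '<div class="main"> <div class="block_bc"> <a href="index.php?x_param=11" class="BC-1"> Gallery</a> </div> </div>';

// The original pattern: right after the opening tag it needs a run of
// non-excluded characters followed by '>', but the next tag starts with '<'.
$broken  = '/(<div\sclass=\"block_bc\"[^>]*>)([^<\/div>]*>)(<\/div>)/is';
$matched = preg_match($broken, $subject); // 0, so preg_replace changes nothing
```

With the /i flag the class also excludes the uppercase D, I and V, so ordinary link text breaks the match as well.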

$regex = '/(<div\s*class=\"block_bc\"[^>]*>)(.+?)(<\/div>)/is';
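As a quick check, the non-greedy pattern can be run against the markup from the question. This is a minimal sketch; the only addition is the $result variable, since preg_replace returns the new string rather than modifying $subject.

```php
<?php
$subject = '<div class="main"> <div class="block_bc"> <a href="index.php?x_param=11" class="BC-1"> Gallery</a> / <a href="path/Title_Item/?x_param=17" class="BC-2"> Title Item</a> / <span class="BC-3"> Bridge</span> </div> </div>';

// (.+?) is lazy: it stops at the first </div> after the opening tag.
// The /s flag lets '.' match newlines too, /i ignores case.
$regex       = '/(<div\s*class=\"block_bc\"[^>]*>)(.+?)(<\/div>)/is';
$replacement = '<div class="block_bc"></div>';

// Keep the return value; the call on its own discards the result.
$result = preg_replace($regex, $replacement, $subject);

echo $result, PHP_EOL;
// <div class="main"> <div class="block_bc"></div> </div>
```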

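Also, pulling content out of HTML with a regexp can be very fragile; try the DOM with XPath instead. It is more verbose, but it is also more resilient to malformed input: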

$subject = '<div class="main"> <div class="block_bc"> <a href="index.php?x_param=11" class="BC-1"> Gallery</a> / <a href="path/Title_Item/?x_param=17" class="BC-2"> Title Item</a> / <span class="BC-3"> Bridge</span> </div> </div>';
libxml_use_internal_errors(true); // suppress warnings
$doc = new DOMDocument;
$doc->loadHTML($subject);
$xpath = new DOMXpath($doc);
// get the <div class="main"> node for exporting
$main_node  = $xpath->query('//div[@class="main"]');
// select the child elements of the block_bc div, and the text nodes directly under it
$childNodes = $xpath->query('//div[@class="block_bc"]/* | //div[@class="block_bc"]/text()'); 
foreach ($childNodes as $c) {
    $c->parentNode->removeChild($c); // clear them all
}
// export the part of the document under the <div class="main">
print $doc->saveHTML($main_node->item(0)); 
// update:
// if you want the full document in html you can simply omit the parameter, with this you can get rid of the $main_node = ... line too
print $doc->saveHTML(); // this will print from doctype to </html>

The pattern

~<div\s+class="block_bc"[^>]*>.*</div>~isU

will work until you add some divs inside "block_bc".
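The nesting caveat can be seen with a hypothetical subject that has a div inside block_bc. The markup and the lazy pattern below are assumptions made up for this sketch; they are the same kind of pattern, not the exact one above.

```php
<?php
// Hypothetical markup: block_bc now contains a nested <div>.
$nested = '<div class="main"> <div class="block_bc"> <div>inner</div> tail </div> </div>';

// A lazy match stops at the FIRST </div>, which closes the inner div.
$lazy   = '/<div\s*class="block_bc"[^>]*>.*?<\/div>/is';
$result = preg_replace($lazy, '<div class="block_bc"></div>', $nested);

echo $result, PHP_EOL;
// <div class="main"> <div class="block_bc"></div> tail </div> </div>
```

The leftover " tail </div>" is the reason a DOM-based approach is safer once the content of the block can contain nested elements.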

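[^<\/div>] only creates a character class that excludes the characters '<', '/', 'd', 'i', 'v' and '>'. It does not do what you think it does. Replacing the middle part with a non-greedy match-anything will do:

'/(<div\sclass=\"block_bc\"[^>]*>)(.*?)(<\/div>)/is'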