I'm not entirely sure I'm asking this the right way, but here goes. I have an HTML file structured like this:
<div class="tbody">
<div class="row">
<div class="col th">
<a class="channel_sched_link" href="javascript:void(0)" title="Channel A schedule" data-channelid="9">
<img src="http://xxxxx/images/tv/A.JPG" width="30" height="20" alt="Channel A" />Channel A </a>
</div>
<div class="prog_cols">
<div class="col ts ts_1 prog_802176 ps_0" data-catid="" >
<span class="prog_name">First Program</span>
<div class="prog_time">February 24, 2015, 4:00 pm - 6:00 pm</div>
<a class="btn_watchlist " href="javascript:void(0)" data-progid="802176"> (+) add to watchlist</a>
<div class="prog_desc">
This is the first program for channel A.<br/>
<a class="watchnow" href="http://xxxx/channels/?q=Channel A">Watch Now</a>
</div>
</div>
<div class="col ts ts_3 prog_802177 ps_1" data-catid="" >
<span class="prog_name">Second Program</span>
<div class="prog_time">February 24, 2015, 6:00 pm - 8:00 pm</div>
<a class="btn_watchlist " href="javascript:void(0)" data-progid="802177">(+) add to watchlist</a>
<div class="prog_desc">
This is the second program for channel A.<br/>
<a class="watchnow" href="http://www.xxxxx/channels/?q=Channel A">Watch Now</a>
</div>
</div>
</div>
<a class="watchnow" href="http://xxxx/channels/?q=Channel A">Watch Now</a>
</div>
<div class="row">
<div class="col th">
<a class="channel_sched_link" href="javascript:void(0)" title="Channel B schedule" data-channelid="1">
<img src="http://xxxx/images/tv/B.gif" width="30" height="20" alt="Channel B" />Channel B </a>
</div>
<div class="prog_cols">
<div class="col ts ts_1 prog_802210 news ps_0" data-catid="news" >
<span class="prog_name">First Program</span>
<div class="prog_time">February 24, 2015, 5:00 pm - 6:00 pm</div>
<a class="btn_watchlist " href="javascript:void(0)" data-progid="802210">(+) add to watchlist</a>
<div class="prog_desc">
First Program Channel B.<br/>
<a class="watchnow" href="http://xxxxxx/channels/?q=Channel B">Watch Now</a>
</div>
</div>
I can parse the prog_name for each channel with the following, but I only get the first instance of prog_name:
$programname = $xpath->query('//span[@class="prog_name"]');
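Once I have that, I save it to an XML file along with other information. How can I parse every prog_name for every channel? I know it probably involves a loop, but I'm at a loss. Not every channel has the same number of prog_name entries.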
This works for your HTML:
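// Load the HTML into a DOM tree and query it with XPath.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// query() returns a DOMNodeList holding every match, not just the first one.
$childs = $xpath->query('//span[@class="prog_name"]');
foreach ($childs as $child) {
    var_dump($child->nodeValue);
}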
It returns:
string(13) "First Program"
string(14) "Second Program"
string(13) "First Program"
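
The flat list above does not say which channel each program belongs to. If you need that grouping for your XML file, one possible approach is to loop over each row first and then run a relative XPath query inside it. The following is only a sketch: the function name programsByChannel and the trimmed-down sample markup are made up for illustration, and it assumes every channel sits in a div whose class attribute is exactly "row", as in your snippet.

```php
<?php
// Trimmed-down sample that reuses the class names from the question (illustrative data only).
$html = <<<HTML
<div class="tbody">
  <div class="row">
    <div class="col th"><a class="channel_sched_link" href="#">Channel A </a></div>
    <div class="prog_cols">
      <div class="col ts"><span class="prog_name">First Program</span></div>
      <div class="col ts"><span class="prog_name">Second Program</span></div>
    </div>
  </div>
  <div class="row">
    <div class="col th"><a class="channel_sched_link" href="#">Channel B </a></div>
    <div class="prog_cols">
      <div class="col ts"><span class="prog_name">First Program</span></div>
    </div>
  </div>
</div>
HTML;

// Hypothetical helper: returns an array of channel name => list of program names.
function programsByChannel(string $html): array
{
    // Collect libxml parse warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $result = [];

    // Outer loop: one iteration per channel row.
    foreach ($xpath->query('//div[@class="row"]') as $row) {
        // The leading "." makes the expression relative to the current row.
        $channel = trim($xpath->evaluate('string(.//a[@class="channel_sched_link"])', $row));
        $result[$channel] = [];

        // Inner loop: every prog_name inside this row, however many there are.
        foreach ($xpath->query('.//span[@class="prog_name"]', $row) as $span) {
            $result[$channel][] = trim($span->nodeValue);
        }
    }

    return $result;
}

$schedule = programsByChannel($html);
print_r($schedule);
```

With the sample markup, $schedule maps "Channel A" to "First Program" and "Second Program", and "Channel B" to "First Program". Passing $row as the second argument to query() and evaluate() is what keeps each lookup inside its own channel, so the differing number of programs per channel is handled by the inner loop.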