Downloading images from webpages using PHP


Keywords: download, images, webpage, PHP | Updated: 2023-09-27

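I am trying to create a PHP function that downloads images from a webpage you pass in as a parameter. However, the webpage itself is a kind of gallery that only contains very small thumbnail versions of the images, each of which links directly to the larger, full-size jpeg image that I want to download to my local computer. So the images are not downloaded from the webpage I pass to the function, but from the individual links on that page to the jpeg image files.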

For example:

www.somesite.com/galleryfullofimages/

is the location of the image gallery,

and each jpeg image file I want from the gallery is located at something like

www.somesite.com/galleryfullofimages/images/01.jpg
www.somesite.com/galleryfullofimages/images/02.jpg
www.somesite.com/galleryfullofimages/images/03.jpg

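So far I have been trying to use the file_get_contents function to get the full HTML of the webpage as a string, and then isolate all of the quoted URLs in the <a href="images/01.jpg"> elements and put them into an array. I would then use this array to locate each image and download them all in a loop.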

This is what I have so far:

<?php
$link = "http://www.somesite.com/galleryfullofimages/";
$contents = file_get_contents($link);
$results = preg_split('/<a href="[^"]*"/', $contents);
?>

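But I am stuck at this point. I am also completely new to regular expressions, which, as you can see, I have tried to use. How can I isolate each image link and then download the image? Or is there a better way to do this altogether? I have also read about using cURL, but I can't seem to get that working either.

I hope all of this makes sense. Any help would be greatly appreciated.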

This is commonly called "scraping" a website. You are already retrieving the page's markup, so you are off to a good start.
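On the regex attempt in the question: preg_split returns the pieces of text between the matches, so the links themselves are thrown away. If you do want to stay with regular expressions, preg_match_all with a capture group is the closer fit. The following is only a minimal sketch; the helper name findJpgHrefs and the sample markup are made up for illustration, and the pattern assumes double-quoted href attributes.

```php
<?php
// Hypothetical helper (not from the original post): collect the href values
// of <a> tags that point at .jpg/.jpeg files, using a regular expression.
function findJpgHrefs(string $html): array
{
    // The capture group (...) holds only the URL between the quotes.
    // The "i" flag makes the match case-insensitive (.JPG as well as .jpg).
    preg_match_all('/<a\s[^>]*href="([^"]+\.jpe?g)"/i', $html, $matches);

    // $matches[0] holds the full matches, $matches[1] the captured URLs.
    return $matches[1];
}

// Made-up sample markup resembling the gallery described in the question.
$sample = '<a href="images/01.jpg"><img src="thumbs/01.jpg"></a>'
        . '<a href="about.html">About</a>'
        . '<a class="full" href="images/02.jpg">two</a>';

print_r(findJpgHrefs($sample));
```

Regular expressions tend to be brittle against real-world HTML (single quotes, unusual spacing, attributes split over lines), which is why a DOM parser is usually the more robust route.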

Here is what you need to do next:

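<?php
// Load the retrieved markup into a DOM object using PHP's
// DOMDocument::loadHTML method. Real-world HTML is rarely valid, so collect
// libxml's parse warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $docObj = new DOMDocument();
    $docObj->loadHTML($contents);
    libxml_clear_errors();
// Create an XPath object.
    $xpathObj = new DOMXPath($docObj);
// Query for all a tags. You can get very creative here, depending on your
// understanding of XPath. For example, you could change the query to just
// return the href attribute directly. This code returns all anchor tags in
// the page whose href attribute ends in ".jpg". PHP's DOMXPath implements
// XPath 1.0, which has no ends-with() function, so the suffix test is
// written with substring() and string-length() instead.
    $elements = $xpathObj->query('//a[substring(@href, string-length(@href) - 3) = ".jpg"]');
// Process the discovered image URLs. You could use cURL for this,
// or file_get_contents again (since your host has allow_url_fopen enabled)
// to fetch the image directly and then store it locally.
    foreach ($elements as $domNode)
    {
        // The href may be relative to the gallery page (e.g. "images/01.jpg"),
        // so prepend $link before fetching it.
        $url = $domNode->getAttribute('href');
    }
?>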
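The loop above stops once it has read each href. As a rough sketch of the remaining steps (resolving the link against the gallery URL, fetching the image, and saving it locally), something like the following could work. The function names extractJpgLinks, resolveUrl, and downloadImages are hypothetical, and the URL resolution is deliberately simplified: it assumes the gallery URL is a directory and handles only absolute, root-relative, and plain relative links, not "../" segments.

```php
<?php
// Hypothetical helpers; none of these names come from the original answer.

// Return the href of every <a> whose href ends in ".jpg" (XPath 1.0 syntax).
function extractJpgLinks(string $html): array
{
    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = '//a[substring(@href, string-length(@href) - 3) = ".jpg"]';
    $links = [];
    foreach ($xpath->query($query) as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}

// Simplified resolution of $href against a gallery URL that is a directory.
function resolveUrl(string $base, string $href): string
{
    if (preg_match('#^https?://#i', $href)) {
        return $href; // already absolute
    }
    if ($href !== '' && $href[0] === '/') {
        // Root-relative: keep only scheme, host and optional port of the base.
        $p = parse_url($base);
        $port = isset($p['port']) ? ':' . $p['port'] : '';
        return $p['scheme'] . '://' . $p['host'] . $port . $href;
    }
    return rtrim($base, '/') . '/' . $href; // relative to the gallery directory
}

// Download every linked .jpg into $targetDir; returns how many were saved.
function downloadImages(string $galleryUrl, string $targetDir): int
{
    $html = file_get_contents($galleryUrl); // requires allow_url_fopen
    if ($html === false || $html === '') {
        return 0;
    }
    if (!is_dir($targetDir)) {
        mkdir($targetDir, 0777, true);
    }
    $saved = 0;
    foreach (extractJpgLinks($html) as $href) {
        $imageUrl = resolveUrl($galleryUrl, $href);
        $data = file_get_contents($imageUrl);
        if ($data === false) {
            continue; // skip images that fail to download
        }
        // basename() keeps only the file name, e.g. "01.jpg".
        $name = basename(parse_url($imageUrl, PHP_URL_PATH));
        if (file_put_contents($targetDir . '/' . $name, $data) !== false) {
            $saved++;
        }
    }
    return $saved;
}
```

With the gallery from the question, the call would look like downloadImages('http://www.somesite.com/galleryfullofimages/', __DIR__ . '/images'). Keeping extraction and URL resolution in separate functions means both can be checked against a local HTML string before any network request is made.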

DOMDocument::loadHTML
XPath
DOMXPath::query
allow_url_fopen
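The comments in the answer's code mention cURL as an alternative to file_get_contents, which matters mainly when allow_url_fopen is disabled on the host. A minimal sketch of such a fetch, assuming the cURL extension is installed, might look like this. The function name fetchWithCurl and the timeout values are illustrative choices, not part of the original answer.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the response body,
// or null if the request fails for any reason.
function fetchWithCurl(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_FAILONERROR    => true, // treat HTTP status >= 400 as a failure
        CURLOPT_CONNECTTIMEOUT => 10,   // seconds allowed for connecting
        CURLOPT_TIMEOUT        => 30,   // seconds allowed for the whole request
    ]);
    $body = curl_exec($ch);
    // The handle is released automatically when $ch goes out of scope.
    return $body === false ? null : $body;
}
```

The returned string can be passed to file_put_contents in the same way as the result of file_get_contents, for example file_put_contents('01.jpg', $data) after checking that $data is not null.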