Using Xpath with PHP to parse HTML

My suggestion is to always use DOMDocument as opposed to SimpleXML, since it’s a much nicer interface to work with and makes tasks a lot more intuitive.

The following example shows you how to load the HTML into the DOMDocument object and query the DOM using XPath. All you really need to do is find all td elements with a class name of topicViews and this will output each of the nodeValue members found in the DOMNodeList returned by this XPath query.

/* Use internal libxml errors -- turn on in production, off for debugging */
libxml_use_internal_errors(true);
/* Createa a new DomDocument object */
$dom = new DomDocument;
/* Load the HTML */
$dom->loadHTMLFile("https://forums.eveonline.com");
/* Create a new XPath object */
$xpath = new DomXPath($dom);
/* Query all <td> nodes containing specified class name */
$nodes = $xpath->query("//td[@class="topicViews"]");
/* Set HTTP response header to plain text for debugging output */
header("Content-type: text/plain");
/* Traverse the DOMNodeList object to output each DomNode's nodeValue */
foreach ($nodes as $i => $node) {
    echo "Node($i): ", $node->nodeValue, "\n";
}

Leave a Comment