You can use the npm modules jsdom and htmlparser to create and parse a DOM in Node.js.
Other options include:
- BeautifulSoup for Python
- Converting your HTML to XHTML and using XSLT
- HTMLAgilityPack for .NET
- CsQuery for .NET (my new favorite)
- The SpiderMonkey and Rhino JS engines have native E4X support, but this is only useful if you first convert your HTML to XHTML.
Of all these options, I prefer Node.js because it uses the standard W3C DOM accessor methods, so I can reuse the same code on both the client and the server. I wish BeautifulSoup’s methods were more similar to the W3C DOM, and I think converting your HTML to XHTML just to write XSLT is plain sadistic.
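
To illustrate the code-reuse point, here is a minimal sketch, assuming jsdom is installed from npm on the server. The helper names `extractLinks` and `documentFromHtml` are made up for this example; only `JSDOM`, `querySelectorAll`, `getAttribute`, and `textContent` come from jsdom and the standard DOM.

```javascript
// Sketch: one link extractor shared by browser and Node.js code.
// Assumption: on the server, jsdom is installed (npm install jsdom).

// Works on any W3C-style Document: the browser's global `document`
// or the one jsdom builds. `extractLinks` is a hypothetical helper name.
function extractLinks(doc) {
  const links = [];
  // querySelectorAll, getAttribute and textContent are standard DOM APIs.
  for (const anchor of doc.querySelectorAll('a[href]')) {
    links.push({
      href: anchor.getAttribute('href'),
      text: anchor.textContent.trim(),
    });
  }
  return links;
}

// Server-side only: build a Document from an HTML string with jsdom.
// jsdom is required lazily so this file still loads in a browser.
// `documentFromHtml` is also a hypothetical helper name.
function documentFromHtml(html) {
  const { JSDOM } = require('jsdom');
  return new JSDOM(html).window.document;
}
```

On the server you would call `extractLinks(documentFromHtml(html))`; in the browser the same function takes the page's own `document`, with no jsdom involved.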