The real trouble is nested tags, which are very difficult to handle with regular expressions. It’s possible with balanced matching, but that’s only available in .NET (as balancing groups) and in a few other flavors that offer something comparable, such as recursive patterns in PCRE and Perl. Even with that power, an ill-placed comment can throw off the regular expression.
For example, this one is tricky to parse:
<div>
<div id="parse-this">
<!-- oops</div> -->
try to get this value with regex
</div>
</div>
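To make the failure concrete, here is a minimal sketch in Python (chosen only for illustration) that runs a naive lazy pattern over the markup above. The pattern and the variable names are assumptions, not a recommended approach; the point is just that the match stops at the `</div>` inside the comment.

```python
import re

# The sample markup from above, with the misleading commented-out </div>.
html = """<div>
<div id="parse-this">
<!-- oops</div> -->
try to get this value with regex
</div>
</div>"""

# Naive attempt: lazily capture everything up to the first closing </div>.
pattern = re.compile(r'<div id="parse-this">(.*?)</div>', re.DOTALL)
captured = pattern.search(html).group(1)

# The capture ends inside the comment, so the wanted text is never reached.
print(repr(captured))
```

Switching to a greedy match does not help either, because it then runs past the inner closing tag to the outer one.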
You could chase edge cases like this for hours with a regular expression, and maybe find a solution. But really, there’s no point when specialized XML, XHTML, and HTML parsers do the job more reliably and efficiently.
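As a rough illustration of the parser route, the sketch below uses `html.parser.HTMLParser` from Python's standard library to pull the text out of the `parse-this` div. The class name `DivTextExtractor` and its depth counter are hypothetical helpers written for this example; any real HTML parser would serve equally well.

```python
from html.parser import HTMLParser

html = """<div>
<div id="parse-this">
<!-- oops</div> -->
try to get this value with regex
</div>
</div>"""


class DivTextExtractor(HTMLParser):
    """Collects text found inside the <div> with the given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0    # 0 means we are outside the target div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1          # a nested div inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1           # entering the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)

    # Comments are delivered to handle_comment, which does nothing by
    # default, so the "</div>" inside the comment never counts as a tag.


extractor = DivTextExtractor("parse-this")
extractor.feed(html)
extractor.close()
text = "".join(extractor.chunks).strip()
print(text)
```

The parser tracks comments and nesting itself, so the only logic left to write is deciding which element's text to keep.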