Regular expression for extracting tag attributes

Update 2021: Radon8472 proposes in the comments the regex https://regex101.com/r/tOF6eA/1 (note regex101.com did not exist when I wrote originally this answer)

<a[^>]*?href=(["\'])?((?:.(?!\1|>))*.?)\1?

Update 2021 bis: Dave proposes in the comments, to take into account an attribute value containing an equal sign, like <img src="test.png?test=val" />, as in this regex101:

(\w+)=["']?((?:.(?!["']?\s+(?:\S+)=|\s*\/?[>"']))+.)["']?

Update (2020), Gyum Fox proposes https://regex101.com/r/U9Yqqg/2 (again, note regex101.com did not exist when I wrote originally this answer)

(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|\s*\/?[>"']))+.)["']?

Applied to:

<a href=test.html class=xyz>
<a href="test.html" class="xyz">
<a href="https://stackoverflow.com/questions/317053/test.html" class="xyz">
<script type="text/javascript" defer async id="something" onload="alert('hello');"></script>
<img src="test.png">
<img src="a test.png">
<img src=test.png />
<img src=a test.png />
<img src=test.png >
<img src=a test.png >
<img src=test.png alt=crap >
<img src=a test.png alt=crap >

Original answer (2008):
If you have an element like

<name attribute=value attribute="value" attribute="value">

this regex could be used to find successively each attribute name and value

(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?

Applied on:

<a href=test.html class=xyz>
<a href="test.html" class="xyz">
<a href="https://stackoverflow.com/questions/317053/test.html" class="xyz">

it would yield:

'href' => "https://stackoverflow.com/questions/317053/test.html"
'class' => 'xyz'

Note: This does not work with numeric attribute values e.g. <div id="1"> won’t work.

Edited: Improved regex for getting attributes with no value and values with ” ‘ ” inside.

([^\r\n\t\f\v= '"]+)(?:=(["'])?((?:.(?!\2?\s+(?:\S+)=|\2))+.)\2?)?

Applied on:

<script type="text/javascript" defer async id="something" onload="alert('hello');"></script>

it would yield:

'type' => 'text/javascript'
'defer' => ''
'async' => ''
'id' => 'something'
'onload' => 'alert(\'hello\');'

Leave a Comment