Remove all attributes from html tags

Adapted from my answer on a similar question

$text = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';

echo preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si",'<$1$2>', $text);

// <p><strong>hello</strong></p>

The RegExp broken down:

/              # Start Pattern
 <             # Match '<' at beginning of tags
 (             # Start Capture Group $1 - Tag Name
  [a-z]        # Match 'a' through 'z'
  [a-z0-9]*    # Match 'a' through 'z' or '0' through '9' zero or more times
 )             # End Capture Group
 [^>]*?        # Match anything other than '>', zero or more times, not-greedy (won't eat the /)
 (\/?)         # Capture Group $2 - '/' if it is there
 >             # Match '>'
/is            # End Pattern - 'i' = case-insensitive; 's' = '.' also matches newlines (no effect here, since the pattern has no '.')

Quote the pattern as a PHP string and use <$1$2> as the replacement text. It should strip everything after the tag name up to the end of the tag, whether that is /> or just >.
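As a quick check of the self-closing case, here is a minimal sketch that wraps the same pattern in a function. The name strip_tag_attributes is my own placeholder, not part of any library, and the fallback to the original string on a regex error is an assumption about what you would want.

```php
<?php
// Hypothetical helper: same pattern and replacement as in the answer above.
function strip_tag_attributes(string $html): string
{
    // preg_replace() returns null on error; fall back to the untouched input.
    return preg_replace('/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si', '<$1$2>', $html) ?? $html;
}

// $2 keeps the trailing slash of self-closing tags.
echo strip_tag_attributes('<img src="a.png" alt="x" /><br class="c"/>');
// <img/><br/>
```

Closing tags such as </p> are left alone, because the character after '<' has to be a letter for the pattern to match.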

Please note: this isn't necessarily going to work on ALL input, as the Anti-HTML + RegExp will tell you. There are a few pitfalls, most notably <p style=">"> would end up as <p>">, and there are a few other inputs it breaks on. I would recommend looking at Zend_Filter_StripTags as a more foolproof tags/attributes filter in PHP.
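If you would rather not pull in a framework, a parser-based approach avoids the <p style=">"> problem as well. The following is only a sketch using PHP's bundled DOM extension, which is a different technique from both the regex above and Zend_Filter_StripTags. The function name strip_attributes_dom is hypothetical, and the sketch assumes the input is an HTML fragment in plain ASCII (loadHTML() needs extra care for UTF-8 input).

```php
<?php
// Hypothetical helper: remove every attribute by walking a parsed DOM tree.
function strip_attributes_dom(string $html): string
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the fragment so there is always a <body> to read back from.
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('*') as $element) {
        // The attribute map is live, so keep removing the first entry until it is empty.
        while ($element->attributes->length > 0) {
            $element->removeAttributeNode($element->attributes->item(0));
        }
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo strip_attributes_dom('<p style=">"><strong style="padding:0;">hello</strong></p>');
```

Because the parser understands quoted attribute values, the '>' inside style=">" is treated as part of the attribute rather than as the end of the tag. The trade-off is that the parser may also normalize sloppy markup, so the output is not guaranteed to match the input byte for byte apart from the attributes.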
