Regex ignore URL already in HTML tags

Try this:

(?<!href=")(\b[\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])

See it here on Regexr
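
As a rough illustration of how this pattern behaves, here is a minimal Python sketch. The sample text and the names BARE_URL and sample are made up for this example. The hyphen in the character class is moved to the end because Python's re module rejects [\w-?] as a bad range; the pattern is otherwise unchanged.

```python
import re

# Pattern from the answer; the hyphen is moved to the end of the class
# so that Python's re module accepts it (assumption: same intent).
BARE_URL = re.compile(r'(?<!href=")(\b[\w]+:\/\/[\w?&;#~=\.\/\@-]+[\w\/])')

# Hypothetical input: one bare URL and one URL already inside an href.
sample = 'Visit http://example.com/page or <a href="http://linked.example.org/x">this</a>'

# findall returns the contents of the single capture group for each match.
matches = BARE_URL.findall(sample)
print(matches)  # only the bare URL is reported
```

The URL inside the href attribute is skipped, while the bare one is captured in group 1.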

To make it more general, you can simplify the lookbehind to check only for =" (an equals sign followed by a double quote):

(?<!=")(\b[\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])

See it on Regexr
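
To see what the more general lookbehind changes, the following Python sketch runs both variants on a URL inside a src attribute. The names HREF_ONLY, ANY_ATTR, and sample are hypothetical, and the hyphen in the character class is again moved to the end so that Python's re module accepts the pattern.

```python
import re

# Variant 1: only skips URLs preceded by href="
HREF_ONLY = re.compile(r'(?<!href=")(\b[\w]+:\/\/[\w?&;#~=\.\/\@-]+[\w\/])')
# Variant 2: skips URLs preceded by =" (any quoted attribute value)
ANY_ATTR = re.compile(r'(?<!=")(\b[\w]+:\/\/[\w?&;#~=\.\/\@-]+[\w\/])')

# Hypothetical input: a URL in a src attribute plus a bare URL.
sample = '<img src="http://example.com/a.png"> see http://example.com/b'

href_only_matches = HREF_ONLY.findall(sample)
any_attr_matches = ANY_ATTR.findall(sample)

print(href_only_matches)  # includes the src URL as well as the bare one
print(any_attr_matches)   # only the bare URL
```

The shorter lookbehind covers src, action, and similar attributes, at the cost of also skipping any bare URL that happens to follow =" in ordinary text.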

(?<!href=") is a negative lookbehind assertion; it ensures that there is no href=" immediately before your pattern.

\b is a word boundary that anchors the start of the link to a transition from a non-word character to a word character. Without it, the lookbehind would be useless: the regex would simply match from "ttp://…" onward.
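
The effect of dropping \b can be checked with a short Python sketch. The names WITH_BOUNDARY, WITHOUT_BOUNDARY, and sample are made up for illustration, and the hyphen in the character class is moved to the end so that Python's re module accepts the pattern.

```python
import re

# Same pattern with and without the leading word boundary.
WITH_BOUNDARY = re.compile(r'(?<!href=")(\b[\w]+:\/\/[\w?&;#~=\.\/\@-]+[\w\/])')
WITHOUT_BOUNDARY = re.compile(r'(?<!href=")([\w]+:\/\/[\w?&;#~=\.\/\@-]+[\w\/])')

# Hypothetical input: the only URL is already inside an href attribute.
sample = '<a href="http://example.com">home</a>'

with_boundary = WITH_BOUNDARY.findall(sample)
without_boundary = WITHOUT_BOUNDARY.findall(sample)

print(with_boundary)     # nothing matches: the URL is inside href="
print(without_boundary)  # the match slips one character past the lookbehind
```

Without \b the engine retries one character later, where the text before the match is no longer href=", so the lookbehind passes and the scheme loses its first letter.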
