One possibility:
String imgRegex = "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>";
This works if matched case-insensitively. It’s a bit of a mess, and it deliberately ignores the case where quotes aren’t used. Here it is again without the Java string escapes:
<img[^>]+src\s*=\s*['"]([^'"]+)['"][^>]*>
This matches:
- `<img`
- one or more characters that aren’t `>` (i.e. possible other attributes)
- `src`
- optional whitespace
- `=`
- optional whitespace
- starting delimiter of `'` or `"`
- image source (which may not include a single or double quote)
- ending delimiter
- although the expression can stop here, I then added:
  - zero or more characters that are not `>` (more possible attributes)
  - `>` to close the tag
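As a rough sketch of how the expression might be used from Java, assuming `java.util.regex` with the `CASE_INSENSITIVE` flag: the class name `ImgSrcExtractor` and the method `extractSources` are made up for illustration, and the sample HTML is invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original answer.
class ImgSrcExtractor {
    // Same expression as above, compiled case-insensitively so that
    // <IMG SRC=...> is matched as well.
    private static final Pattern IMG_PATTERN = Pattern.compile(
            "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Collects capture group 1 (the image source) of every match.
    static List<String> extractSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher matcher = IMG_PATTERN.matcher(html);
        while (matcher.find()) {
            sources.add(matcher.group(1));
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<p><img class=\"logo\" src=\"logo.png\" alt=\"Logo\">"
                + "<IMG SRC = 'photo.jpg'></p>";
        System.out.println(extractSources(html)); // prints [logo.png, photo.jpg]
    }
}
```

`Matcher.find()` is used rather than `matches()` because the tag is normally embedded in a larger document.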
Things to note:
- If you want to include the `src=` as well, move the open bracket further left 🙂
- This does not care about delimiter balancing or attribute values without delimiters, and it can also choke on badly-formed attributes (such as attributes that include `>` or image sources that include `'` or `"`).
- Parsing HTML with regular expressions like this is non-trivial, and at best a quick hack that works in the majority of cases.
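To make these caveats concrete, here is a small sketch that runs the pattern against a few awkward inputs. The class `ImgRegexLimits`, the helper `firstGroup`, and the sample tags are all hypothetical; the second pattern is one way of moving the open bracket left so that the group also covers `src=`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and inputs are illustrative only.
class ImgRegexLimits {
    static final Pattern IMG_PATTERN = Pattern.compile(
            "<img[^>]+src\\s*=\\s*['\"]([^'\"]+)['\"][^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Variant with the capture group moved left to include "src=".
    static final Pattern IMG_WITH_ATTR_PATTERN = Pattern.compile(
            "<img[^>]+(src\\s*=\\s*['\"][^'\"]+['\"])[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(Pattern pattern, String html) {
        Matcher matcher = pattern.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Unquoted attribute value: not matched at all.
        System.out.println(firstGroup(IMG_PATTERN, "<img src=a.png>"));

        // Unbalanced delimiters (' opened, " closed) are accepted anyway.
        System.out.println(firstGroup(IMG_PATTERN, "<img src='a.png\">"));

        // A '>' inside an earlier attribute stops the match before src.
        System.out.println(
                firstGroup(IMG_PATTERN, "<img alt=\"a > b\" src=\"a.png\">"));

        // The variant captures the whole attribute, including src=.
        System.out.println(
                firstGroup(IMG_WITH_ATTR_PATTERN, "<img alt=\"x\" src=\"a.png\">"));
    }
}
```

If cases like these matter for your input, a real HTML parser is likely the safer choice.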