Detect and extract a URL from a string?

Let me preface this by saying that I’m not a huge advocate of regex for complex cases. Writing the perfect expression for something like this is very difficult. That said, I do happen to have one for detecting URLs, and it’s backed by a 350-line unit test class that passes. Someone started with a simple regex, and over the years we’ve grown the expression and the test cases to handle the issues we’ve found. It’s definitely not trivial:

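import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pattern for recognizing a URL, based on RFC 3986.
// Group 1 is the scheme ("http://", "https://", "ftp://", "ftps://") or a leading "www.";
// the non-capturing prefix consumes the line start or one non-word character before it.
private static final Pattern urlPattern = Pattern.compile(
        "(?:^|[\\W])((ht|f)tp(s?):\\/\\/|www\\.)"
                + "(([\\w\\-]+\\.){1,}?([\\w\\-.~]+\\/?)*"
                + "[\\p{Alnum}.,%_=?&#\\-+()\\[\\]\\*$~@!:/{};']*)",
        Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);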

Here’s an example of using it:

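Matcher matcher = urlPattern.matcher("foo bar http://example.com baz");
while (matcher.find()) {
    // start(1) skips the boundary character the pattern consumes before the URL
    int matchStart = matcher.start(1);
    // end() is the exclusive end offset of the whole match
    int matchEnd = matcher.end();
    // now you have the offsets of a URL match: [matchStart, matchEnd)
}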
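
If you want the matched text rather than the offsets, the loop above can be wrapped in a small helper. This is only a minimal sketch: the class name UrlExtractor and the method extractUrls are hypothetical names chosen for illustration, and the pattern is copied unchanged from above so the snippet compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the original code.
class UrlExtractor {
    // Same expression as above, repeated so this sketch is self-contained.
    private static final Pattern urlPattern = Pattern.compile(
            "(?:^|[\\W])((ht|f)tp(s?):\\/\\/|www\\.)"
                    + "(([\\w\\-]+\\.){1,}?([\\w\\-.~]+\\/?)*"
                    + "[\\p{Alnum}.,%_=?&#\\-+()\\[\\]\\*$~@!:/{};']*)",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);

    // Returns every URL found in text, in order of appearance.
    static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = urlPattern.matcher(text);
        while (matcher.find()) {
            // Group 1 begins at the scheme or "www.", after the boundary character,
            // and end() marks the end of the whole match.
            urls.add(text.substring(matcher.start(1), matcher.end()));
        }
        return urls;
    }
}
```

Calling UrlExtractor.extractUrls("foo bar http://example.com baz") should give a list with the single entry "http://example.com". One thing to watch for: the expression accepts '.' and ',' inside a URL, so a URL at the end of a sentence comes back with the trailing punctuation attached, and you may want to trim that yourself.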
