YouTube ID parsing for new URL formats

I had to deal with this for a PHP class I wrote a few weeks ago and ended up with a regex that matches all the common forms: with or without a URL scheme, with or without a www or m subdomain, youtube.com URLs, youtu.be URLs, and any ordering of the query parameters. You can check it out on GitHub or simply copy and paste the code block below:

/**
 *  Check if input string is a valid YouTube URL
 *  and try to extract the YouTube Video ID from it.
 *  @author  Stephan Schmitz
 *  @param   $url   string   The string that shall be checked.
 *  @return  mixed           Returns YouTube Video ID, or (boolean) false.
 */
function parse_yturl($url)
{
    $pattern = '#^(?:https?://|//)?(?:www\.|m\.)?(?:youtu\.be/|youtube\.com/(?:embed/|v/|watch\?v=|watch\?.+&v=))([\w-]{11})(?![\w-])#';
    preg_match($pattern, $url, $matches);
    return (isset($matches[1])) ? $matches[1] : false;
}

Test cases: https://3v4l.org/GEDT0
JavaScript version: https://stackoverflow.com/a/10315969/624466
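
As a quick illustration of what the function returns, here is a minimal sketch that runs it against a handful of inputs. The function body is repeated only so the snippet runs on its own; the sample URLs and the ID dQw4w9WgXcQ are illustrative assumptions, not taken from the linked test cases.

```php
<?php
// Repeated from the answer above so this snippet runs on its own.
function parse_yturl($url)
{
    $pattern = '#^(?:https?://|//)?(?:www\.|m\.)?(?:youtu\.be/|youtube\.com/(?:embed/|v/|watch\?v=|watch\?.+&v=))([\w-]{11})(?![\w-])#';
    preg_match($pattern, $url, $matches);
    return (isset($matches[1])) ? $matches[1] : false;
}

// Sample inputs (assumed for illustration); dQw4w9WgXcQ is just an example ID.
$samples = array(
    'https://www.youtube.com/watch?v=dQw4w9WgXcQ',                    // 'dQw4w9WgXcQ'
    '//youtu.be/dQw4w9WgXcQ',                                         // 'dQw4w9WgXcQ' (protocol-relative)
    'm.youtube.com/embed/dQw4w9WgXcQ',                                // 'dQw4w9WgXcQ' (no scheme)
    'http://youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ', // 'dQw4w9WgXcQ' (v is not the first parameter)
    'https://youtu.be/dQw4w9WgXcQ123',                                // false (ID longer than 11 characters)
    'https://vimeo.com/123456789',                                    // false (not a YouTube host)
);

foreach ($samples as $input) {
    // var_export() prints strings in quotes and the boolean as false.
    echo str_pad($input, 64), ' => ', var_export(parse_yturl($input), true), PHP_EOL;
}
```

Because the function returns either a string or false, callers should compare the result with === false rather than relying on a loose truthiness check.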

To explain the regex, here’s a split-up version with each part commented:

/**
 *  Check if input string is a valid YouTube URL
 *  and try to extract the YouTube Video ID from it.
 *  @author  Stephan Schmitz
 *  @param   $url   string   The string that shall be checked.
 *  @return  mixed           Returns YouTube Video ID, or (boolean) false.
 */
function parse_yturl($url)
{
    $pattern = '#^(?:https?://|//)?' # Optional URL scheme. Either http, or https, or protocol-relative.
             . '(?:www\.|m\.)?'      #  Optional www or m subdomain.
             . '(?:'                 #  Group host alternatives:
             .   'youtu\.be/'        #    Either youtu.be,
             .   '|youtube\.com/'    #    or youtube.com
             .     '(?:'             #    Group path alternatives:
             .       'embed/'        #      Either /embed/,
             .       '|v/'           #      or /v/,
             .       '|watch\?v='    #      or /watch?v=,
             .       '|watch\?.+&v=' #      or /watch?other_param&v=
             .     ')'               #    End path alternatives.
             . ')'                   #  End host alternatives.
             . '([\w-]{11})'         #  Capture exactly 11 characters (the length of a YouTube video ID).
             . '(?![\w-])#';         #  Reject the match if the ID is longer than 11 characters.
    preg_match($pattern, $url, $matches);
    return (isset($matches[1])) ? $matches[1] : false;
}
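
The same breakdown can also live inside the pattern itself by using PCRE's x (extended) modifier, which ignores unescaped whitespace outside character classes and treats # as the start of a comment. The sketch below is an alternative presentation and not part of the original class: the function name parse_yturl_verbose is hypothetical, and the delimiter is switched from # to ~ because # now introduces comments inside the pattern.

```php
<?php
// Hypothetical variant of parse_yturl() that documents the regex inline
// with the x modifier instead of concatenating commented string pieces.
function parse_yturl_verbose($url)
{
    // Note: the comments inside the pattern must not contain a single
    // quote or the ~ delimiter, since both would end the string or pattern.
    $pattern = '~
        ^(?:https?://|//)?      # optional scheme: http, https or protocol-relative
        (?:www\.|m\.)?          # optional www or m subdomain
        (?:
            youtu\.be/          # either the short host
          | youtube\.com/       # or the full host followed by one of these paths
            (?:
                embed/
              | v/
              | watch\?v=
              | watch\?.+&v=    # v preceded by other query parameters
            )
        )
        ([\w-]{11})             # group 1: the 11-character video ID
        (?![\w-])               # reject if more ID characters follow
    ~x';

    // preg_match() returns 1 on a match, 0 on no match and false on error.
    return preg_match($pattern, $url, $matches) === 1 ? $matches[1] : false;
}

var_export(parse_yturl_verbose('https://www.youtube.com/watch?v=dQw4w9WgXcQ'));
echo PHP_EOL;
```

Both versions should behave identically; which one to keep is a matter of taste, since the concatenated form leaves the comments in PHP while the x form keeps the pattern in a single string.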
