Best HashTag Regex

If you are pulling statuses containing hashtags from Twitter, you no longer need to find them yourself. You can now specify the include_entities parameter to have Twitter automatically call out mentions, links, and hashtags.

For example, take the following call to statuses/show:

http://api.twitter.com/1/statuses/show/60183527282577408.json?include_entities=true

In the resultant JSON, notice the entities object.

"entities":{"urls":[{"expanded_url":null,"indices":[68,88],"url":"http:\/\/bit.ly\/gWZmaJ"}],"user_mentions":[],"hashtags":[{"text":"wordpress","indices":[89,99]}]}

You can use the above to locate the specific entities in the tweet (which occur between the string positions denoted by the indices property) and transform them appropriately.

If you just need the regular expression to locate the hashtags, Twitter provides these in an open source library.

Hashtag Match Pattern

(^|[^&\p{L}\p{M}\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7])(#|\uFF03)(?!\uFE0F|\u20E3)([\p{L}\p{M}\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7]*[\p{L}\p{M}][\p{L}\p{M}\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7]*)

The above pattern can be pieced together from this java file (retrieved 2015-11-23). Validation tests for this pattern are located in this file around line 128.

Leave a Comment