How to replace underscores with dashes in Nginx

No, there’s no easy way to do this, but the rewrite engine can nonetheless be coerced into doing it, assuming you can put a reasonable cap on the number of underscores you need to convert in a single URL (and even if you can’t, see the end of the answer).

Here’s how I’d do it (tested code):

rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5-$6-$7-$8-$9;
rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5;
rewrite ^([^_]*)_([^_]*)_(.*)$ $1-$2-$3;
rewrite ^([^_]*)_(.*)$ $1-$2;

The four rewrites respectively translate the first 8, 4, 2, and 1 underscores in the URL to dashes. The underscore counts are decreasing powers of 2 on purpose: each rule either matches or doesn’t, so the 16 possible combinations cover every count from 0 to 15, much like the bits of a 4-bit number. For example, a URL with 13 underscores matches the first, second, and fourth rules (8 + 4 + 1). This block is the most efficient set of rules that will translate from 0 up to 15 occurrences of underscore in a single URL.
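To check the powers-of-2 claim without reloading nginx, here is a minimal Python sketch that replays the four rules in order. It assumes Python's re module behaves like nginx's PCRE for these particular patterns (which use nothing exotic), and the names RULES and apply_rules are made up for this illustration.

```python
import re

# The four rules from the answer as (pattern, replacement) pairs.
# nginx's $1..$9 are written as \1..\9 in Python.
RULES = [
    (r"^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$",
     r"\1-\2-\3-\4-\5-\6-\7-\8-\9"),
    (r"^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$", r"\1-\2-\3-\4-\5"),
    (r"^([^_]*)_([^_]*)_(.*)$", r"\1-\2-\3"),
    (r"^([^_]*)_(.*)$", r"\1-\2"),
]


def apply_rules(uri):
    """Apply each rule once, in order, like consecutive rewrite directives."""
    for pattern, replacement in RULES:
        # A rule that does not match leaves the URI untouched.
        uri = re.sub(pattern, replacement, uri, count=1)
    return uri


# Print how many underscores survive for URIs with n underscores.
for n in (0, 5, 15, 16):
    uri = "/" + "_".join(["x"] * (n + 1))
    print(n, apply_rules(uri).count("_"))
# prints: 0 0, 5 0, 15 0, 16 1 (one pair per line)
```

The 16-underscore case leaves one underscore behind, which is the cap the four rules impose on a single pass.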

You will also notice that I used [^_]* on every group except the last one, in every rule. This avoids having the regexp engine perform unneeded backtracking in the case of non-matches. Basically, having nine universal stars .* in a regexp causes O(n^9) complexity (which is quite bad) in the “worst case”, which is a non-match, and a non-match would actually be your most frequent case. (I can recommend this book for those who wish to really understand how a regexp is actually executed by the underlying library.)

For this reason, if you can put a limit smaller than 15 on the number of underscores, I would recommend taking away the first rule, or the first two. The last three rules alone will translate up to 7 underscores; the last two will translate up to 3.
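The capacities quoted above (15, 7, and 3) can be checked with a short Python sketch. It builds each rule from its underscore count instead of copying the patterns by hand; make_rule and max_handled are hypothetical helper names, and Python's re is assumed to match these patterns the same way nginx's PCRE does.

```python
import re


def make_rule(k):
    """Build a rule converting the first k underscores: k groups of [^_]*, then (.*)."""
    pattern = "^" + "([^_]*)_" * k + "(.*)$"
    replacement = "-".join("\\" + str(i) for i in range(1, k + 2))
    return re.compile(pattern), replacement


def max_handled(sizes, limit=40):
    """Largest n such that every URI with 0..n underscores comes out clean."""
    rules = [make_rule(k) for k in sizes]
    for n in range(limit + 1):
        uri = "/" + "_".join(["x"] * (n + 1))  # a URI with exactly n underscores
        for regex, replacement in rules:
            uri = regex.sub(replacement, uri, count=1)
        if "_" in uri:
            return n - 1  # first count the rule set cannot finish
    return limit


print(max_handled([8, 4, 2, 1]), max_handled([4, 2, 1]), max_handled([2, 1]))
# prints: 15 7 3
```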

Finally, you didn’t mention redirecting the user to the new URL (as opposed to just serving the content at both the underscored URL and the correct one, which is usually frowned upon by the search engine nuts, just FYI). If that’s what you need, you will have to put those rewrites into a special location that is triggered on the presence of an underscore in the URL, and that redirects the user to the new URL at the end of the four rewrites:

location ~ _ {
  rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5-$6-$7-$8-$9;
  rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5;
  rewrite ^([^_]*)_([^_]*)_(.*)$ $1-$2-$3;
  rewrite ^([^_]*)_(.*)$ $1-$2;
  rewrite ^ $uri permanent;
}

This also adds the benefit of translating an unlimited number of underscores in a single URL, at the expense of more than one redirect to the user’s browser when a URL contains more than 15 of them.
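To get a feel for how many redirects a long URL costs, here is a rough Python simulation of a client following the 301s. It models only the path (no query string), assumes Python's re matches these patterns like nginx's PCRE, and uses the made-up names RULES and follow_redirects.

```python
import re

# Rules converting the first 8, 4, 2, and 1 underscores, built from their counts.
RULES = [
    ("^" + "([^_]*)_" * k + "(.*)$",
     "-".join("\\" + str(i) for i in range(1, k + 2)))
    for k in (8, 4, 2, 1)
]


def follow_redirects(uri):
    """Return (final_uri, number_of_301s) for a client that follows every redirect."""
    redirects = 0
    while "_" in uri:  # stands in for "location ~ _" matching the request
        for pattern, replacement in RULES:
            uri = re.sub(pattern, replacement, uri, count=1)
        redirects += 1  # stands in for "rewrite ^ $uri permanent;"
    return uri, redirects


# A URI with 40 underscores is cleaned 15 at a time.
final_uri, hops = follow_redirects("/" + "_".join(["x"] * 41))
print(hops, "_" in final_uri)
# prints: 3 False
```

Each pass removes at most 15 underscores, so under these assumptions a URL with n underscores needs n / 15 redirects, rounded up.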

HTH ;-P
