Convert backslash-delimited string into an associative array

Using a simple regex via preg_match_all and array_combine is often the shortest and quickest option:

 preg_match_all("/([^\\\\]+)\\\\([^\\\\]+)/", $string, $p);
 $array = array_combine($p[1], $p[2]);

Now this is of course a special case. Both keys and values are separated by a \ backslash, as are all pairs of them. The regex is also a bit lengthier due to the necessary double escaping.
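
As a quick sanity check, here is the two-liner run against a made-up sample string. The input value is an assumption for illustration; it is single-quoted so the backslashes stay literal:

```php
<?php
// Hypothetical input in the form key\value\key\value...
$string = 'name\John\age\30\city\Berlin';

// "\\\\" in a double-quoted PHP string reaches PCRE as "\\", i.e. one literal backslash.
preg_match_all("/([^\\\\]+)\\\\([^\\\\]+)/", $string, $p);
$array = array_combine($p[1], $p[2]);

print_r($array);
// ['name' => 'John', 'age' => '30', 'city' => 'Berlin']
```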

However this scheme can be generalized to other key:value,-style strings.

Distinct key:value, separators

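Common variations include : and = as key/value separators, and , or & and others as pair delimiters. The regex becomes rather obvious in such cases (with the /x flag for readability). Note that the key class has to exclude the pair delimiter as well, otherwise every key after the first one picks up the leading , left over from the previous pair:

 #                    ↓↓    ↓    ↓
 preg_match_all("/ ([^:,]+) : ([^,]+) /x", $string, $p);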
 $array = array_combine($p[1], $p[2]);

Which makes it super easy to exchange : and , for other delimiters.

  • Equal signs = instead of : colons.
  • For example \\t as pair delimiter (tab-separated key:value lists)
  • Classic & or ; as separator between key=value pairs.
  • Or just \\s spaces or \\n newlines even.
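
To make the delimiter swap concrete, here is a minimal sketch wrapped in a function. The name kv_to_array and its parameters are hypothetical (not part of PHP), and it assumes single-character delimiters. The key class also excludes the pair delimiter, so a key never starts with it:

```php
<?php
// Hypothetical helper: builds the pattern from two single-character delimiters.
function kv_to_array(string $string, string $kv = ':', string $pair = ','): array
{
    // preg_quote() escapes regex metacharacters; "/" is the pattern delimiter used below.
    $k = preg_quote($kv, '/');
    $d = preg_quote($pair, '/');

    // Key: anything up to the key/value separator (and not a pair delimiter).
    // Value: anything up to the next pair delimiter.
    preg_match_all("/([^{$k}{$d}]+){$k}([^{$d}]+)/", $string, $p);
    return array_combine($p[1], $p[2]);
}

print_r(kv_to_array('a:1,b:2'));              // ['a' => '1', 'b' => '2']
print_r(kv_to_array('x=10&y=20', '=', '&'));  // ['x' => '10', 'y' => '20']
```

preg_quote is only there so that delimiters such as . or | are taken literally rather than as regex syntax.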

Allow varying delimiters

You can make it more flexible/forgiving by allowing different delimiters between keys/values/pairs:

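 #                    ↓         ↓       ↓
 preg_match_all("/ ([^:=,+&]+) [:=]+ ([^,+&]+) /x", $string, $p);

With this, a string like key=value,key2:value2++key3==value3 would work. The key class lists the pair delimiters too, so that keys do not start with a leftover , or +. This can make sense for more human-friendliness (i.e. non-technical users).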
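
A short run against exactly that mixed sample string, as a sketch. The key class here excludes the pair delimiters as well as : and =, which keeps stray , or + characters out of the keys:

```php
<?php
// Sample input mixing several delimiter styles.
$string = 'key=value,key2:value2++key3==value3';

// [:=]+ accepts one or more separators; values end at the first "," "+" or "&".
preg_match_all("/ ([^:=,+&]+) [:=]+ ([^,+&]+) /x", $string, $p);
$array = array_combine($p[1], $p[2]);

print_r($array);
// ['key' => 'value', 'key2' => 'value2', 'key3' => 'value3']
```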

Constrain alphanumeric keys

Oftentimes you may want to prohibit anything but classic key identifiers. Just use a \w+ word string pattern to make the regex skip over unwanted occurrences:

 #                   ↓   ↓    ↓
 preg_match_all("/ (\w+) = ([^,]+) /x", $string, $p);

This is the most trivial whitelisting approach. If, on the other hand, you want to assert/constrain the whole key/value string beforehand, then craft a separate check that is anchored at both ends, so that a valid prefix alone does not pass: preg_match("/^(\w+=[^,]+(,|$))+$/", …
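
The difference between skipping and rejecting can be sketched like this. The input is made up, and the validation pattern is this sketch's variant with a trailing $ so the whole string has to conform:

```php
<?php
// Made-up input; the second pair has a key without any word characters.
$string = 'id=7,!!=x,color=red';

// Extraction: a pair whose key is not made of \w characters finds no match and is skipped.
preg_match_all("/ (\w+) = ([^,]+) /x", $string, $p);
$array = array_combine($p[1], $p[2]);
// ['id' => '7', 'color' => 'red']

// Validation: reject the whole string up front instead.
// The trailing "$" ensures nothing is left unmatched at the end.
$pattern = "/^(\w+=[^,]+(,|$))+$/";
$valid = preg_match($pattern, $string) === 1;
// false for this input, true for 'id=7,color=red'
```

Keep in mind that \w+ only anchors on word characters: a key like na-me would not be dropped, it would be picked up as me.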

Strip spaces or quoting

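You can skip a few post-processing steps (such as trim on keys and values) with a small addition. The lazy key group leaves the whitespace before = to the following \s*, and the lookbehind makes the value give back its trailing whitespace:

 preg_match_all("/ \s*([^=,]+?) \s*=\s* ([^,]+) (?<!\s) /x", $string, $p);

Or for instance optional quotes:

 preg_match_all("/ \s*([^=,]+?) \s*=\s* '? ([^,]+) (?<![\s']) /x", $string, $p);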
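
Here is a small worked example of the trimming and quote-stripping idea on a made-up, sloppily formatted string. This sketch uses a lazy key group that also excludes the pair delimiter, so keys come out without surrounding spaces or commas:

```php
<?php
// Made-up input with stray spaces and optional single quotes.
$string = " name = 'John Doe' , age= 30 ,city ='Berlin'";

// ([^=,]+?) followed by \s*= keeps trailing spaces out of the key.
// (?<![\s']) forces the value to give back trailing spaces and the closing quote.
preg_match_all("/ \s*([^=,]+?) \s*=\s* '? ([^,]+) (?<![\s']) /x", $string, $p);
$array = array_combine($p[1], $p[2]);

print_r($array);
// ['name' => 'John Doe', 'age' => '30', 'city' => 'Berlin']
```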

INI-style extraction

And you can craft a baseline INI-file extraction method:

 preg_match_all("/^ \s*(\w+) \s*=\s* ['\"]?(.+?)['\"]? \s* $/xm", $string, $p);

Please note that this is just a crude subset of common INI schemes.
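
For illustration, the same pattern applied to a made-up INI fragment. Comment lines and section headers do not start with a \w+ key followed by =, so they simply produce no match:

```php
<?php
// Hypothetical INI content, written as a nowdoc so nothing gets interpolated.
$string = <<<'INI'
; database settings
[db]
host = localhost
port=3306
name = "shop data"
INI;

// /m lets ^ and $ match at every line; /x allows the spaced-out pattern.
preg_match_all("/^ \s*(\w+) \s*=\s* ['\"]?(.+?)['\"]? \s* $/xm", $string, $p);
$array = array_combine($p[1], $p[2]);

print_r($array);
// ['host' => 'localhost', 'port' => '3306', 'name' => 'shop data']
```

Sections are ignored rather than nested here; for real configuration files the built-in parse_ini_string is likely the safer choice.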

Alternative: parse_str()

If you have a key=value&key2=value2 string already, then parse_str works like a charm. Combined with strtr, it can even process other delimiters:

 #                         ↓↓    ↑↑
 parse_str(strtr($string, ":,", "=&"), $pairs);

Which has a couple of pros and cons of its own:

  • Even shorter than the two-line regex approach.
  • Predefines a well-known escaping mechanism (such as %2F for special characters).
  • Does not permit varying delimiters, or unescaped delimiters within.
  • Automatically converts keys[]= to arrays, which you may or may not want though.
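
A brief sketch of both behaviours, the URL-decoding and the automatic array conversion, using made-up input strings:

```php
<?php
// Made-up input using ":" and "," plus one URL-encoded character.
$string = 'color:red,size:XL,note:a%2Fb';

// strtr() maps ":" to "=" and "," to "&"; parse_str() then splits and URL-decodes.
parse_str(strtr($string, ":,", "=&"), $pairs);
// $pairs: ['color' => 'red', 'size' => 'XL', 'note' => 'a/b']

// Keys ending in [] are collected into a nested array.
parse_str('tag[]=a&tag[]=b', $tags);
// $tags: ['tag' => ['a', 'b']]
```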

Alternative: explode + foreach

You’ll find many examples of manual key/value string expansion, though this is often more code. explode is somewhat overused in PHP due to optimization assumptions. After profiling, however, it often turns out to be slower due to the manual foreach and array collection.
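
For comparison, a typical manual version might look like the following sketch. The input string and the trimming are assumptions for illustration:

```php
<?php
$string = 'a:1,b:2,c:3';   // made-up input
$array = [];

foreach (explode(',', $string) as $pair) {
    // A limit of 2 keeps any further ":" inside the value.
    $parts = explode(':', $pair, 2);
    if (count($parts) === 2) {   // skip fragments without a separator
        $array[trim($parts[0])] = trim($parts[1]);
    }
}

print_r($array);
// ['a' => '1', 'b' => '2', 'c' => '3']
```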
