Weird behaviour initializing a numpy array of string data

NumPy requires string arrays to have a fixed maximum length. When you create an empty array with dtype=str and give no length, NumPy sets that maximum to 1. You can check this with my_array.dtype: on Python 2 it shows “|S1” (a one-byte string), and on Python 3 it shows “<U1” (a one-character Unicode string). Any string you later assign into the array is silently truncated to fit.
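A minimal sketch of the truncation, assuming Python 3 and a recent NumPy (the value "hello" is just an illustrative input):

```python
import numpy

# dtype=str with no length falls back to a one-character string dtype
my_array = numpy.empty([1, 2], dtype=str)
print(my_array.dtype)      # typically <U1 on Python 3

# The assignment succeeds, but only the first character is stored
my_array[0, 0] = "hello"
print(my_array[0, 0])      # h
```

No error or warning is raised, which is why the behaviour looks weird at first.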

To avoid this, pass an explicit dtype that includes the maximum length, for example:

my_array = numpy.empty([1, 2], dtype="S10")

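“S10” creates an array of strings that hold up to 10 bytes each; on Python 3, use “U10” if you want text (str) rather than bytes. Anything longer is still truncated, so choose a length large enough for the longest value you expect to store.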
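If the data is already known, one way to pick the length is to measure the longest item. The sketch below assumes Python 3 text strings; the list words and the variable width are hypothetical names used only for illustration:

```python
import numpy

# Hypothetical sample data
words = ["alpha", "beta", "gamma-delta"]

# Size the dtype from the longest item so nothing is truncated
width = max(len(w) for w in words)
my_array = numpy.empty(len(words), dtype="U%d" % width)
my_array[:] = words

print(my_array.dtype)   # a Unicode dtype of width 11
print(my_array[2])      # gamma-delta
```

When the values are available up front, numpy.array(words) infers a suitable width on its own; an explicit length mainly matters when the array is created empty and filled later.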
