C++ string::find complexity

Why the c++’s implemented string::substr() doesn’t use the KMP algorithm (and doesn’t run in O(N + M)) and runs in O(N * M)?

I assume you mean find(), rather than substr() which doesn’t need to search and should run in linear time (and only because it has to copy the result into a new string).

The C++ standard doesn’t specify implementation details, and only specifies complexity requirements in some cases. The only complexity requirements on std::string operations are that size(), max_size(), operator[], swap(), c_str() and data() are all constant time. The complexity of anything else depends on the choices made by whoever implemented the library you’re using.

The most likely reason for choosing a simple search over something like KMP is to avoid needing extra storage. Unless the string to be found is very long, and the string to search contains a lot of partial matches, the time taken to allocate and free that would likely be much more than the cost of the extra complexity.

Is that corrected in c++0x?

No, C++11 doesn’t add any complexity requirements to std::string, and certainly doesn’t add any mandatory implementation details.

If the complexity of current substr is not O(N * M), what is that?

That’s the worst-case complexity, when the string to search contains a lot of long partial matches. If the characters have a reasonably uniform distribution, then the average complexity would be closer to O(N). So by choosing an algorithm with better worst-case complexity, you may well make more typical cases much slower.

Leave a Comment