How to find all permutations of a given word in a given text?

This is probably not the most efficient solution algorithmically, but it is clean from a class design point of view. This solution takes the approach of comparing “sorted” given words.

We can say that a word is a permutation of another if it contains the same letters in the same number. This means that you can convert the word from a String to a Map<Character,Integer>. Such conversion will have complexity O(n) where n is the length of the String, assuming that insertions in your Map implementation cost O(1).

The Map will contain as keys all the characters found in the word and as values the frequencies of the characters.

Example. abbc is converted to [a->1, b->2, c->1]

bacb is converted to [a->1, b->2, c->1]

So if you have to know if two words are one the permutation of the other, you can convert them both into maps and then invoke Map.equals.

Then you have to iterate over the text string and apply the transformation to all the substrings of the same length of the words that you are looking for.

Improvement proposed by Inerdial

This approach can be improved by updating the Map in a “rolling” fashion.

I.e. if you’re matching at index i=3 in the example haystack in the OP (the substring xya), the map will be [a->1, x->1, y->1]. When advancing in the haystack, decrement the character count for haystack[i], and increment the count for haystack[i+needle.length()].

(Dropping zeroes to make sure Map.equals() works, or just implementing a custom comparison.)

Improvement proposed by Max

What if we also introduce matchedCharactersCnt variable? At the beginning of the haystack it will be 0. Every time you change your map towards the desired value – you increment the variable. Every time you change it away from the desired value – you decrement the variable. Each iteration you check if the variable is equal to the length of needle. If it is – you’ve found a match. It would be faster than comparing the full map every time.

Pseudocode provided by Max:

needle = "abbc"
text = "abbcbbabbcaabbca"

needleSize = needle.length()
//Map of needle character counts
targetMap = [a->1, b->2, c->1]

matchedLength = 0
curMap = [a->0, b->0, c->0]
//Initial map initialization
for (int i=0;i<needle.length();i++) {
    if (curMap.contains(haystack[i])) {
        matchedLength++
        curMap[haystack[i]]++
    }
}

if (matchedLength == needleSize) {
    System.out.println("Match found at: 0");
}

//Search itself
for (int i=0;i<haystack.length()-needle.length();i++) {
    int targetValue1 = targetMap[haystack[i]]; //Reading from hashmap, O(1)
    int curValue1 = curMap[haystack[i]]; //Another read
    //If we are removing beneficial character
    if (targetValue1 > 0 && curValue1 > 0 && curValue1 <= targetValue1) {       
        matchedLength--;
    }
    curMap[haystack[i]] = curValue1 + 1; //Write to hashmap, O(1)


    int targetValue2 = targetMap[haystack[i+needle.length()]] //Read
    int curValue2 = curMap[haystack[i+needle.length()]] //Read
    //We are adding a beneficial character
    if (targetValue2 > 0 && curValue2 < targetValue2) { //If we don't need this letter at all, the amount of matched letters decreases
        matchedLength++;
    }
    curMap[haystack[i+needle.length()]] = curValue2 + 1; //Write

    if (matchedLength == needleSize) {
        System.out.println("Match found at: "+(i+1));
    }
}

//Basically with 4 reads and 2 writes which are 
//independent of the size of the needle,
//we get to the maximal possible performance: O(n)

Leave a Comment