Filter spark DataFrame on string contains

You can use contains (this works with an arbitrary sequence):

df.filter($"foo".contains("bar"))

like (SQL like with SQL simple regular expression whith _ matching an arbitrary character and % matching an arbitrary sequence):

df.filter($"foo".like("bar"))

or rlike (like with Java regular expressions):

df.filter($"foo".rlike("bar"))

depending on your requirements. LIKE and RLIKE should work with SQL expressions as well.

Leave a Comment