string-matching
python – regex search and findall
Ok, I see what’s going on… from the docs: If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. As it turns out, you do have a group, “(\d+,?)”… so, what it’s returning is the … Read more
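To make the group behaviour concrete, here is a minimal sketch. The sample text and the patterns other than the quoted group are assumptions made up for illustration; they are not the ones from the original question.

```python
import re

# Hypothetical sample text, chosen only to illustrate the behaviour.
text = "Totals: 1,234 and 56"

# No groups: findall returns the full text of every match.
no_group = re.findall(r"\d+,?", text)

# One capturing group: findall returns only what the group captured,
# not the whole match.
one_group = re.findall(r"(\d+,?)\d*", text)

# search().group(0) still gives the whole match for the same pattern.
whole_match = re.search(r"(\d+,?)\d*", text).group(0)

# A non-capturing group (?:...) groups without changing findall's result.
non_capturing = re.findall(r"(?:\d+,?)+", text)

# Two capturing groups: findall returns a list of tuples.
two_groups = re.findall(r"(\d+)(,?)", text)

print(no_group)       # ['1,', '234', '56']
print(one_group)      # ['1,', '56']
print(whole_match)    # 1,234
print(non_capturing)  # ['1,234', '56']
print(two_groups)     # [('1', ','), ('234', ''), ('56', '')]
```

If the group is only there for grouping, switching it to `(?:...)` is usually the simplest way to get whole matches back from `findall`.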
Remove ends of string entries in pandas DataFrame column
I think you can use str.replace with the regex '.txt$' ($ matches the end of the string): import pandas as pd df = pd.DataFrame({'A': {0: 2, 1: 1}, 'C': {0: 5, 1: 1}, 'B': {0: 4, 1: 2}, 'filename': {0: "txt.txt", 1: "x.txt"}}, columns=['filename','A','B', 'C']) print(df) filename A B C 0 txt.txt 2 … Read more
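The effect of the `$` anchor can be checked without pandas. The sketch below uses only the standard `re` module on a plain list. The extra file names are made up for illustration, and escaping the dot is a suggested refinement rather than something the excerpt states.

```python
import re

# Hypothetical file names; the first two come from the excerpt's DataFrame.
filenames = ["txt.txt", "x.txt", "notes.txt.bak"]

# r"\.txt$": the escaped dot matches a literal ".", and "$" anchors the
# match to the end of the string, so only a trailing ".txt" is removed.
suffix = re.compile(r"\.txt$")
stripped = [suffix.sub("", name) for name in filenames]
print(stripped)  # ['txt', 'x', 'notes.txt.bak']

# With an unescaped dot, "." matches any character, which can remove
# more than intended.
unescaped = re.sub(r".txt$", "", "mytxt")
escaped = re.sub(r"\.txt$", "", "mytxt")
print(unescaped)  # m
print(escaped)    # mytxt
```

The same pattern should work on a pandas column via `Series.str.replace(pattern, "", regex=True)`.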
Search for string allowing for one mismatch in any location of the string
Before you read on, have you looked at biopython? It appears that you want to find approximate matches with one substitution error, and zero insertion/deletion errors i.e. a Hamming distance of 1. If you have a Hamming distance match function (see e.g. the link provided by Ignacio), you could use it like this to do … Read more
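As a rough illustration of the idea, here is a sliding-window search that accepts at most one substitution. The function names and the sample sequence are hypothetical, and this is not the code from the linked answer. It checks every window, so it is a simple baseline rather than a fast algorithm.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def find_with_mismatches(text, pattern, max_mismatches=1):
    """Return (start, distance) for every window of text within
    max_mismatches substitutions of pattern (no insertions/deletions)."""
    width = len(pattern)
    hits = []
    for start in range(len(text) - width + 1):
        distance = hamming(text[start:start + width], pattern)
        if distance <= max_mismatches:
            hits.append((start, distance))
    return hits


# Hypothetical sequence: one exact hit and two hits with one substitution.
hits = find_with_mismatches("ACGTACCTAGGT", "ACGT")
print(hits)  # [(0, 0), (4, 1), (8, 1)]
```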
Check whether a string contains a substring
To find out if a string contains a substring you can use the index function: if (index($str, $substr) != -1) { print "$str contains $substr\n"; } It will return the position of the first occurrence of $substr in $str, or -1 if the substring is not found.
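For comparison, a Python analogue of the same check might look like this. It is not part of the Perl answer, and the function name is made up. Python's `str.find` follows the same convention of returning -1 when nothing is found, while the `in` operator is the more idiomatic test.

```python
def contains(haystack, needle):
    """Return True if needle occurs anywhere in haystack."""
    # str.find returns the index of the first occurrence, or -1 if absent.
    return haystack.find(needle) != -1


text = "hello world"
print(contains(text, "world"))  # True
print(contains(text, "mars"))   # False
print("world" in text)          # True (idiomatic membership test)
print(text.find("world"))       # 6
```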
Find the Number of Occurrences of a Substring in a String
How about using StringUtils.countMatches from Apache Commons Lang? String str = "helloslkhellodjladfjhello"; String findStr = "hello"; System.out.println(StringUtils.countMatches(str, findStr)); That outputs: 3
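The same count can be reproduced in Python with the built-in `str.count`. This is a Python analogue for comparison, not the Commons Lang implementation. Like `countMatches`, `str.count` counts non-overlapping occurrences.

```python
text = "helloslkhellodjladfjhello"
needle = "hello"

# str.count returns the number of non-overlapping occurrences.
occurrences = text.count(needle)
print(occurrences)  # 3

# Non-overlapping means "aaaa" contains "aa" twice, not three times.
print("aaaa".count("aa"))  # 2
```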
R fuzzy string match to return specific column based on matched string
You are 90% of the way there… You say you want to know which row of data from df2 the string was matched with. You just need to understand the code you already have. See ?amatch: amatch returns the position of the closest match of x in table. When multiple matches with the same smallest … Read more
Finding how similar two strings are
Ok, so the standard algorithms are: 1) Hamming distance. Only good for strings of the same length, but very efficient. It simply counts the positions at which the two strings have different characters. Not useful for fuzzy searching of natural language text. 2) Levenshtein distance. The Levenshtein distance measures distance in terms of the number of "operations" required to … Read more
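As a sketch of the second measure, here is a textbook dynamic-programming version of the Levenshtein distance. It assumes the usual definition with three unit-cost operations: insertion, deletion and substitution. The function name is illustrative and the code is not taken from the answer.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("flaw", "lawn"))       # 2
```

Unlike the Hamming distance, this works for strings of different lengths, at a cost proportional to the product of the two lengths.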
Find numbers after specific text in a string with RegEx
Try this expression: "Error importing row no\. (\d+):" DEMO Here you need to understand the quantifiers and escape sequences: . any character; as you want only numbers, use \d; if you meant the period character you must escape it with a backslash (\.) ? Zero or one character; this isn't what you want, as … Read more
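A minimal sketch of using that expression from Python follows. The sample log lines and the helper name are assumptions made up for illustration.

```python
import re

# \. matches a literal period; (\d+) captures one or more digits.
ROW_PATTERN = re.compile(r"Error importing row no\. (\d+):")


def row_number(line):
    """Return the captured row number as an int, or None if no match."""
    match = ROW_PATTERN.search(line)
    return int(match.group(1)) if match else None


# Hypothetical log lines.
print(row_number("Error importing row no. 42: bad date"))  # 42
print(row_number("Error importing row no 42: bad date"))   # None
```

The second line returns None because the escaped `\.` requires an actual period after "no".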