Regex: how to match an optional character

Use [A-Z]? to make the letter optional; {1} is redundant. (Of course, you could also write [A-Z]{0,1}, which means the same thing, but that's what ? is there for.) You could improve your regex to:

    ^([0-9]{5})+\s+([A-Z]?)\s+([A-Z])([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})

And since \d is the same as [0-9] in most regex dialects (in Unicode-aware engines such as Python 3's re or .NET, \d also matches non-ASCII digits):

    ^(\d{5})+\s+([A-Z]?)\s+([A-Z])(\d{3})(\d{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])\d{3}(\d{4})(\d{2})(\d{2})

But: do you really need …
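To check that ? and {0,1} really are interchangeable, here is a small sketch using Python's re module. The pattern is a hypothetical, cut-down fragment of the regex above (five digits, an optional capital letter, then a required one), not the full format, and the helper name optional_letter is made up for illustration.

```python
import re

# Hypothetical, cut-down fragment of the regex above: five digits,
# an optional capital letter, then one required capital letter.
WITH_QMARK = re.compile(r"^[0-9]{5}\s+([A-Z]?)\s+([A-Z])$")
# "{0,1}" means exactly the same as "?"; it is just more verbose.
WITH_BRACES = re.compile(r"^[0-9]{5}\s+([A-Z]{0,1})\s+([A-Z])$")


def optional_letter(line):
    """Return the optional letter ("" if absent), or None if the line doesn't match."""
    m = WITH_QMARK.match(line)
    return m.group(1) if m else None


if __name__ == "__main__":
    for line in ["12345 A B", "12345  B", "12345 AB B"]:
        # Both spellings of the quantifier accept and reject the same inputs.
        assert (WITH_QMARK.match(line) is None) == (WITH_BRACES.match(line) is None)
        print(repr(line), "->", repr(optional_letter(line)))
```

Note that ([A-Z]?) always takes part in the match and captures an empty string when the letter is absent; writing ([A-Z])? instead would leave group 1 as None in Python.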

How to use regex with the find command?

    find . -regextype sed -regex ".*/[a-f0-9\-]\{36\}\.jpg"

Note that the pattern has to start with .*/ because find matches the regex against the whole path, not just the file name. Example:

    susam@nifty:~/so$ find . -name "*.jpg"
    ./foo-111.jpg
    ./test/81397018-b84a-11e0-9d2a-001b77dc0bed.jpg
    ./81397018-b84a-11e0-9d2a-001b77dc0bed.jpg
    susam@nifty:~/so$ find . -regextype sed -regex ".*/[a-f0-9\-]\{36\}\.jpg"
    ./test/81397018-b84a-11e0-9d2a-001b77dc0bed.jpg
    ./81397018-b84a-11e0-9d2a-001b77dc0bed.jpg

My version of find:

    $ find --version
    find (GNU findutils) 4.4.2
    Copyright (C) 2007 …
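The whole-path rule is easy to demonstrate outside find. The sketch below uses a hypothetical Python helper, find_regex, that walks a directory tree and applies re.fullmatch to each joined path, which is roughly (not exactly) what GNU find's -regex does. The pattern is hand-translated from sed (BRE) syntax to Python syntax, and the file names from the example above are recreated in a temporary directory.

```python
import os
import re
import tempfile

# Python translation of the sed-style (BRE) pattern from the answer:
# BRE "\{36\}" becomes "{36}", and "-" is literal at the end of a bracket list.
UUID_JPG = r"[a-f0-9-]{36}\.jpg"


def find_regex(root, pattern):
    """Rough imitation of `find ROOT -regex PATTERN`: the regex must match
    the whole path (directories included), not merely a substring of it."""
    rx = re.compile(pattern)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # fullmatch anchors at both ends, as find does with -regex
            if rx.fullmatch(path):
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    uuid = "81397018-b84a-11e0-9d2a-001b77dc0bed"
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "test"))
        for rel in ("foo-111.jpg", uuid + ".jpg", os.path.join("test", uuid + ".jpg")):
            open(os.path.join(root, rel), "w").close()
        # Without the leading ".*/" nothing matches: every path has a directory prefix.
        print(find_regex(root, UUID_JPG))
        # With ".*/" both UUID-named files are found, just like the find output.
        print(find_regex(root, ".*/" + UUID_JPG))
```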

How do I grep for all non-ASCII characters?

You can use the command:

    grep --color="auto" -P -n "[\x80-\xFF]" file.xml

This prints each matching line with its line number and highlights the non-ASCII characters in red. On some systems, depending on your settings (notably the locale), the above will not work, so you can grep for the complement of the ASCII range instead:

    grep --color="auto" -P -n "[^\x00-\x7F]" file.xml

Note also that the important …
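When grep -P behaves differently across locales, the same check can be done directly on bytes. This Python sketch (the function name non_ascii_lines is hypothetical) reports the 1-based line numbers of lines containing any byte outside 0x00-0x7F, mirroring the second grep command. On raw bytes, the classes [\x80-\xFF] and [^\x00-\x7F] select exactly the same bytes, so the locale question does not arise.

```python
import re

# Any byte outside 0x00-0x7F, i.e. anything that is not 7-bit ASCII.
NON_ASCII = re.compile(rb"[^\x00-\x7F]")


def non_ascii_lines(data):
    # Split on b"\n" as grep does and number lines from 1, like `grep -n`.
    # Working on raw bytes means the result does not depend on the locale.
    hits = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        if NON_ASCII.search(line):
            hits.append((number, line))
    return hits


if __name__ == "__main__":
    sample = "plain\ncaf\u00e9\nascii only\n\u4e2d\u6587\n".encode("utf-8")
    for number, line in non_ascii_lines(sample):
        print(f"{number}:{line.decode('utf-8')}")
```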

How to use conditionals when replacing in Notepad++ via regex

The syntax for a conditional replacement is (?{GROUP}REPLACEMENT_IF_YES:REPLACEMENT_IF_NO), where GROUP is the number or name of the capture group whose match is tested. The { and } are required to avoid ambiguity when you refer to groups above 9 or to named capture groups. Notepad++ uses the Boost-Extended Format String Syntax; the Boost documentation says:

The character ? begins a conditional expression, the general form is: ?Ntrue-expression:false-expression where N …
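Python's re.sub does not understand this Boost conditional syntax in replacement strings, but a replacement function can make the same yes/no decision. The sketch below is an illustration of the idea, not how Notepad++ implements it; conditional_sub is a hypothetical helper that checks whether the given group (by number or name) took part in the match and then expands one of two ordinary Python replacement templates.

```python
import re


def conditional_sub(pattern, group, if_yes, if_no, text):
    """Hypothetical helper emulating Boost's (?{GROUP}yes:no) replacement:
    each match becomes if_yes when GROUP took part in it, else if_no.
    Both templates may use Python backreferences such as \\1."""
    def repl(m):
        # An unmatched group is None in Python, which is the "no" branch.
        template = if_yes if m.group(group) is not None else if_no
        return m.expand(template)
    return re.sub(pattern, repl, text)


if __name__ == "__main__":
    # Numbered group: like replacing (cat)|(dog) with (?{1}feline:canine)
    print(conditional_sub(r"(cat)|(dog)", 1, "feline", "canine", "cat and dog"))
    # Named group: the condition tests whether "num" matched
    print(conditional_sub(r"(?P<num>\d+)|[a-z]+", "num", r"<\g<num>>", "word", "abc 42"))
```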

Extract info inside all parentheses in R

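Here is an example:

    > gsub("[\\(\\)]", "", regmatches(j, gregexpr("\\(.*?\\)", j))[[1]])
    [1] "wonder" "groan" "Laugh"

I think this should work well:

    > regmatches(j, gregexpr("(?=\\().*?(?<=\\))", j, perl=TRUE))[[1]]
    [1] "(wonder)" "(groan)" "(Laugh)"

but the result includes the parentheses. Why? This works:

    regmatches(j, gregexpr("(?<=\\().*?(?=\\))", j, perl=TRUE))[[1]]

The difference is where the zero-width assertions sit. The lookahead (?=\\() at the start only checks that a "(" comes next, so the match begins at the "(" and .*? consumes it; likewise, the lookbehind (?<=\\)) at the end is only satisfied once ")" has been consumed. Putting the lookbehind (?<=\\() first and the lookahead (?=\\)) last checks both parentheses without including them in the match.

Thanks @MartinMorgan for the comment.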
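The same behaviour can be reproduced with Python's re module, whose lookahead and lookbehind work like PCRE's for these patterns. The string j below is a made-up stand-in, since the original value of j is not shown; only the three extracted words come from the output above.

```python
import re

# Hypothetical input standing in for the R variable j (its real value is not shown).
j = "I (wonder) why you (groan) when I (Laugh)"

# Lookahead first: it only asserts that "(" comes next, so .*? then consumes it,
# and the closing lookbehind succeeds only after ")" has been consumed too.
WITH_PARENS = re.compile(r"(?=\().*?(?<=\))")

# Lookbehind first, lookahead last: both parentheses are checked, never consumed.
INSIDE_ONLY = re.compile(r"(?<=\().*?(?=\))")

# The first approach from the question: match "(...)" and strip the parentheses.
STRIPPED = [re.sub(r"[()]", "", m) for m in re.findall(r"\(.*?\)", j)]

if __name__ == "__main__":
    print(WITH_PARENS.findall(j))  # parentheses included
    print(INSIDE_ONLY.findall(j))  # contents only
    print(STRIPPED)                # contents only
```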