What is the {L} Unicode category?

Taken from http://www.regular-expressions.info/unicode.html (check the Unicode Character Properties section): \p{L} matches a single code point in the category “letter”. If your input string is à encoded as U+0061 U+0300, it matches a without the accent. If the input is à encoded as U+00E0, it matches à with the accent. The reason is that …
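The difference between the two encodings can be checked with a minimal Java sketch. The class name LetterCategoryDemo and the helper firstLetter are hypothetical names chosen for illustration; only java.util.regex from the standard library is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LetterCategoryDemo {
    // The regex \p{L}; the backslash is doubled inside a Java string literal.
    private static final Pattern LETTER = Pattern.compile("\\p{L}");

    // Hypothetical helper: returns the first \p{L} match, or null if there is none.
    static String firstLetter(String input) {
        Matcher m = LETTER.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String decomposed = "a\u0300";  // U+0061 followed by a combining grave accent
        String precomposed = "\u00E0";  // single code point U+00E0

        // Print the matched code point in hex to avoid console encoding issues.
        System.out.println(Integer.toHexString(firstLetter(decomposed).codePointAt(0)));  // prints 61
        System.out.println(Integer.toHexString(firstLetter(precomposed).codePointAt(0))); // prints e0
    }
}
```

In the decomposed case the match is the bare a, because the combining accent U+0300 belongs to a mark category rather than the letter category.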

Parse html using C

You want to use HTML Tidy to do this. The libcurl page has some source code to get you going. It documents traversing the DOM tree. You don’t need an XML parser, and it doesn’t fail on badly formatted HTML. http://curl.haxx.se/libcurl/c/htmltidy.html

Find and replace nth occurrence of [bracketed] expression in string

Here is another possible solution. You can pass a function to string.replace to determine what the replacement value should be. When the pattern has no capture groups, the function is passed three arguments: the first is the matching text, the second is the position of the match within the original string, and the third is the original string. The following …
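The answer above describes the callback form of JavaScript's string.replace. As a rough sketch of the same idea in Java, the example below swaps in the closest standard equivalent, Matcher.replaceAll with a function argument (available from Java 9). The class name NthBracketReplacer, the method replaceNth, and the bracket pattern are assumptions made for illustration, not the original answer's code.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NthBracketReplacer {
    // Assumed pattern: one [bracketed] expression with no nested brackets.
    private static final Pattern BRACKETED = Pattern.compile("\\[[^\\[\\]]*\\]");

    // Hypothetical helper: replaces only the n-th (1-based) bracketed expression.
    static String replaceNth(String input, int n, String replacement) {
        AtomicInteger seen = new AtomicInteger(0);
        // The callback runs once per match, in order; every match except the
        // n-th is put back unchanged. quoteReplacement keeps '$' and '\' literal.
        return BRACKETED.matcher(input).replaceAll(match ->
                seen.incrementAndGet() == n
                        ? Matcher.quoteReplacement(replacement)
                        : Matcher.quoteReplacement(match.group()));
    }

    public static void main(String[] args) {
        System.out.println(replaceNth("a [x] b [y] c [z]", 2, "[Q]")); // prints a [x] b [Q] c [z]
    }
}
```

Counting matches inside the callback avoids splitting the string by hand; the position of each match is also available through match.start() if it is needed.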

Java Regex matching between curly braces

You need to escape ‘{’ and ‘}’ with a ‘\’, so “{(.*?)}” becomes “\\{(.*?)\\}”. The doubled backslash is there because, in a Java string literal, you have to escape each ‘\’ with another ‘\’ first. See http://www.regular-expressions.info/reference.html for a comprehensive list of characters that need escaping.
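A minimal sketch of the escaped pattern in use, assuming the goal is to collect the text between each pair of braces. The class name BraceExtractor and the method extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BraceExtractor {
    // The regex \{(.*?)\}; each backslash is doubled inside the Java string literal.
    private static final Pattern BRACES = Pattern.compile("\\{(.*?)\\}");

    // Hypothetical helper: returns the contents of every {...} group, in order.
    static List<String> extract(String input) {
        List<String> contents = new ArrayList<>();
        Matcher m = BRACES.matcher(input);
        while (m.find()) {
            contents.add(m.group(1)); // group 1 is the lazily matched text between the braces
        }
        return contents;
    }

    public static void main(String[] args) {
        System.out.println(extract("Hello {name}, you are {age} years old")); // prints [name, age]
    }
}
```

The lazy quantifier .*? stops at the first closing brace, so each pair of braces yields its own match instead of one match spanning the whole string.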