Java regex to support Unicode?

What you are looking for are Unicode properties. For example, \p{L} matches any kind of letter from any language, so a regex to match such a Chinese word could be something like \p{L}+. There are many such properties; for more details, see regular-expressions.info. Another option is to use the modifier Pattern.UNICODE_CHARACTER_CLASS. In Java 7 there is …
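A minimal Java sketch of both options mentioned above. The class and method names (UnicodeRegexDemo, letterRuns, isAsciiWord, isUnicodeWord) are hypothetical and exist only for illustration. The sketch shows \p{L}+ picking out letter runs across scripts, and how Pattern.UNICODE_CHARACTER_CLASS changes \w from ASCII-only to Unicode-aware.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeRegexDemo {
    // \p{L} is the Unicode "Letter" category, so \p{L}+ matches a run of letters in any script.
    private static final Pattern LETTER_RUN = Pattern.compile("\\p{L}+");

    // Without flags, \w is ASCII-only, i.e. [a-zA-Z_0-9].
    private static final Pattern ASCII_WORD = Pattern.compile("\\w+");

    // With UNICODE_CHARACTER_CLASS (Java 7+), \w and the other predefined classes follow Unicode.
    private static final Pattern UNICODE_WORD =
            Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    /** Returns every maximal run of Unicode letters found in the input. */
    static List<String> letterRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = LETTER_RUN.matcher(input);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    /** True if the whole string matches the default (ASCII-only) \w+. */
    static boolean isAsciiWord(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    /** True if the whole string matches \w+ under Unicode rules. */
    static boolean isUnicodeWord(String s) {
        return UNICODE_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The Chinese and Cyrillic words are written as Unicode escapes so the file
        // compiles identically regardless of the platform's default source encoding.
        String chinese = "\u4f60\u597d";
        String text = "hello " + chinese + " \u043c\u0438\u0440 123";

        System.out.println(letterRuns(text).size()); // 3 (the digits are not letters)
        System.out.println(isAsciiWord(chinese));    // false
        System.out.println(isUnicodeWord(chinese));  // true
    }
}
```

The same flag can also be switched on inside the pattern itself with the embedded flag (?U), e.g. "(?U)\\w+", which is handy when the pattern string comes from configuration rather than code.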

UTF-8 file output in R

The problem is due to some R behaviour specific to Windows (it uses the default system encoding, or some system write functions; I do not know the specifics, but the behaviour is known). To write text in UTF-8 encoding on Windows, one has to use the useBytes=T option in functions like writeLines or readLines: txt <- …