How do I tokenize a string sentence in NLTK?

This is actually on the main page of nltk.org:

    >>> import nltk
    >>> sentence = """At eight o'clock on Thursday morning
    ... Arthur didn't feel very good."""
    >>> tokens = nltk.word_tokenize(sentence)
    >>> tokens
    ['At', 'eight', "o'clock", 'on', 'Thursday', 'morning', 'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']

How to get a Token from a Lucene TokenStream?

Yeah, it’s a little convoluted (compared to the good ol’ way), but this should do it:

    TokenStream tokenStream = analyzer.tokenStream(fieldName, reader);
    OffsetAttribute offsetAttribute = tokenStream.getAttribute(OffsetAttribute.class);
    TermAttribute termAttribute = tokenStream.getAttribute(TermAttribute.class);

    while (tokenStream.incrementToken()) {
        int startOffset = offsetAttribute.startOffset();
        int endOffset = offsetAttribute.endOffset();
        String term = termAttribute.term();
    }

Edit: The new way

According to Donotello, TermAttribute has been …

Tokenizing strings in C

Do it like this:

    char s[256];
    strcpy(s, "one two three");
    char* token = strtok(s, " ");
    while (token) {
        printf("token: %s\n", token);
        token = strtok(NULL, " ");
    }

Note: strtok modifies the string it is tokenising, so the argument must be a writable buffer; it cannot be a const char* or a string literal.
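Because strtok writes into its argument, a common workaround for read-only input is to tokenise a private copy. The following is a minimal sketch of that idea, not part of the original answer; the helper name print_tokens is hypothetical and chosen only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: prints each token of `input` and returns how many
 * tokens were found. strtok() overwrites delimiters with '\0', so we work
 * on a heap copy and leave the caller's string untouched. */
size_t print_tokens(const char *input, const char *delims)
{
    size_t len = strlen(input) + 1;      /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy == NULL)
        return 0;                        /* allocation failed: report no tokens */
    memcpy(copy, input, len);

    size_t count = 0;
    for (char *tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims)) {
        printf("token: %s\n", tok);
        count++;
    }

    free(copy);
    return count;
}
```

Calling print_tokens("one two three", " ") is safe even though the argument is a string literal, since only the copy is modified. Runs of consecutive delimiters are skipped, which is standard strtok behaviour.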

Nested strtok function problem in C [duplicate]

You cannot do that with strtok(); use strtok_r() from POSIX or strtok_s() from Microsoft if they are available, or rethink your design.

    char *strtok_r(char *restrict s, const char *restrict sep, char **restrict lasts);
    char *strtok_s(char *strToken, const char *strDelimit, char **context);

These two functions are interchangeable. Note that a variant strtok_s() is specified in an …
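As an illustration of why the extra context pointer matters, here is a minimal sketch of nested tokenising with strtok_r(). It assumes a POSIX system; the helper name print_nested and the ';' and ',' separators are hypothetical choices made for this example only.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* request POSIX declarations such as strtok_r */
#endif

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits `input` into records on ';' and each record
 * into fields on ','. Returns the total number of fields printed.
 * Each loop keeps its own state pointer, so the inner loop does not
 * disturb the outer one (which is what goes wrong with plain strtok). */
size_t print_nested(const char *input)
{
    size_t len = strlen(input) + 1;
    char *copy = malloc(len);            /* strtok_r also modifies its input */
    if (copy == NULL)
        return 0;
    memcpy(copy, input, len);

    size_t fields = 0;
    char *outer_state = NULL;
    for (char *rec = strtok_r(copy, ";", &outer_state); rec != NULL;
         rec = strtok_r(NULL, ";", &outer_state)) {
        char *inner_state = NULL;
        for (char *field = strtok_r(rec, ",", &inner_state); field != NULL;
             field = strtok_r(NULL, ",", &inner_state)) {
            printf("field: %s\n", field);
            fields++;
        }
    }

    free(copy);
    return fields;
}
```

With the input "a,b;c,d,e" this walks two records and five fields. On Microsoft's runtime the same structure should work with strtok_s() in place of strtok_r(), since it takes the same three arguments.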

How to best split CSV strings in Oracle 9i

Joyce, here are three examples:

1) Using dbms_utility.comma_to_table. This is not a general-purpose routine, because the elements must be valid identifiers. With some dirty tricks we can make it work more universally:

    declare
      cn_non_occuring_prefix constant varchar2(4) := 'zzzz';
      mystring varchar2(2000) := 'a:sd:dfg:31456:dasd: :sdfsdf'; -- just an example
      l_tablen binary_integer;
      l_tab dbms_utility.uncl_array;
      …

How to get data between quotes in Java?

You can use a regular expression to fish out this sort of information.

    Pattern p = Pattern.compile("\"([^\"]*)\"");
    Matcher m = p.matcher(line);
    while (m.find()) {
        System.out.println(m.group(1));
    }

This example assumes that the language of the line being parsed doesn’t support escape sequences for double-quotes within string literals, contain strings that span multiple “lines”, or support other …