split
How to split a string into tokens in C?
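Use strtok, or its reentrant POSIX variant strtok_r (both declared in <string.h>):
    char *token;
    char *state;
    for (token = strtok_r(input, "&", &state); token != NULL; token = strtok_r(NULL, "&", &state)) {
        /* … */
    }
Both functions overwrite delimiters in input with '\0', so input must be a modifiable buffer (for example a char array), not a string literal.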
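strtok treats its second argument as a set of delimiter characters, collapses runs of delimiters, and never returns an empty token, so "a&&b" yields just "a" and "b". For comparison, here is a minimal Python sketch of those same semantics. strtok_like is a hypothetical helper, not a binding to the C function, and unlike strtok it leaves the input string untouched.

```python
def strtok_like(text, delims):
    """Yield tokens the way C's strtok splits them.

    Any character in delims separates tokens; runs of delimiters collapse,
    and leading/trailing delimiters produce no empty tokens.
    """
    token = []
    for ch in text:
        if ch in delims:
            if token:  # a delimiter ends the current non-empty token
                yield "".join(token)
                token = []
        else:
            token.append(ch)
    if token:  # last token when text does not end with a delimiter
        yield "".join(token)
```

For example, list(strtok_like("&a&&b=1&", "&")) gives ["a", "b=1"], and passing "&;" as delims splits on either character.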
Java’s Scanner vs String.split() vs StringTokenizer; which should I use?
I ran some timing measurements of these in a single-threaded setup; here are the results I got:
Time metrics
Tokenizer | String.Split() | while+SubString | Scanner | ScannerWithCompiledPattern
4.0 ms    | 5.1 ms         | 1.2 ms          | 0.5 ms  | 0.1 ms
4.4 ms    | 4.8 ms         | 1.1 ms          | …
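The while+SubString column presumably refers to a hand-written loop of indexOf and substring calls. A Python sketch of that loop, assuming str.find and slicing as stand-ins for indexOf and substring (split_with_find is a hypothetical name), is shown for the technique only; the Java timings above say nothing about how it performs in Python.

```python
def split_with_find(text, sep):
    """Split text on sep by repeatedly searching and slicing (a find/substring loop).

    Keeps empty fields, matching str.split(sep) with an explicit separator.
    """
    if not sep:
        raise ValueError("empty separator")
    parts = []
    start = 0
    while True:
        idx = text.find(sep, start)  # next separator at or after start
        if idx == -1:
            parts.append(text[start:])  # remainder after the last separator
            return parts
        parts.append(text[start:idx])
        start = idx + len(sep)  # resume just past the separator
```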
How do you split a JavaScript string by spaces and punctuation?
To split str on any run of non-word characters, i.e. anything other than A-Z, a-z, 0-9, and underscore:
    var words = str.split(/\W+/); // assumes str neither begins nor ends with a non-word character
Or, assuming your target language is English, you can extract all semantically useful values from a string (i.e. "tokenizing" the string) using:
    var str = "Here's a (good, bad, …
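Python's re module behaves the same way, which shows why the assumption about the start and end of the string matters: splitting yields empty strings at the ends when the text starts or ends with a non-word character, while matching the words themselves does not. A minimal sketch follows; the function names are hypothetical, and re.ASCII is passed so that \w and \W use ASCII-only classes closer to JavaScript's.

```python
import re

NON_WORD = re.compile(r"\W+", re.ASCII)  # ASCII-only, closer to JavaScript's \W
WORD = re.compile(r"\w+", re.ASCII)


def split_non_word(text):
    r"""Split on runs of non-word characters (JavaScript: str.split(/\W+/))."""
    return NON_WORD.split(text)


def word_tokens(text):
    """Return the runs of word characters themselves; no empty strings at the ends."""
    return WORD.findall(text)
```

For "Hello, world!", split_non_word returns ["Hello", "world", ""] because of the trailing "!", while word_tokens returns ["Hello", "world"].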
Python split string in moving window
The examples in the itertools documentation provide a window function that does just that:
    from itertools import islice
    def window(seq, n=2):
        "Returns a sliding window (of width n) over data from the iterable"
        "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...   "
        it = iter(seq)
        result = tuple(islice(it, n))
        if len(result) == n:
            yield result
        for elem in it:
            result …
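Because the excerpt above is cut off, here is a complete sketch of the same idea under its own name, sliding_window (hypothetical, not the recipe's exact text). It keeps the current window in a deque bounded with maxlen=n, so appending a new element drops the oldest one.

```python
from collections import deque
from itertools import islice


def sliding_window(iterable, n=2):
    """Yield overlapping tuples of width n: (s0..s[n-1]), (s1..sn), ...

    Yields nothing when the input has fewer than n items.
    """
    if n < 1:
        raise ValueError("window width must be at least 1")
    it = iter(iterable)
    window = deque(islice(it, n), maxlen=n)  # first n items (fewer if input is short)
    if len(window) == n:
        yield tuple(window)
    for item in it:
        window.append(item)  # maxlen=n makes the deque drop its oldest item
        yield tuple(window)
```

For strings, join each tuple: ["".join(w) for w in sliding_window("abcde", 3)] gives ["abc", "bcd", "cde"]. Recent itertools documentation includes a deque-based sliding_window recipe along similar lines, and itertools.pairwise (Python 3.10+) covers the n=2 case.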
Splitting a string in C#
Use the Regex.Matches method instead (this needs using System.Linq; and using System.Text.RegularExpressions;):
    string[] result = Regex.Matches(str, @"\[.*?\]").Cast<Match>().Select(m => m.Value).ToArray();
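The idea is to match the pieces you want rather than split on what lies between them. The Python equivalent, as a minimal sketch (bracketed_parts is a hypothetical name), uses re.findall, which returns whole matches when the pattern has no capturing groups.

```python
import re

# Non-greedy .*? stops at the first "]", so adjacent groups stay separate.
BRACKETED = re.compile(r"\[.*?\]")


def bracketed_parts(text):
    """Return every [...] group in text, brackets included."""
    return BRACKETED.findall(text)
```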
Unicode string with diacritics split by chars
To do this properly, what you want is the algorithm for working out grapheme cluster boundaries, as defined in UAX #29. Unfortunately this requires knowing which characters belong to which classes in the Unicode Character Database, and JavaScript doesn't make that information available(*). So you'd have to include a copy of the …
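For illustration only, here is a much cruder Python approximation, explicitly not UAX #29: it attaches each combining mark to the character before it, using unicodedata.combining. It handles simple cases such as "e" followed by U+0301 COMBINING ACUTE ACCENT, but not emoji ZWJ sequences, regional-indicator flags, Hangul jamo, or most spacing marks. For real grapheme clusters in Python, the third-party regex module's \X pattern matches extended grapheme clusters. split_base_plus_marks is a hypothetical name.

```python
import unicodedata


def split_base_plus_marks(text):
    """Rough approximation of user-perceived characters (NOT full UAX #29).

    Each character with a non-zero canonical combining class is appended
    to the preceding cluster; everything else starts a new cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the combining mark to its base
        else:
            clusters.append(ch)
    return clusters
```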
split file on Nth occurrence of delimiter
Using awk, you could do:
    awk '/^\+$/ { delim++ } { file = sprintf("chunk%s.txt", int(delim / 50000)); print >> file; }' < input.txt
Update: to drop the delimiter line at each chunk boundary, try this:
    awk '/^\+$/ { if (++delim % 50000 == 0) { next } } { file = sprintf("chunk%s.txt", int(delim / 50000)); print > file; }' < input.txt
…
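The same counting logic in Python, as a minimal in-memory sketch: the chunk index is the number of delimiter lines seen so far divided by n, just as int(delim / 50000) computes it in awk, and drop_boundary mirrors the updated version that skips the delimiter starting each new chunk. split_on_nth_delimiter and drop_boundary are hypothetical names. Unlike awk, this holds every chunk in memory, so for very large files writing each line out as you go would be preferable.

```python
import re
from collections import defaultdict

DELIM = re.compile(r"^\+$")  # same pattern as the awk example: a line that is exactly "+"


def split_on_nth_delimiter(lines, n, drop_boundary=False):
    """Group lines into chunks, starting a new chunk at every n-th delimiter line.

    Returns {chunk_index: [lines]}; chunk_index = delimiters seen so far // n.
    """
    chunks = defaultdict(list)
    delim = 0
    for line in lines:
        if DELIM.match(line):  # $ also matches before a trailing "\n"
            delim += 1
            if drop_boundary and delim % n == 0:
                continue  # skip the delimiter that opens a new chunk
        chunks[delim // n].append(line)
    return dict(chunks)
```

Each list could then be written to its own file, e.g. chunk0.txt, chunk1.txt, matching the awk naming.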
String.Split VS. Regex.Split?
Regex.Split is more capable, but for simple delimiting (a character that never appears anywhere else in the string), String.Split is much easier to work with. As far as performance goes, you would have to write a test and measure it. But don't optimize prematurely unless you know that …
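If you do want numbers, a small harness is enough. The Python sketch below compares str.split with a precompiled re.split on the same input using timeit; compare_split_speed is a hypothetical helper, and it first checks that both approaches produce the same pieces so that the timings cover equivalent work. Results depend on your data and interpreter version.

```python
import re
import timeit


def compare_split_speed(text, sep=",", number=10_000):
    """Time str.split against a precompiled re.split on the same input.

    Returns total seconds for `number` runs of each variant.
    """
    pattern = re.compile(re.escape(sep))  # compiled once, so compilation is not timed
    if text.split(sep) != pattern.split(text):
        raise ValueError("the two splits disagree; the timings would not be comparable")
    return {
        "str.split": timeit.timeit(lambda: text.split(sep), number=number),
        "re.split": timeit.timeit(lambda: pattern.split(text), number=number),
    }
```

A typical call would be compare_split_speed(",".join(map(str, range(1000)))), using input shaped like your real data.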
How can I split a string that contains only a delimiter?
Alnitak is correct that trailing empty strings are discarded by default. If you want to keep trailing empty strings, use split(String, int) and pass a negative number as the limit parameter. The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. …
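To see what each limit value does, Java's documented rules can be modelled in Python. This is a hedged approximation, not Java's implementation: java_split is a hypothetical helper, Python's re syntax differs from java.util.regex in places, and it assumes the delimiter regex never matches the empty string and has no capturing groups (re.split would return captured text as extra items).

```python
import re


def java_split(text, regex, limit=0):
    """Approximate Java's String.split(regex, limit) for non-empty delimiters.

    limit > 0 : apply the pattern at most limit - 1 times, keep trailing empties
    limit == 0: split fully, then drop trailing empty strings (Java's default)
    limit < 0 : split fully and keep everything
    """
    pattern = re.compile(regex)
    if limit == 1 or pattern.search(text) is None:
        return [text]  # pattern applied zero times, or nothing matched
    parts = pattern.split(text, maxsplit=limit - 1 if limit > 0 else 0)
    if limit == 0:
        while parts and parts[-1] == "":
            parts.pop()  # Java discards trailing empty strings by default
    return parts
```

With it, java_split(",", ",") returns [] because both resulting strings are trailing empties, while java_split(",", ",", -1) returns ["", ""].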