What is a regular expression for parsing out individual sentences?

Try this @"(\S.+?[.!?])(?=\s+|$)":

string str=@"Hello world! How are you? I am fine. This is a difficult sentence because I use I.D.
Newlines should also be accepted. Numbers should not cause sentence breaks, like 1.23.";

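// Assumes: using System; using System.Text.RegularExpressions;
Regex rx = new Regex(@"(\S.+?[.!?])(?=\s+|$)");
foreach (Match match in rx.Matches(str)) {
    // match.Index gives the sentence's start offset in str, if you need it.
    Console.WriteLine(match.Value);
}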

Results:

Hello world!
How are you?
I am fine.
This is a difficult sentence because I use I.D.
Newlines should also be accepted.
Numbers should not cause sentence breaks, like 1.23.
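
To make it easier to see why the pattern behaves this way, here is a sketch of the same regex spread over several lines with comments. The class name AnnotatedSentenceRegex and the Split helper are made up for illustration; only the pattern itself comes from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, not part of the original answer.
public static class AnnotatedSentenceRegex
{
    // Same pattern as above. RegexOptions.IgnorePatternWhitespace allows
    // layout whitespace and #-comments inside the pattern.
    public static readonly Regex Pattern = new Regex(@"
        (              # group 1: one sentence
          \S           #   starts at a non-whitespace character
          .+?          #   then as few characters as possible (never a newline)
          [.!?]        #   up to a period, exclamation mark or question mark
        )
        (?= \s+ | $ )  # accepted only if whitespace or the end of input follows
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns every match of the pattern in text, in order.
    public static string[] Split(string text)
    {
        MatchCollection matches = Pattern.Matches(text);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value;
        }
        return result;
    }
}
```

The lookahead is what keeps 1.23 and I.D. together: a period followed directly by a digit or letter is not accepted as a sentence end, so the lazy .+? keeps extending until it reaches a terminator that is followed by whitespace or the end of the input.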

For more complicated text, of course, you will need a real NLP toolkit such as SharpNLP (C#) or NLTK (Python). Mine is just a quick-and-dirty solution.
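
As a rough illustration of where the quick approach breaks down, the sketch below runs the same pattern on a sentence with abbreviations in the middle. The class name NaiveSentenceSplitter and the sample text are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the regex from this answer.
public static class NaiveSentenceSplitter
{
    private static readonly Regex SentencePattern =
        new Regex(@"(\S.+?[.!?])(?=\s+|$)");

    // Collects each regex match as one "sentence".
    public static List<string> Split(string text)
    {
        var sentences = new List<string>();
        foreach (Match match in SentencePattern.Matches(text))
        {
            sentences.Add(match.Value);
        }
        return sentences;
    }
}
```

Calling NaiveSentenceSplitter.Split("Dr. Smith arrived. He showed his I.D. card.") yields four pieces: "Dr.", "Smith arrived.", "He showed his I.D." and "card.". Any abbreviation followed by a space looks like a sentence end to the regex. The I.D. example above only works because the abbreviation happens to end its sentence.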

Here is the SharpNLP description and feature list:

SharpNLP is a collection of natural language processing tools written in C#. Currently it provides the following NLP tools:

  • a sentence splitter
  • a tokenizer
  • a part-of-speech tagger
  • a chunker (used to “find non-recursive syntactic annotations such as noun phrase chunks”)
  • a parser
  • a name finder
  • a coreference tool
  • an interface to the WordNet lexical database
