HTML to List using XMLWorker

You need to implement the IElementHandler interface in a class of your own:

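    using System.Collections.Generic;
    using iTextSharp.text;
    using iTextSharp.tool.xml;
    using iTextSharp.tool.xml.pipeline;

    public class SampleHandler : IElementHandler {
        //Generic list of the elements collected during parsing
        public List<IElement> elements = new List<IElement>();
        //Called by XMLWorker for each item it produces; keep only writable elements
        public void Add(IWritable w) {
            if (w is WritableElement) {
                elements.AddRange(((WritableElement)w).Elements());
            }
        }
    }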

Instead of a file stream, here’s an example that parses a string. To read from a file, replace the StringReader with a StreamReader.

    string html = "<html><head><title>Test Document</title></head><body><p>This is a test. <strong>Bold <em>and italic</em></strong></p><ol><li>Dog</li><li>Cat</li></ol></body></html>";
    //Instantiate our handler
    var mh = new SampleHandler();
    //Bind a reader to our text
    using (TextReader sr = new StringReader(html)) {
        //Parse
        XMLWorkerHelper.GetInstance().ParseXHtml(mh, sr);
    }

    //Loop through each element
    foreach (var element in mh.elements) {
        //Loop through each chunk in each element
        foreach (var chunk in element.Chunks) {
            //Do something
        }
    }
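
For the file-based variant, a minimal sketch might look like the following. It reuses the SampleHandler class above; the file name input.html is a hypothetical placeholder, and printing each chunk's text stands in for whatever processing you actually need.

    //Hypothetical path; point this at your own XHTML file
    string path = "input.html";
    var fh = new SampleHandler();
    //StreamReader is also a TextReader, so the parse call is unchanged
    using (TextReader fr = new StreamReader(path)) {
        XMLWorkerHelper.GetInstance().ParseXHtml(fh, fr);
    }

    //Print the text of every chunk that was collected
    foreach (var element in fh.elements) {
        foreach (var chunk in element.Chunks) {
            Console.WriteLine(chunk.Content);
        }
    }

This fragment needs using directives for System and System.IO in addition to the iTextSharp namespaces.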
