NReadability Sample with VB.Net and C#

Posted: 09/06/2013

NReadability is a project/tool for removing clutter from HTML pages, much like Readability.js from arc90labs, or the way Instapaper retrieves a web page, identifies the main content and then provides a basic formatted document that’s easy to read (stripped down to the bare bones).

Here I’m going to provide a simple snippet that shows you how to download a web page, decode its bytes into a string as UTF-8, run that string through NReadability and then write the result to a file on the hard drive (you should change the output path to a folder that exists on your workstation). The decoding step is important for most web pages: without it, certain characters will not render correctly and come back garbled. Anyway, here are the examples:

VB.Net

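    ' Requires "Imports System.Net" and a project reference to NReadability.
    Using wc As New WebClient
        ' Download the raw bytes so we control how they are decoded.
        Dim html As Byte() = wc.DownloadData("http://en.wikipedia.org/wiki/.NET_Framework")
        Dim tc As New NReadability.NReadabilityTranscoder
        ' Decode the bytes as UTF-8 before handing the HTML to NReadability.
        Dim ti As New NReadability.TranscodingInput(System.Text.Encoding.UTF8.GetString(html))
        Dim tcr As NReadability.TranscodingResult = tc.Transcode(ti)
        ' Write the cleaned-up HTML to disk, then open it with the default application.
        System.IO.File.WriteAllText("c:\temp\test.html", tcr.ExtractedContent, System.Text.Encoding.Unicode)
        System.Diagnostics.Process.Start("c:\temp\test.html")
    End Using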

C#

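    // Requires "using System.Net;" and a project reference to NReadability.
    using (WebClient wc = new WebClient())
    {
        // Download the raw bytes so we control how they are decoded.
        byte[] html = wc.DownloadData("http://en.wikipedia.org/wiki/.NET_Framework");
        NReadability.NReadabilityTranscoder tc = new NReadability.NReadabilityTranscoder();
        // Decode the bytes as UTF-8 before handing the HTML to NReadability.
        NReadability.TranscodingInput ti = new NReadability.TranscodingInput(System.Text.Encoding.UTF8.GetString(html));
        NReadability.TranscodingResult tcr = tc.Transcode(ti);
        // Verbatim strings (@"...") keep \t from being read as a tab character.
        System.IO.File.WriteAllText(@"c:\temp\test.html", tcr.ExtractedContent, System.Text.Encoding.Unicode);
        System.Diagnostics.Process.Start(@"c:\temp\test.html");
    }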
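
To see why the decoding step matters, here is a small standalone sketch (not part of NReadability) that decodes the same bytes two ways. The sample word and the choice of ISO-8859-1 as the "wrong" encoding are only assumptions for illustration; a real page may declare some other charset, in which case UTF-8 would not be the right choice for it either.

```csharp
using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        // "café" encoded as UTF-8: the é takes two bytes (0xC3 0xA9).
        byte[] bytes = Encoding.UTF8.GetBytes("caf\u00e9");

        // Decoding with the matching encoding round-trips the text.
        string correct = Encoding.UTF8.GetString(bytes);

        // Decoding the same bytes as ISO-8859-1 maps every byte to one
        // character, so the two-byte é turns into two wrong characters.
        string garbled = Encoding.GetEncoding("ISO-8859-1").GetString(bytes);

        Console.WriteLine(correct == "caf\u00e9"); // True
        Console.WriteLine(correct.Length);         // 4
        Console.WriteLine(garbled.Length);         // 5
    }
}
```

The extra character in the second string is the kind of garble you get when the bytes from DownloadData are decoded with an encoding that doesn’t match the page.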