Remove HTML from a string (VB.Net & C#)

Posted by Blake on 12/21/2014

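Here is a simple extension method (provided in both VB and C#) that uses a regular expression to strip HTML tags from a string. It then converts tabs to spaces, removes Windows line breaks (CRLF), and collapses runs of two or three spaces into one. This has worked well in every case I’ve used it in so far.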

VB.Net

 
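Imports System.Runtime.CompilerServices
Imports System.Text.RegularExpressions

' VB extension methods must live in a Module (the module name here is illustrative).
Public Module StringExtensions
	''' <summary>
	''' Removes HTML from a string. 
	''' </summary>
	<Extension()> _
	Public Function RemoveHtml(html As String) As String
		' Strip anything between < and >, including tags that span lines.
		html = Regex.Replace(html, "<(.|\n)*?>", String.Empty)
		html = html.Replace(vbTab, " ")
		html = html.Replace(vbCrLf, String.Empty)
		html = html.Replace("   ", " ")
		Return html.Replace("  ", " ")
	End Function
End Module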

C#

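using System.Text.RegularExpressions;

// Extension methods must be declared in a static class (the class name here is illustrative).
public static class StringExtensions
{
	/// <summary>
	/// Removes HTML from a string. 
	/// </summary>
	public static string RemoveHtml(this string html)
	{
		// Strip anything between < and >, including tags that span lines.
		html = Regex.Replace(html, "<(.|\\n)*?>", string.Empty);
		html = html.Replace("\t", " ");
		html = html.Replace("\r\n", string.Empty);
		html = html.Replace("   ", " ");
		return html.Replace("  ", " ");
	} // end RemoveHtml
}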
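
Usage

To show how the method might be called, here is a small self-contained console sketch. The class names StringExtensions and Program and the sample strings are my own assumptions for illustration. The method body is the same as above, with the `this` modifier on the parameter so it can be called as an extension method.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative container class; extension methods must live in a static class.
public static class StringExtensions
{
	public static string RemoveHtml(this string html)
	{
		// Strip anything between < and >, including tags that span lines.
		html = Regex.Replace(html, "<(.|\\n)*?>", string.Empty);
		html = html.Replace("\t", " ");
		html = html.Replace("\r\n", string.Empty);
		html = html.Replace("   ", " ");
		return html.Replace("  ", " ");
	}
}

public static class Program
{
	public static void Main()
	{
		// A simple fragment with nested tags.
		string simple = "<p>Hello, <b>world</b>!</p>";
		Console.WriteLine(simple.RemoveHtml()); // prints "Hello, world!"

		// A tag whose attributes wrap onto a second line.
		string wrapped = "<a\nhref=\"#\">link</a>";
		Console.WriteLine(wrapped.RemoveHtml()); // prints "link"
	}
}
```

Note that the method only strips tags. Character entities such as &amp; are left in the output unchanged, and runs of four or more spaces are shortened but not necessarily reduced to a single space.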