Classic ASP (VBScript): convert HTML entities to plain text

What you need is an HTML decode function. Unfortunately, Classic ASP doesn’t include one: it provides Server.HTMLEncode but no matching decoder.

This function, originally found on ASP Nut and heavily modified by me, should do what you need. I tested it as VBScript on my local machine and it seemed to work well, even with numeric character references for Unicode code points above 1000.

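Function HTMLDecode(sText)
    Dim regEx, matches, match, sEntity, sResult, lastPos, code

    Set regEx = New RegExp
    With regEx
        'Match numeric references (&#233;) and the named entities handled here in
        'one pass, so "&amp;lt;" decodes to the literal text "&lt;", not to "<".
        .Pattern = "&(#(\d{1,7})|quot|lt|gt|amp|nbsp);"
        .Global = True
    End With

    Set matches = regEx.Execute(sText)
    sResult = ""
    lastPos = 1 '1-based position just past the previous match

    'Iterate over matches, copying the text between them unchanged
    For Each match In matches
        sResult = sResult & Mid(sText, lastPos, match.FirstIndex + 1 - lastPos)
        sEntity = match.SubMatches(0)

        Select Case sEntity
            Case "quot": sResult = sResult & Chr(34)
            Case "lt":   sResult = sResult & Chr(60)
            Case "gt":   sResult = sResult & Chr(62)
            Case "amp":  sResult = sResult & Chr(38)
            Case "nbsp": sResult = sResult & Chr(32) 'ordinary space for plain-text output
            Case Else
                'Numeric reference: replace the whole match with the ChrW of the digits
                code = CLng(match.SubMatches(1))
                If code <= 65535 Then
                    sResult = sResult & ChrW(code)
                ElseIf code <= 1114111 Then
                    'ChrW only takes 16-bit values, so emit a UTF-16 surrogate pair
                    code = code - 65536
                    sResult = sResult & ChrW(55296 + code \ 1024) & ChrW(56320 + code Mod 1024)
                Else
                    sResult = sResult & match.Value 'not a valid code point: leave it as-is
                End If
        End Select

        lastPos = match.FirstIndex + match.Length + 1
    Next

    HTMLDecode = sResult & Mid(sText, lastPos)
End Function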
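A note on why the decoder uses ChrW rather than Chr: as far as I know, Chr goes through the server's ANSI code page, so it is only reliable for 0-255, while ChrW takes a UTF-16 code unit. ChrW is still limited to 16-bit values, so a reference above 65535, such as &#128512; (U+1F600), has to become two surrogate code units. The standalone sketch below, meant to be run with cscript.exe (the file name in the comment is just an example), shows both cases:

```vbscript
' Standalone check; run with: cscript chrw_check.vbs
Dim sEuro, sEmoji, cp

' &#8364; is the euro sign: ChrW turns it into a single UTF-16 code unit
sEuro = ChrW(8364)
WScript.Echo "Euro code unit: " & AscW(sEuro)

' &#128512; (U+1F600) is above 65535, so split it into a surrogate pair:
' high = 55296 + (cp \ 1024), low = 56320 + (cp Mod 1024), where cp = codepoint - 65536
cp = 128512 - 65536
sEmoji = ChrW(55296 + cp \ 1024) & ChrW(56320 + cp Mod 1024)
WScript.Echo "Emoji length in code units: " & Len(sEmoji)
```

The first line printed should be "Euro code unit: 8364" and the second "Emoji length in code units: 2", since VBScript strings count UTF-16 code units rather than characters.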

Note: You’ll need VBScript 5.0 or later on your server to use the RegExp object, and 5.5 or later for the SubMatches collection used above.
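If you’re not sure which scripting engine version a server actually has, the built-in ScriptEngine* functions will tell you. This is a minimal sketch for cscript.exe; in an ASP page you would write the same values with Response.Write instead of WScript.Echo. The 5.5 threshold assumes SubMatches is the newest feature the decoder depends on.

```vbscript
' Report the installed VBScript engine version (run with cscript.exe)
Dim major, minor
major = ScriptEngineMajorVersion
minor = ScriptEngineMinorVersion

WScript.Echo ScriptEngine & " " & major & "." & minor & "." & ScriptEngineBuildVersion

' RegExp needs 5.0; Match.SubMatches needs 5.5
If major < 5 Or (major = 5 And minor < 5) Then
    WScript.Echo "This engine is too old for RegExp.SubMatches (needs VBScript 5.5+)"
End If
```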
