Friday, November 23, 2007

Stripping HTML Markup and Extra Whitespace from Strings

I needed a simple way to strip HTML markup from a given string. Since this is a ColdFusion app, I ended up finding a useful entry on Ray Camden's blog: Quick example of cleaning up Verity results.

This is the gist of it:
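<!--- Remove every complete tag; the lazy .*? stops at the first > --->
<cfset var cleaned = rereplace(arguments.input, "<.*?>", "", "all")>
<!--- Remove a tag cut off at the end of the string, e.g. "... <a hre" --->
<cfset cleaned = rereplace(cleaned, "<.*?$", "", "all")>
<!--- Remove the tail of a tag cut off at the start, e.g. 'ef="x">text' --->
<cfset cleaned = rereplace(cleaned, "^.*?>", "", "all")>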
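For anyone who wants to experiment with these patterns outside ColdFusion, here is a rough translation into Python's re module, which supports the same lazy *? quantifier. The function name strip_markup is made up for this sketch and does not come from Ray's post.

```python
import re


def strip_markup(text: str) -> str:
    """Hypothetical helper: remove HTML tags, including tags cut off at either end."""
    # Remove complete tags such as <b> or </p>; the lazy .*? stops at the first '>'.
    cleaned = re.sub(r"<.*?>", "", text)
    # Remove a tag truncated at the end of the string, e.g. "Hello <a hre".
    cleaned = re.sub(r"<.*?$", "", cleaned)
    # Remove the tail of a tag truncated at the start, e.g. 'ef="x">text'.
    cleaned = re.sub(r"^.*?>", "", cleaned)
    return cleaned


print(strip_markup("<p>Hello <b>world</b></p>"))  # Hello world
print(strip_markup('ef="x">text <b>bold</b>'))    # text bold
```

One caveat with this approach: a literal < or > in ordinary text (e.g. "x < y") looks like a tag fragment, so it gets removed along with the text next to it.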
In addition, to get rid of extra whitespace, run the cleaned string through one more replace:

<cfset cleaned = rereplace(cleaned, "\s{2,}", " ", "all")>
The above regex collapses any run of two or more whitespace characters (spaces, tabs, newlines) into a single space. That comes in handy, since we end up with plenty of those once the markup has been stripped from the original string.
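The same whitespace cleanup in Python's re module, as a minimal sketch. The function name collapse_whitespace is hypothetical. As with the ColdFusion version, a lone tab or newline is left alone because the pattern only matches runs of two or more.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Hypothetical helper: replace each run of 2+ whitespace characters with one space."""
    # \s matches spaces, tabs and newlines; {2,} requires at least two in a row.
    return re.sub(r"\s{2,}", " ", text)


print(collapse_whitespace("Hello   \n\t world"))  # Hello world
```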

Finally, to get nicely formatted code like the snippets above, check out http://formatmysourcecode.blogspot.com/. Learning regular expressions? Go to http://www.regular-expressions.info/.
