A Conversation for Website Developer's Forum
.ASP Text Clean-up
Bogie Started conversation Jul 1, 2004
I am pulling some text from an old database into an .ASP webpage. The only problem is that the text is full of random ASCII characters. I have tried cleaning the text up with:
Description = rs("Data")
For CharNumber = 0 To 31
    Description = Replace(Description, Chr(CharNumber), "")
Next
... but I am still left with a couple of strange characters which are messing up my formatting.
Does anyone know a good way to clean up my text so that all I am left with are the letters a-z, A-Z, the digits 0-9, whitespace and the punctuation characters !"£$%^&*@<>~-?
The code must be in VBScript.
Any help would be gratefully received!
B.
.ASP Text Clean-up
Bogie Posted Jul 1, 2004
Comparing the text in the Description against an array of permitted characters might be a good way of doing this... but I don't know how to go about writing this in VBScript.
B.
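For reference, the permitted-characters idea above can be sketched roughly as follows. This is only a minimal sketch: the function name KeepPermitted is hypothetical, the permitted list is copied from the original question, and it assumes the .ASP file is saved in an encoding where the £ sign survives intact.

```vbscript
' Hypothetical helper: keep only characters that appear in a permitted list.
Function KeepPermitted(text)
    Dim permitted, i, ch, result
    ' Letters, digits and the punctuation from the original question.
    ' The doubled "" is VBScript's escape for one literal double quote.
    permitted = "abcdefghijklmnopqrstuvwxyz" & _
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ" & _
                "0123456789" & _
                "!""£$%^&*@<>~-?" & _
                " " & vbTab & vbCr & vbLf
    result = ""
    For i = 1 To Len(text)
        ch = Mid(text, i, 1)
        ' vbBinaryCompare keeps the match exact (case-sensitive).
        If InStr(1, permitted, ch, vbBinaryCompare) > 0 Then
            result = result & ch
        End If
    Next
    KeepPermitted = result
End Function

' Appending "" turns a Null database value into an empty string.
Description = KeepPermitted(rs("Data") & "")
```

Building the result one character at a time is simple but can be slow on long text, so this is best treated as a starting point rather than a finished routine.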
.ASP Text Clean-up
Felonious Monk - h2g2s very own Bogeyman Posted Jul 1, 2004
I do: use the regular expression library that gets installed with VBScript: http://authors.aspalliance.com/brettb/VBScriptRegularExpressions.asp
Mastering the syntax can be a bit of a struggle, but it's worth it, as you should be able to parse any text file you want. Once you have mastered the syntax, use the Replace method to get rid of the crap.
.ASP Text Clean-up
Bogie Posted Jul 2, 2004
That only appears to work if you know which characters you want to remove. The characters I am trying to remove are just showing up as those funny little upright rectangles... so I don't actually know what they are. All I know is that they are some sort of formatting codes (probably copied from an MS Word document).
Any further thoughts?
B.
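One way to find out what the mystery characters are is to print their character codes. The sketch below is an assumption-laden diagnostic, not part of the original thread: the Sub name ShowOddCharacters is hypothetical, and it simply reports every character outside the printable ASCII range 32-126 (so tabs, line breaks and the £ sign will show up too). Characters pasted from Word, such as curly quotes and long dashes, usually have codes above 126, which would explain why a loop over Chr(0) to Chr(31) leaves them behind.

```vbscript
' Hypothetical diagnostic: list the code of every character outside
' the printable ASCII range 32-126.
Sub ShowOddCharacters(text)
    Dim i, code
    For i = 1 To Len(text)
        code = AscW(Mid(text, i, 1))
        ' AscW returns a signed 16-bit value, so wrap negatives into 0-65535.
        If code < 0 Then code = code + 65536
        If code < 32 Or code > 126 Then
            Response.Write "Position " & i & ": character code " & code & "<br>"
        End If
    Next
End Sub

' Appending "" turns a Null database value into an empty string.
ShowOddCharacters rs("Data") & ""
```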
.ASP Text Clean-up
Felonious Monk - h2g2s very own Bogeyman Posted Jul 2, 2004
No, it works even if you *don't* know, because you can define a regexp as a complemented (negated) set of characters, which matches only the interlopers. To remove everything except a-z, A-Z, 0-9, whitespace and the punctuation characters !"£$%^&*@<>~-?, you'd use the regexp
"[^\w\s!"£\$%\^&\*@<>~\?-]"
which matches all the single characters NOT in that set. You just replace them with the empty string:
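Dim regEx, result
Set regEx = New RegExp ' Create regular expression.
' \w covers a-z, A-Z and 0-9 (it also lets the underscore through).
' The doubled "" is one literal double quote inside a VBScript string.
regEx.Pattern = "[^\w\s!""£\$%\^&\*@<>~\?-]"
regEx.Global = True ' Replace every match, not just the first.
result = regEx.Replace(str1, "") ' str1 holds the text to clean; make replacement.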
A doddle AND it's quick to run. Try it and let me know if it works. You can see the syntax of regular expressions here:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/vspropattern.asp
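To tie this back to the original question, the regexp approach could be wrapped in a small function and applied to the database field. This is a sketch only: the function name CleanText is hypothetical, and the pattern spells out a-zA-Z0-9 rather than using \w, on the assumption that the underscore (which \w also matches) should be stripped too.

```vbscript
' Hypothetical wrapper around the regexp approach from this post.
Function CleanText(text)
    Dim regEx
    Set regEx = New RegExp
    ' Explicit ranges instead of \w, because \w would also let "_" through.
    ' The hyphen sits last in the class so it is taken literally.
    regEx.Pattern = "[^a-zA-Z0-9\s!""£$%^&*@<>~?-]"
    regEx.Global = True     ' replace every match, not just the first
    CleanText = regEx.Replace(text, "")
End Function

' Appending "" turns a Null database value into an empty string.
Description = CleanText(rs("Data") & "")
```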