A Conversation for Website Developer's Forum

.ASP Text Clean-up

Post 1

Bogie

I am pulling some text from an old database into an .ASP webpage. The only problem is that the text is full of random ASCII characters. I have tried cleaning the text up with:

Description = rs("Data")
' Strip the ASCII control characters 0-31 (this includes tab, CR and LF).
For CharNumber = 0 To 31
    Description = Replace(Description, Chr(CharNumber), "")
Next

... but I am still left with a couple of strange characters which are messing up my formatting.

Does anyone know a good way to clean up my text so that all I am left with are the letters a-z, A-Z, 0-9, whitespace and the punctuation characters !"£$%^&*@<>~-?

The code must be in VBScript.

Any help would be gratefully received!

B.


.ASP Text Clean-up

Post 2

Bogie

A method of comparing the text in the Description against an array of permitted characters might be a good way of doing this... but I don't know how to go about writing this in VBScript.

B.
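
For anyone wanting to try the permitted-characters idea directly, here is a rough sketch in plain VBScript. It is only one way of doing it: the function name KeepPermitted and the variable names are made up for illustration, and the permitted list is simply the set asked for in Post 1. It walks the text one character at a time and keeps a character only if it appears in the permitted string.

```vbscript
' Minimal sketch of a whitelist filter (KeepPermitted is a hypothetical name).
Function KeepPermitted(ByVal text)
    Dim permitted, i, ch, cleaned

    ' The characters allowed through: letters, digits, whitespace
    ' and the punctuation listed in Post 1.
    permitted = "abcdefghijklmnopqrstuvwxyz" & _
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ" & _
                "0123456789" & _
                " " & vbTab & vbCr & vbLf & _
                "!""£$%^&*@<>~-?"

    cleaned = ""
    For i = 1 To Len(text)
        ch = Mid(text, i, 1)
        ' vbBinaryCompare makes the lookup exact and case-sensitive.
        If InStr(1, permitted, ch, vbBinaryCompare) > 0 Then
            cleaned = cleaned & ch
        End If
    Next

    KeepPermitted = cleaned
End Function
```

Assuming the field might be Null, it could be called as Description = KeepPermitted(rs("Data") & ""). Building the result by repeated concatenation is fine for short descriptions but may get slow on very long text.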


.ASP Text Clean-up

Post 3

Felonious Monk - h2g2s very own Bogeyman

I do: use the regular expression library that gets installed with VBScript: http://authors.aspalliance.com/brettb/VBScriptRegularExpressions.asp

Mastering the syntax can be a bit of a struggle, but it's worth it, as you should be able to parse any text file you want. Once you have mastered the syntax, use the Replace method to get rid of the crap.


.ASP Text Clean-up

Post 4

Bogie

That only appears to work if you know which character you want to remove. The characters I am trying to remove just show up as those funny little upright rectangles, so I don't actually know what they are. All I know is that they are some sort of formatting code (probably copied from an MS Word document).

Any further thoughts?

B.
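
One way to find out what the mystery rectangles actually are is to list the character code of anything outside the printable ASCII range. The following is only a sketch: DescribeOddChars is a made-up name, and the range 32-126 is an assumption about what counts as "normal" text here (so tabs and line breaks get reported too).

```vbscript
' Minimal sketch: report the position and code of every character
' outside printable ASCII (DescribeOddChars is a hypothetical name).
Function DescribeOddChars(ByVal text)
    Dim i, ch, code, report

    report = ""
    For i = 1 To Len(text)
        ch = Mid(text, i, 1)
        code = AscW(ch)
        ' AscW returns a signed 16-bit value, so codes above 32767
        ' come back negative; shift them into the 0-65535 range.
        If code < 0 Then code = code + 65536
        If code < 32 Or code > 126 Then
            report = report & "position " & i & ": code " & code & vbCrLf
        End If
    Next

    DescribeOddChars = report
End Function
```

On an ASP page the report could be shown with something like Response.Write "<pre>" & Server.HTMLEncode(DescribeOddChars(Description)) & "</pre>". Once the codes are known, they can be removed individually with Replace and ChrW.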


.ASP Text Clean-up

Post 5

Felonious Monk - h2g2s very own Bogeyman

No, it works even if you *don't* know, because you can define the regexp as a negated character set: it matches everything except the characters you list, which catches the interlopers. To remove everything except a-z, A-Z, 0-9, whitespace and the punctuation characters !"£$%^&*@<>~-?, you'd use the regexp
[^\w\s!"£\$%\^&\*@<>~\?-]
which matches any single character NOT in that set (note that \w also lets the underscore through). You just replace the matches with the empty string:

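Dim regEx, result
Set regEx = New RegExp ' Create the regular expression object.
' Match any single character NOT in the permitted set ("" is an escaped quote).
regEx.Pattern = "[^\w\s!""£\$%\^&\*@<>~\?-]"
regEx.Global = True ' Replace every match, not just the first.
' \w already covers both upper and lower case, so IgnoreCase is not needed.
result = regEx.Replace(str1, "") ' str1 holds the text to clean, e.g. Description.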

A doddle AND it's quick to run. Try it and let me know if it works. You can see the syntax of regular expressions here:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/vspropattern.asp


.ASP Text Clean-up

Post 6

Felonious Monk - h2g2s very own Bogeyman

Well? Did it work?

