Removing Smart Quotes from text in Linux

When converting customers' Microsoft Word documents to HTML, it seems inevitable that extra characters inserted by Word will end up in the output. Unfortunately, this extended (Windows-1252) character set, which includes Smart Quotes, is not displayed correctly by many web browsers and email clients, so we have to go through our HTML files and clean all of it out.

I normally use Vim to edit HTML files, and I have finally found a reliable way of locating these ‘bad characters’ in it.

The process consists of two steps:

1) Make sure the buffer's 'fileencoding' is an 8-bit encoding:

:setlocal fenc=latin1

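2) Use the 8g8 command in Normal mode to jump to the next illegal UTF-8 byte sequence at or after the cursor (see :help 8g8). Fix each bad character before pressing 8g8 again, because the command will not move off an illegal byte the cursor is already on, and it does not wrap around the end of the file.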
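If you want a list of every bad character up front rather than stepping through them one at a time, the same check can be roughly approximated outside Vim. This is a minimal Python sketch, not how 8g8 is implemented: the `find_invalid_utf8` function and the script name in the usage comment are hypothetical, and it simply reports the line number and byte offset of each byte sequence that is not valid UTF-8.

```python
import sys


def find_invalid_utf8(data: bytes) -> list[int]:
    """Return the byte offset where each invalid UTF-8 sequence starts."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            # If the rest of the data decodes cleanly, there are no more bad bytes.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice we decoded.
            offsets.append(pos + err.start)
            pos += err.end  # skip past the invalid sequence
    return offsets


def main() -> None:
    # Hypothetical usage: python find_bad_chars.py page.html other.html
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        for offset in find_invalid_utf8(data):
            line = data.count(b"\n", 0, offset) + 1
            print(f"{path}:{line}: byte 0x{data[offset]:02x} at offset {offset}")


if __name__ == "__main__":
    main()
```

Each reported line number can then be opened in Vim with :{line} before using 8g8 on that line.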

This process lets you find each bad character in turn so that it can be replaced with a UTF-8 character that displays correctly in web browsers and email clients. If anyone out there has a better/easier way of doing this, please let me know.
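One possible alternative, offered only as a rough sketch, is to do the conversion in a script instead of by hand. It assumes the stray bytes are Windows-1252 (the encoding in which Word's curly quotes are the bytes 0x91 to 0x94): valid UTF-8 is kept as-is, every invalid byte sequence is decoded as Windows-1252 instead, and an optional table maps common Word punctuation to plain ASCII. The names `cp1252_fallback`, `SMART_PUNCTUATION` and `clean` are illustrative choices, not standard ones.

```python
import codecs
import sys

# Illustrative mapping of Word's "smart" punctuation to plain ASCII.
SMART_PUNCTUATION = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
    "\u2013": "-", "\u2014": "--",  # en dash, em dash
    "\u2026": "...",                # ellipsis
})


def _cp1252_fallback(err: UnicodeDecodeError) -> tuple[str, int]:
    """Decode the bytes that are not valid UTF-8 as Windows-1252 instead."""
    bad = err.object[err.start:err.end]
    # Bytes undefined in Windows-1252 (e.g. 0x81) become U+FFFD.
    return bad.decode("cp1252", errors="replace"), err.end


codecs.register_error("cp1252_fallback", _cp1252_fallback)


def clean(data: bytes, ascii_only: bool = False) -> str:
    """Return data as text, repairing stray Windows-1252 bytes."""
    text = data.decode("utf-8", errors="cp1252_fallback")
    if ascii_only:
        text = text.translate(SMART_PUNCTUATION)
    return text


if __name__ == "__main__":
    # Hypothetical usage: python clean_quotes.py page.html > page.utf8.html
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            sys.stdout.buffer.write(clean(f.read()).encode("utf-8"))
```

Calling clean(data, ascii_only=True) replaces the curly quotes with straight ones instead of keeping them as UTF-8. Either way, the HTML page should declare its encoding, for example with <meta charset="utf-8">.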