Removing Smart Quotes from Text in Linux

When converting customers’ documents from Microsoft Word format to html, it seems inevitable that Word will leave extra characters behind. Unfortunately, this expanded character set, which includes Smart Quotes, is displayed incorrectly by many web browsers and email clients, so we have to go through and clean all of it out of our html files.

I normally use vim to edit all html files, and I have finally found a reliable method of locating these ‘bad characters’ in vim.

The process consists of two steps:

1) Make sure the ‘file encoding’ is 8-bit

:setlocal fenc=latin1

2) Use the 8g8 command in Normal mode, which jumps to the next illegal UTF-8 byte sequence at or after the cursor (see “:help 8g8”)

This process lets you find the bad characters so they can be replaced with utf-8 characters that display correctly in web browsers and email clients. If anyone out there has a better or easier way of doing this, please let me know.
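If you would rather check a batch of files without opening each one, the same idea can be scripted. The Python below is a rough analogue of 8g8, not the vim method above. It assumes the files are meant to be UTF-8 and that the stray bytes come from Windows-1252 (cp1252), the encoding in which Word’s smart quotes sit at bytes 0x91 to 0x94. The names `find_bad_bytes`, `fix_text` and the `cp1252_fallback` error handler are hypothetical.

```python
import codecs
import sys


def find_bad_bytes(data: bytes):
    """Yield (line, column, byte) for each byte that is not valid UTF-8.

    Line and column are 1-based; the column counts bytes, not characters.
    """
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            return  # everything from pos onward is valid UTF-8
        except UnicodeDecodeError as err:
            bad = pos + err.start  # err.start is relative to the slice
            line = data.count(b"\n", 0, bad) + 1
            col = bad - (data.rfind(b"\n", 0, bad) + 1) + 1
            yield line, col, data[bad]
            pos = bad + 1  # resume just past the offending byte


def _cp1252_fallback(err: UnicodeDecodeError):
    # Assumption: stray bytes are Windows-1252, where 0x91-0x94 are smart quotes.
    raw = err.object[err.start:err.end]
    return raw.decode("cp1252", errors="replace"), err.end


codecs.register_error("cp1252_fallback", _cp1252_fallback)


def fix_text(data: bytes) -> str:
    """Decode as UTF-8, reinterpreting any invalid bytes as Windows-1252."""
    return data.decode("utf-8", errors="cp1252_fallback")


if __name__ == "__main__":
    # Report every bad byte, one "file:line:column" entry per line.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        for line, col, byte in find_bad_bytes(data):
            print(f"{path}:{line}:{col}: byte 0x{byte:02x}")
```

Run it as, for example, `python check_quotes.py *.html` (the script name is made up). `fix_text` only returns the repaired text; writing it back out with `encoding="utf-8"` is left to you. It turns Word’s bytes into real curly-quote characters. If you want plain ASCII quotes instead, a `str.translate` over U+2018, U+2019, U+201C and U+201D would handle that.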

Vim – Multiple Files

One of the very cool things about Vim is how easy it makes editing multiple files. I like to use this capability when I’m working on a website project and need to update lots of html files with something common, like a new link. Sure, there are other ways of doing this, such as a simple command-line search-and-replace perl script. Sometimes, though, the regular expression is just too much of a pain to write, and it’s easier and faster to edit all of the files directly.

The difficulty is, without menus, it can be tough to remember the syntax to move between files. In case you, like me, can’t remember how to edit multiple files, here is a little tutorial.

  1. Open multiple files with vim: vim *.html
  2. Edit the first file
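  3. Type :w to save the file, then :bn to move to the next buffer (unless the 'hidden' option is set, :bn refuses to leave a buffer with unsaved changes)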
  4. Repeat
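For comparison, the scripted route mentioned earlier (a command-line search and replace) might look like this in Python rather than perl. It is only a minimal sketch: `replace_in_files` is a hypothetical helper, and it assumes every matching file is already valid UTF-8, so run it after the smart-quote cleanup.

```python
import re
from pathlib import Path


def replace_in_files(pattern: str, replacement: str, glob: str = "*.html",
                     root: str = ".") -> list:
    """Apply re.sub(pattern, replacement) to every file matching glob in root.

    Returns the paths of files that actually changed; files with no match
    are not rewritten.
    """
    regex = re.compile(pattern)
    changed = []
    for path in sorted(Path(root).glob(glob)):
        text = path.read_text(encoding="utf-8")
        new_text = regex.sub(replacement, text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```

For example, `replace_in_files(r'href="old\.html"', 'href="new.html"')` would update a link across every html file in the current directory. When the pattern is hard to get right, that is exactly when stepping through the files in vim wins.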

If you need additional command information, check out this great list of vim tips.
