Strip out HTML tags

November 7th, 2012

I’ve struggled for some time to find a satisfactory method for stripping large amounts of HTML from a text file, and I’m happy to say I have found the perfect solution. I used VIM and a regular expression, neither of which is intended for children, to strip thousands of tags from a 1.2 MB file in less than 3 seconds. Without further ado, here is the VIM command:

:0,$s/<\_.\{-1,\}>//g
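
To see what each piece of the pattern does, here is a rough translation into Python's `re` module. This is a sketch for illustration, not part of the VIM workflow, and the names `TAG_PATTERN` and `strip_tags` are made up. In the VIM command, `0,$` is the range (the whole file, like `%`), `s/pattern//g` replaces every match with nothing, `\_.` matches any character including a newline, and `\{-1,\}` means "one or more, as few as possible". Python spells those last two as `.` with the `re.DOTALL` flag and the lazy quantifier `+?`.

```python
import re

# Python translation of the VIM pattern <\_.\{-1,\}>
#   <          a literal "<"
#   .+?        one or more characters, as few as possible (VIM: \{-1,\})
#   re.DOTALL  lets "." match newlines too (VIM: \_. instead of .)
#   >          a literal ">"
TAG_PATTERN = re.compile(r"<.+?>", re.DOTALL)


def strip_tags(text: str) -> str:
    """Remove every <...> span, including tags that wrap across lines."""
    return TAG_PATTERN.sub("", text)


# The <a ...> tag below is split over two lines, which is why the
# newline-matching part of the pattern matters.
html = '<p class="intro">Hello,\n<a\n  href="/x">world</a>!</p>'
print(strip_tags(html))
```

The "as few as possible" part is what keeps a match from running from the first `<` in the file to the last `>`: a greedy `<.+>` would turn `<b>bold</b>` into an empty string. The flip side is that a bare `<` in ordinary text, as in `1 < 2 and 3 > 2`, is treated as the start of a tag and eaten up to the next `>`. Text inside `<script>` or `<style>` elements is also left behind, since only the tags themselves are removed.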

It’s a bit intimidating to the uninitiated, so here is a step-by-step process.

  1. Download and install VIM
  2. Open VIM and then open the file you want to edit
  3. Copy the following to your clipboard (same as above but without the colon):
    0,$s/<\_.\{-1,\}>//g
  4. Go back to VIM and use the keyboard to type a colon (:). This puts VIM into command-line mode, and you will see your cursor blinking in the lower left corner
  5. Paste what you copied above. You may need to use Shift+Insert (fn+apple+v on a Mac)
  6. Hit enter
  7. Pick up jaw from floor
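
If you would rather run the same substitution from a script than type it into VIM, a minimal sketch in Python could look like the following. It assumes the file is UTF-8 and writes the result to a separate output file instead of editing in place; `strip_tags_in_file` is a hypothetical helper name, not something from the original steps.

```python
import re
from pathlib import Path

# Same pattern as the VIM command: "<", then one or more characters
# (newlines included), as few as possible, then ">".
TAG_PATTERN = re.compile(r"<.+?>", re.DOTALL)


def strip_tags_in_file(src: Path, dest: Path) -> int:
    """Write src to dest with all tags removed; return how many tags were removed."""
    text = src.read_text(encoding="utf-8")
    # subn returns the new string and the number of substitutions made
    stripped, count = TAG_PATTERN.subn("", text)
    dest.write_text(stripped, encoding="utf-8")
    return count


# Example (hypothetical file names):
# strip_tags_in_file(Path("page.html"), Path("page.txt"))
```

Writing to a second file keeps the original HTML intact in case the pattern removes something you wanted, such as a stray `<` in the body text.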




You are currently reading Strip out HTML tags at Thomas Paine Rants.
