I’ve struggled for some time to find a satisfactory method for stripping large amounts of HTML from a text file, and I’m happy to say I have found the perfect solution. I used VIM and a regular expression, neither of which is intended for children, to strip thousands of tags from a 1.2 MB file in less than 3 seconds. Without further ado, here is the VIM command:
:0,$s/<\_.\{-1,\}>//g
It’s a bit intimidating to the uninitiated, so here is a step-by-step process.
- Download and install VIM
- Open VIM and then open the file you want to edit
- Copy the following to your clipboard (same as above but without the colon):
0,$s/<\_.\{-1,\}>//g
- Go back to VIM, press Esc to make sure you are in normal mode, and type a colon (:). This puts VIM into command-line mode and you will see your cursor blinking in the lower left corner
- Paste what you copied above. You may need to use shift+insert (fn+apple+v on mac)
- Hit enter
- Pick up jaw from floor
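If the pattern still looks like line noise, it may help to see the same idea outside of VIM. In the VIM version, `<` and `>` are literal angle brackets, `\_.` means "any character, including a line break", and `\{-1,\}` means "one or more of those, but as few as possible". Below is a minimal Python sketch of what I believe is a close equivalent. The names `TAG_PATTERN` and `strip_tags` are made up for illustration, and this is not what VIM runs internally.

```python
import re

# Rough Python counterpart of the VIM pattern <\_.\{-1,\}>:
#   <     a literal opening angle bracket
#   .+?   one or more characters, as few as possible (VIM: \{-1,\})
#   >     a literal closing angle bracket
# re.DOTALL lets "." match line breaks too, much like VIM's \_. does.
TAG_PATTERN = re.compile(r"<.+?>", re.DOTALL)


def strip_tags(text: str) -> str:
    """Remove anything that looks like a tag, even if it spans several lines."""
    return TAG_PATTERN.sub("", text)


if __name__ == "__main__":
    # Hypothetical sample input; the second tag is split across two lines.
    sample = '<p class="intro">Hello, <b>world</b>!</p>\n<a\n  href="#">link</a>'
    print(strip_tags(sample))
```

Like the VIM command, this is purely textual. It does not understand HTML, so a stray `<` in ordinary text or a `>` inside an attribute value can cause it to remove more or less than you expect.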