Why Hacking Web Pages With Unix Rules

The changes I’ve been making involve numerous small amendments to a site that I didn’t build, and which was built with tools that are now several years out of date. Whilst building a site is just as easy on any platform, amending one is considerably faster on Unix. Here’s why:

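  • wget. First of all, I could not log in to the FTP server. There was no reason; it was just one of those things. So I wgot the whole site (wget -m). This has the advantage of leaving any redundant files behind, because a mirror only fetches the pages and files that are actually linked.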
  • tidy. This runs on most OSs, but it is simply the most useful tool of the lot. In one tidy run I converted the whole site to XHTML, cleaned up redundant and empty tags, removed every <font> tag and rewrote it as CSS, and indented the code properly. That let me find my way around the code and made it easier to match blocks of it with regular expressions.
  • grep. I would locate an error, grep the whole set of files for it and then look only at those that matched. Using grep I could test how effective my perl changes had been and generally assess what was going on across all 116 files of the site with no delay. And, more importantly, with no drudge work!
  • perl. The icing on the cake. Perl is fantastically useful and extremely flexible when it comes to text manipulation. Within a very simple program I built sets of regular expressions that, when applied to the current site, made the kinds of changes I needed: pointing an icon somewhere else on every page, or changing the header on every page to remove a banner. This kind of thing is doable with other find/replace tools. But bear in mind that whilst the banner looked identical on every page, its code did not. Some pages used a <br />, others a spacer GIF. Only regexes stay ahead in that game.
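The perl step lends itself to a sketch. What follows is a rough Python stand-in, not the original perl program: a short list of regex rules applied to every page, plus a grep-style check for pages that still match afterwards. The names banner.gif, spacer.gif, icon.gif and /contact.html are hypothetical placeholders, and the patterns assume tidy has already normalised the markup into XHTML with quoted attributes and self-closing tags.

```python
import re
import sys
from pathlib import Path

# Hypothetical markup: banner.gif, spacer.gif, icon.gif and /contact.html
# are illustrative names, not taken from the real site.
RULES = [
    # Remove the banner image plus whatever followed it for spacing:
    # on some pages a <br />, on others a spacer GIF.
    (
        re.compile(
            r'<img[^>]*src="[^"]*banner\.gif"[^>]*/>\s*'
            r'(?:<br\s*/>|<img[^>]*src="[^"]*spacer\.gif"[^>]*/>)',
            re.IGNORECASE,
        ),
        "",
    ),
    # Repoint the link wrapping the icon image, keeping everything else intact.
    (
        re.compile(
            r'(<a\s[^>]*href=")[^"]*("[^>]*>\s*<img[^>]*src="[^"]*icon\.gif")',
            re.IGNORECASE,
        ),
        r"\g<1>/contact.html\g<2>",
    ),
]


def apply_rules(html: str) -> tuple[str, int]:
    """Run every rule over one page; return the new text and the substitution count."""
    total = 0
    for pattern, replacement in RULES:
        html, count = pattern.subn(replacement, html)
        total += count
    return html, total


def files_matching(root: Path, pattern: re.Pattern) -> list[Path]:
    """Rough equivalent of `grep -rl`: .html pages under root that match pattern."""
    return sorted(
        page
        for page in root.rglob("*.html")
        if pattern.search(page.read_text(encoding="utf-8", errors="replace"))
    )


def rewrite_site(root: Path) -> None:
    """Apply RULES in place to every .html file under root."""
    for page in sorted(root.rglob("*.html")):
        old = page.read_text(encoding="utf-8", errors="replace")
        new, count = apply_rules(old)
        if count:
            page.write_text(new, encoding="utf-8")
            print(f"{page}: {count} change(s)")
    # Grep-style verification: how many pages still carry the old banner?
    leftovers = files_matching(root, RULES[0][0])
    print(f"pages still matching the banner rule: {len(leftovers)}")


if __name__ == "__main__":
    rewrite_site(Path(sys.argv[1] if len(sys.argv) > 1 else "."))
```

Saved as, say, rewrite.py, it would be run against a copy of the directory wget created (python rewrite.py path/to/mirror). Keeping the rules as a plain list of (pattern, replacement) pairs means each new variant of the banner markup is just one more entry, and the leftover count plays the role of the grep check.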

The most important thing was that all my grepping and perling (the bulk of the effort) was great fun and in no way dull. Making all these changes by brute force was certainly possible (with the exception of the tidy work), but it would have been dull as fuck. With a little programmatic magic, on the other hand, I was able to skip through it, giggling like a schoolboy who’s just dropped a trojan into the staff network.

I rewarded myself by buying a book and a t-shirt.