As of 2.2.8, there's a new module that comes standard with Apache HTTPd called mod_substitute. As soon as I upgraded to 2.2.8, I found an immediate use for mod_substitute.
mod_substitute lets you apply a regular expression to change data as it gets sent out to the client. Another module, mod_line_edit, has been around for a while and does the same thing, but this is the first time this capability has been part of the standard Apache distro.
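For orientation, the directive takes a sed-style expression. Here's a minimal sketch along the lines of the example in the module's documentation; "foo" and "bar" are just placeholder strings, and it assumes mod_substitute is loaded:

```
<Location />
    # Run every text/html response through the SUBSTITUTE filter
    AddOutputFilterByType SUBSTITUTE text/html
    # n = treat the pattern as a fixed string, i = ignore case
    Substitute "s/foo/bar/ni"
</Location>
```

Leave off the n flag and the pattern is treated as a regular expression, with $1, $2 and so on available as backreferences in the replacement.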
We're using TinyMCE to edit content on our website. Turns out that Internet Explorer doesn't honor align="left" and align="right" on images, so we had to stick in style="float: left" and style="float: right" instead. But getting TinyMCE to do this was difficult since I don't speak JavaScript fluently enough to make it do that. We created a class "float-left" and another "float-right" for this.
So, using mod_substitute, I did:
<Location />
    AddOutputFilterByType SUBSTITUTE text/html
    Substitute "s!(<img)(class=\"float-(left|right)\")?([^>]*?)align=\"(left|right)\"([^>]*?)(class=\"float-(right|left)\")?([^>]*>)!$1$4align=\"$5\" class=\"float-$5\"$6$9!i"
</Location>
Yes, that's more complicated than it should be, because once you start editing and re-editing, the class="float-*" ends up in the edited HTML, and we have to take it back off again. Sheesh.
But, that's not all. When we load the content back into the editor, we have to take all that stuff off:
<LocationMatch "(/admin/pages\.php|/press/edit)">
    AddOutputFilterByType SUBSTITUTE text/html
    Substitute "s!<(img.*?)class=\"float-(left|right)\"(.*?)>!<$1$3>!i"
</LocationMatch>
Cool, hmm?
Not cool.
David,
Hmm? How so?
The ongoing trend of Apache to be the substitute for other entities is disturbing. One of the things I *really* like about Unix/Linux and the applications that historically run on them is their modularity and their ability to leave other things alone.
The mod_* modules are a good example of cross-over between systems administration and programming/design.
These modules slow down performance.
These modules hide important work that should be handled elsewhere.
It smells of the computing philosophy that the person sitting behind the keyboard (developer/designer) is incompetent, and thus the server (and server administrator) must manage things on their behalf. This is contrary to the philosophy that Linux, Unix, mainframes and the like are built on: that the person behind the keyboard knows what they are doing, and the server should monitor the service and leave the user and their data alone.
I find myself utterly unable to guess what you're talking about.
You say that modularity is a good thing, and then in the next paragraph criticize modularity. I don't know quite what you're getting at.
mod_substitute is like awk, like sed, like xargs. The Apache filter chain concept is just like its Unix command-line counterpart, the pipe. It allows you to produce output, pipe it through something else, pipe the results through something else, and do something utterly else with the results of that.
ps ax | grep http | awk '{print $1}' | xargs -L1 kill -9
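On the Apache side, the equivalent of that pipeline is a list of output filters. A rough sketch, assuming both mod_substitute and mod_deflate are loaded, with "foo" and "bar" as placeholder strings:

```
<Location />
    # Like stages in a pipe: rewrite the text first, then compress the result
    SetOutputFilter SUBSTITUTE;DEFLATE
    Substitute "s/foo/bar/ni"
</Location>
```

Each filter takes what the previous stage produced and hands its own output to the next one, much as each command in the pipeline does.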
I utterly fail to see how this in any way indicates that anybody is incompetent. It merely means that I wish to massage the output a little before sending it to the client.
What makes Apache HTTPd powerful is exactly what makes Unix powerful - the ability to only use the bits that I need, and leave off everything else. I'm perplexed that you'd say that you like this about Unix, and then criticize Apache for the same thing.
mod_substitute is doing a very similar thing to what mod_rewrite does, just at a different layer.
I believe I understand your point: this functionality is very similar to the piping and text manipulation of the tools you mentioned (awk, |, sed, etc.).
Those tools run on Linux or Unix, but they are separate from Linux or Unix itself, which is their proper place. If they were built into the kernel, that would be the improper place.
I do not criticize modularity. I am criticizing Apache's choice of meddling in things its fingers don't belong meddling in. It is not the job of the server to manipulate the data. That is the job of the programmer, designer, etc. to produce output that the server then delivers.
Next, you will tell me that it is going to start doing its own DNS lookups or sending its own mail. Where does it end?
Well, then, you might as well jettison mod_php and mod_perl, too, and just use mod_cgi, which calls an external process as K&R intended.
This seems like a pretty big philosophy disconnect between you and the way that most websites operate. Most web development frameworks assume that it is, in fact, the job of the server to manipulate the data.
I happen to agree with both of you.
mod_substitute is a nifty little module that can be used to correct problems in web pages.
These problems should be corrected at the source - TinyMCE should be configurable enough to replace the align left/right with appropriate classes.
There are two solutions: dive into JS and TinyMCE, update the source code to create classes instead of aligns, and remove mod_substitute; or correct the problem using mod_substitute.
In some cases correcting the source of the problem is either not possible or too difficult.
I, too, have started to look into using mod_substitute to correct a 'feature' of a CMS that I do not want publicly visible. I could correct it in the CMS, but this causes further problems I would like to avoid.
Paddy.
I agree that mod_substitute is really incredibly useful. In my blog at http://www.willswebworks.com/blog/2008/08/convert-email-addresses-in-source-html-to-images-without-modifying-the-source/, I explain how I devised a method to output plain-text email addresses in HTML source as images of those email addresses when the Apache web server sends the HTML to the client's browser.
It takes advantage of a new feature very recently released in Apache 2.2.7. A regular expression substitution is performed on the HTML to find email addresses and replace each one with an <img> tag that refers to a tiny PHP script that generates the image.
My example uses an image, but you can replace the email address with any spam email harvester obfuscation method you'd like by putting the appropriate code as the replacement value of the regular expression.
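To give a rough idea of the shape of this, here is a sketch and not the exact configuration from the blog post. The script path /email-image.php and its addr parameter are hypothetical names made up for illustration, and the email pattern is deliberately simplified:

```
<Location />
    AddOutputFilterByType SUBSTITUTE text/html
    # $1 is the matched address; /email-image.php and addr are hypothetical
    Substitute "s!([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]+)!<img src=\"/email-image.php?addr=$1\" alt=\"email address\">!i"
</Location>
```

Note that a pattern this simple also matches addresses that appear inside attributes such as mailto: links, so a real deployment would need a more careful expression.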
The big deal? This method requires NO modification of the existing HTML code whatsoever. HUGE time saver, and this method can be propagated throughout an unlimited number of virtual servers running under the Apache server.