Modified TFMail
About TFMail
TFMail is a CGI script maintained by the London Perl Mongers which acts as an HTML-form-to-email gateway. The HTML form is submitted to the CGI script, and the script emails the result to someone. It is more flexible than the classic "FormMail", and more secure than many common "FormMail" scripts. It has a templating system, supports multiple configuration files, and... oh, just go visit the site.
About My Modifications
I use TFMail, on this site and elsewhere. In the course of that, I have made some very simple modifications:
- Adds measures against form spam
- Changes character set to UTF-8
- Accepts IPv6 client addresses
- Converts literal "&#13;&#10;" character sequences to newlines
More detail below.
Help and Support
To be explicit: The London Perl Mongers, and/or the TFMail user community, generally do not support modified scripts. So if you have trouble with something unrelated to my modifications, please try a "vanilla" TFMail first. If the problem is with my modifications, at the very least, say so up-front in any problem reports.
Also to be explicit: I am not in the business of providing general TFMail support for free. If the problem is not with my modifications, please seek help in an appropriate forum, such as the nms-cgi-support list.
All that said, if you have a bug report or other feedback on my modifications, please let me know.
Obtaining the Patch
On this page I offer for download:
- TFmail-1.38-bscott2.diff - Diff against TFMail version 1.38
- TFmail-1.38-bscott2.zip - PKZip archive containing:
  - Modified TFmail.pl, ready-to-run
  - Above-mentioned diff
  - Example files
In all cases, you will need to provide the rest of the TFMail distribution yourself. I suggest installing the pristine version first, and configuring it and getting it working. Then apply my modifications. That way, if you encounter problems, you will know if it was my changes that did it.
Details
Form Spam
Spam hits everything, including HTML forms. Most of it comes from fairly dumb robots (bots), so fairly simple countermeasures can be effective.
Rejection by Pattern
- Config file directive: reject_regex_fields
- Syntax: Field names, separated by commas
- Semantics: If any listed field matches the pattern given in reject_regex, the submission is rejected as spam.
- Config file directive: reject_regex
- Syntax: Perl-compatible regular expression
- Semantics: Compared against the field(s) given by reject_regex_fields. Matching content rejects the submission.
One can declare a search pattern, and field(s) to check it against. If any of the field(s) match the pattern, the submission is rejected as spam.
The pattern is fed directly to the Perl m// operator, so you can use any Perl regular expression (regex). By the same token, if you give an invalid pattern, Perl is likely to abort with a runtime error, which may result in an HTTP 500 Server Error or similar CGI failure.
The pattern need not be a sophisticated expression. A plain word will match any occurrence of that word, and that is often sufficient. The sample file example.html uses the literal value "spam" as a radio button choice. The config file example.trc then just uses a pattern of "spam" to look for matches.
If you need to embed spaces or other "special" characters in the pattern, neuter them using Perl backslash escapes (such as \s for whitespace, or \x3c for <). Otherwise they are likely to confuse the config file parser.
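As a rough illustration of the check described above, here is a minimal sketch in Python (not the Perl from the patch). The function name `regex_spam_fields` and the dictionary of form values are hypothetical, and Python's `re` module stands in for Perl's m// operator, so the two regex dialects will differ in places.

```python
import re

def regex_spam_fields(form, reject_regex, reject_regex_fields):
    """Return the names of listed fields whose content matches the pattern."""
    # An invalid pattern raises re.error here, much as Perl would abort.
    pattern = re.compile(reject_regex)
    # The directive value is a comma-separated list of field names.
    names = [n.strip() for n in reject_regex_fields.split(",") if n.strip()]
    # search() matches anywhere in the value, like an unanchored m//.
    return [n for n in names if pattern.search(form.get(n, ""))]

# Hypothetical submission: the "human" radio button was left on "spam".
form = {"name": "Alice", "human": "spam", "comments": "Hello there"}
print(regex_spam_fields(form, "spam", "human,comments"))
```

A non-empty result means the submission would be rejected; the returned names correspond to the fields flagged as spam.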
Trap Fields
- Config file directive: trapfield
- Syntax: Field names, separated by commas
- Semantics: If any listed field is submitted non-empty, the submission is rejected as spam.
Trap fields are form fields which should never contain any content. Include a trap field in your HTML form, clearly marked (to the human reader) to be left empty. Bots generally don't read English, but people can. So the user will know to leave the field empty. Bots will stuff it full of crap, identifying them as unwanted.
At one time, I also recommended hiding the trap fields with CSS. It seems some spammers have caught on to this, and will now ignore fields tagged as hidden. So now we have to depend on the user reading the instructions.
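The trap field test itself is simple. A minimal sketch in Python, assuming the form values arrive as a dictionary (the function name `tripped_trap_fields` is hypothetical, and this is not the patch's Perl code):

```python
def tripped_trap_fields(form, trapfield):
    """Return the names of trap fields that were submitted non-empty."""
    # The directive value is a comma-separated list of field names.
    names = [n.strip() for n in trapfield.split(",") if n.strip()]
    # A missing field and an empty field both count as "left empty".
    return [n for n in names if form.get(n, "") != ""]

# Hypothetical bot submission that filled in every field it found.
form = {"email": "bot@example.com", "leave_empty": "http://example.com/"}
print(tripped_trap_fields(form, "leave_empty"))
```

Whether a whitespace-only value should count as empty is a design choice; this sketch treats it as non-empty.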
Reject HTML
NOTE: Spammers seem to have gotten away from the habit of putting HTML in weird places, so this feature is not as useful as it once was. Further, the reject_regex is a generalized version of this feature, and could be used to replace it. As such, this feature is kept only for backwards compatibility.
- Config file directive: reject_html
- Syntax: Field names, separated by commas
- Semantics: If any listed field contains HTML, the submission is rejected as spam.
At one time, most of the spam submissions included something like a URL in an HTML anchor tag. So I added a directive which allows you to declare field(s) to never contain HTML. If they do, the submission is rejected as spam. (The implementation doesn't actually reject all HTML -- it just looks for the very simple pattern most spam had at the time.)
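To show how a "very simple pattern" can catch anchor tags without rejecting all HTML, here is a minimal Python sketch. The pattern below is an assumption made for illustration; the text does not give the pattern the patch really uses, and the function name `html_spam_fields` is hypothetical.

```python
import re

# Assumed pattern: an opening anchor tag carrying an href attribute.
# The pattern actually used by the patch is not stated in the text.
ANCHOR_TAG = re.compile(r"<a\s[^>]*href", re.IGNORECASE)

def html_spam_fields(form, reject_html):
    """Return the names of listed fields that contain an anchor tag."""
    names = [n.strip() for n in reject_html.split(",") if n.strip()]
    return [n for n in names if ANCHOR_TAG.search(form.get(n, ""))]

form = {
    "comments": 'Buy now: <a href="http://example.com/">cheap</a>',
    "name": "<b>Bob</b>",  # HTML, but not an anchor, so it passes
}
print(html_spam_fields(form, "comments,name"))
```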
Rejection
Rejection with Template
- Config file directive: spam_template
- Syntax: Template base file name
- Semantics: As for other TFMail templates
When the script decides to reject a submission as spam, it processes the designated spam template and sends that to the user. In addition to all the usual template fields that would be available on a successful submission, there is an additional "spam_field" collection, which works like the "input_field" collection. You can thus FOREACH over the fields identified as spam.
I recommend using a vague message for normal use. You don't want to tell the spammers exactly what they need to attack. An example of one such template is included as "spam_vague.trt" in the ZIP file. You will need to change the phone numbers.
For testing/diagnostic purposes, it may be useful to have an exact listing of which fields were rejected. The "spam_diag.trt" in the ZIP file is an example. Again, I do not recommend being that specific in regular use.
Rejection with Redirect
- Config file directive: spam_redirect
- Syntax: URL
- Semantics: As for other TFMail redirect directives
This is an alternative to "spam_template". When the script decides to reject a submission as spam, it issues an HTTP redirect to the given URL.
Logging
- Config file directive: spamlog
- Syntax: Base name of log file
- Semantics: File to log info about submissions rejected as spam. No logging if not defined.
It may be useful to be able to see what is being rejected as spam, especially during initial setup, or if you have trouble reports. This directive will cause all rejected submissions to be logged to the specified file. Just specify the base name for the file. The extension (TFMail LOGFILE_EXT) will be automatically added. It will be written to the TFMail LOGFILE_ROOT directory.
Log format is one entry per nominal line, and includes, in this order: date, time, HTTP client IP address, HTTP Via header, HTTP User-Agent header, and form fields. Things the client submits are wrapped in angle brackets (<>). Note that if a submission includes newlines or other nasty characters, they will be included in the log file, so don't blindly trust it or throw it at a simple parser. (This should perhaps be changed.)
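The caveat about newlines is easy to demonstrate. The sketch below builds a log entry in the field order described above, in Python. The exact separators, the date and time formats, and the `name=<value>` layout are assumptions for illustration, and `spam_log_entry` is a hypothetical name; the real log layout may differ.

```python
from datetime import datetime

def spam_log_entry(when, remote_addr, via, user_agent, form):
    """Build one log entry; client-supplied values go in angle brackets."""
    parts = [
        when.strftime("%Y-%m-%d"),   # date (format assumed)
        when.strftime("%H:%M:%S"),   # time (format assumed)
        remote_addr,
        f"<{via}>",
        f"<{user_agent}>",
    ]
    # Form fields come last; values are written as-is, newlines included.
    parts += [f"{name}=<{value}>" for name, value in form.items()]
    return " ".join(parts) + "\n"

entry = spam_log_entry(
    datetime(2024, 6, 27, 12, 0, 0),
    "2001:db8::1",
    "",
    "ExampleBot/1.0",
    {"comments": "line one\nline two"},
)
# One entry, but it spans two physical lines because of the embedded newline.
print(entry.count("\n"))
```

This is why a parser that splits the log on line breaks can be fooled by a hostile submission.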
Newlines
We would sometimes get form submissions containing the literal string "&#13;&#10;" (without the quotes) (10 characters). That's obviously an HTML escape sequence for CR+LF, but what was that doing in the email? I have no idea*, but the customer didn't like it. So I added a quick pattern replacement that turns that sequence into a newline.
(* I actually do have an idea: Testing revealed that, when submitting a multi-line response (TEXTAREA) via POST, Firefox and MSIE seemed to encode things differently. Whether one or the other or both or neither was "correct", I didn't bother researching. It needed to be compensated for.)
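The replacement amounts to a one-line substitution. A minimal sketch in Python, assuming the sequence in question is the pair of decimal character references for CR and LF; the names `ESCAPED_CRLF` and `fix_newlines` are hypothetical, and the patch itself does this in Perl:

```python
# Assumed form of the escape sequence: decimal references for CR and LF.
ESCAPED_CRLF = "&#13;&#10;"

def fix_newlines(value):
    """Turn each literal escaped CR+LF sequence into a real newline."""
    return value.replace(ESCAPED_CRLF, "\n")

print(len(ESCAPED_CRLF))  # the sequence is 10 characters long
print(fix_newlines("first line&#13;&#10;second line"))
```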
IPv6
TFMail verifies that the REMOTE_ADDR CGI environment variable contains something that looks like a valid IP address. When TFMail 1.38 was released, IPv6 was still fairly rare, and so TFMail only accepts IPv4 format. Today IPv6 is at least unremarkable. If a client using IPv6 submits a form to stock TFMail, it will abort with the message "weird remote_addr". I have modified the check to accept IPv6 addresses as well.
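For comparison, a check that accepts both address families can be sketched in Python with the standard `ipaddress` module. This is not how the patch does it (the script is Perl), and `looks_like_ip` is a hypothetical name; the sketch only shows the intended behavior.

```python
import ipaddress

def looks_like_ip(remote_addr):
    """Return True if the value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(remote_addr)
    except ValueError:
        return False
    return True

for addr in ("192.0.2.10", "2001:db8::1", "not-an-address"):
    print(addr, looks_like_ip(addr))
```

A value that fails this test corresponds to the case where the script would abort with "weird remote_addr".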
Example Files
Name | Description |
---|---|
example.html | HTML web page, with form submission to TFMail. This illustrates multiple anti-spam techniques: The use of a trapfield labeled for the user, and the use of a radio button group checked against a reject regex pattern. |
example.trc | TFMail config file corresponding to example.html. It demonstrates the spam_template, trapfield, reject_regex, reject_regex_fields, and spamlog directives. |
spam_vague.trt | TFMail HTML template, suitable for use as a spam rejection notice. See Rejection with Template. |
spam_diag.trt | TFMail HTML template, suitable for diagnostic use during initial set-up or trouble-shooting. Not for regular use. See Rejection with Template. |
History
Version | Released | Remarks |
---|---|---|
1.38-bscott1 | 2009 AUG 03 | Initial release. Newline fixup. Trap fields. HTML rejection. |
1.38-bscott2 | 2024 JUN 27 | UTF-8. IPv6. Reject by arbitrary regex pattern. |
Long ago, I posted my patch to the nms-cgi-devel list, but got no response. Not even a "That's not what we're looking for". I dunno if that list was just dead or what.