Securing HTML forms from spammers
As we all know, HTML-based forms are the primary means of interaction between web site owners and visitors. Since the ‘good old days’ when there were no spam bots are definitely over, every dynamic web page must be protected with the latest algorithms and technologies. Spammers work very hard at developing new strategies to penetrate protective components, and they build automated or semi-automated tools which have only one goal: publishing URLs or email addresses on pages which are visible to internet search engines and visitors. Such actions lead to abuse and problems for website owners.
First, we must understand why URLs are being inserted into HTML forms. Nowadays, when many web sites depend on the income generated by online sales, everyone wants their site to rank in the top positions on all major search engines. After studying several SEO techniques, we realise that link-building is a very important part of every optimization campaign. The easiest way to acquire links pointing to a web site is to spam forums and visitor-books. Unfortunately, our problems only start here. Since it’s easy to download and install an existing web community engine, people don’t spend time and money on developing their own solutions, which could be secured against spammers more easily. This is the major reason why web communities are primary targets. We shouldn’t forget the many visitor-books which have been developed to allow web owners to communicate with their visitors. As you can see, these are good ideas, but they have been abused.
Secondly, collecting email addresses has become a very lucrative business. It might sound strange, but there are people who sell email addresses, and people who buy huge lists of them. Of course, it’s a highly questionable way of making a profit, because the addresses are invariably used for spamming.
Thirdly and lastly, spam bots exist to test a system’s vulnerability. Once a bot finds a weak HTML form, it is reported and shared with other bots. Unfortunately, general information about available forms is shared too.
But how do you stop spammers who manage to break your security?
Image verification has been used for a long time, and it has proven to be a very effective weapon against spam. However, as time goes by, new techniques render it obsolete. Every HTML form transfers values which may be hidden from or visible to a human visitor, but a spam bot sees all values and variables equally. This means that any image check whose answer travels with the form can be read by a bot, so there is no way to use images safely. The same applies to text verification and similar techniques. The basic question remains: how can you secure your form?
The only way to keep spammers from posting unwanted messages to your forums or visitor-books is to use intelligent algorithms. First, you must not send all values through forms. Where possible, the URL should carry some of the variables and values needed for successful form processing. It’s still possible to break through such measures, but it’s much more complicated.
To increase your security level, ask your visitors a question whose answer can be evaluated on the server side, so that the correct answer is never sent through the form. ‘Yes/no’ questions work well, for instance. Simply ask your visitors whether the first of two numbers is greater than the second, send the two numbers through the form along with the visitor’s answer, and compare the numbers with the ‘yes/no’ answer on the server. Unlike image verification, this is a very useful technique, since the correct answer isn’t included in any variable. Of course, you shouldn’t limit yourself to ‘yes/no’ answers; you can ask your visitors any number of other questions.
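The ‘yes/no’ check described above could be sketched in Python roughly as follows. This is only an illustration: the function names `make_question` and `check_answer`, the number range and the wording of the question are assumptions, not part of any particular engine.

```python
import random


def make_question(rng=random):
    """Pick two distinct numbers and phrase a yes/no question about them.

    The numbers go into the form; the correct answer does not.
    (Hypothetical helper, named for this sketch only.)
    """
    a, b = rng.sample(range(1, 100), 2)  # two different numbers, 1..99
    return a, b, f"Is {a} greater than {b}? (yes/no)"


def check_answer(a, b, visitor_answer):
    """Recompute the correct answer on the server and compare it
    with what the visitor typed."""
    expected = "yes" if a > b else "no"
    return visitor_answer.strip().lower() == expected


if __name__ == "__main__":
    a, b, question = make_question()
    print(question)
    print(check_answer(a, b, "yes"))
```

The server only ever receives the two numbers and the visitor’s reply, and works out the expected reply itself.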
Someone might say that’s enough, but spammers are prepared to go to great lengths. A very old but still useful algorithm for detecting possible spam is the so-called word-check. It’s recommended that you store all messages left by both your visitors and spammers (naturally, showing only the safe ones). By analysing the messages which are marked as spam, you’ll be able to identify new patterns. You can add these patterns to your algorithm, so that its ability to prevent spam improves with every new spam message.
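One simple way to mine stored messages for new word-check patterns might look like the Python sketch below. It is an assumption about how such an analysis could be done, not a prescribed method: the function name `suggest_patterns` and the `min_count` threshold are made up for the example, and the suggestions should be reviewed by a person before they are added to a filter.

```python
import re
from collections import Counter


def suggest_patterns(spam_messages, safe_messages, min_count=2):
    """Return words that occur in at least `min_count` spam messages
    and in none of the safe messages (candidate 'banned' words)."""

    def words(text):
        # Lower-case the text and keep alphabetic words only.
        return set(re.findall(r"[a-z']+", text.lower()))

    # Count each word once per spam message it appears in.
    spam_counts = Counter(w for m in spam_messages for w in words(m))
    safe_words = set().union(*(words(m) for m in safe_messages))
    return sorted(
        w for w, count in spam_counts.items()
        if count >= min_count and w not in safe_words
    )


if __name__ == "__main__":
    spam = ["Buy cheap pills now", "cheap pills here", "visit my site now"]
    safe = ["I visit this site now and then", "great post"]
    print(suggest_patterns(spam, safe))
```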
The most common patterns of spam messages are these:
1) Short messages
2) Messages with more than 2 outgoing links (be careful when applying this filter)
3) Messages which contain ‘banned’ words (it’s up to you to determine these words)
4) Messages which come from well-known spam sources (IPs)
5) Senseless messages made up of character strings such as ‘xdfa’, which pretend to be words but aren’t
6) Too many requests from one IP address in a short time
7) Too many similar messages from different IP addresses in a short time
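Several of the patterns above can be combined into one check. The Python sketch below covers patterns 1, 2, 3, 4 and 6; patterns 5 and 7 are left out because they need more than a few lines. All thresholds, the word list and the IP address are placeholder assumptions, and the names `spam_reasons` and `too_many_requests` are invented for this example.

```python
import re
import time
from collections import defaultdict, deque

BANNED_WORDS = {"viagra", "casino"}  # hypothetical list; choose your own
BLOCKED_IPS = {"203.0.113.7"}        # hypothetical address (documentation range)
MIN_LENGTH = 15                      # assumed: shorter messages count as 'short'
MAX_LINKS = 2                        # more than 2 outgoing links is suspicious
MAX_REQUESTS = 3                     # assumed limit per IP per window
WINDOW_SECONDS = 60

_recent = defaultdict(deque)         # IP -> timestamps of its recent requests


def too_many_requests(ip, now=None):
    """Record a request and report whether this IP exceeded the limit."""
    now = time.time() if now is None else now
    stamps = _recent[ip]
    while stamps and now - stamps[0] > WINDOW_SECONDS:
        stamps.popleft()             # forget requests outside the window
    stamps.append(now)
    return len(stamps) > MAX_REQUESTS


def spam_reasons(message, ip, now=None):
    """Return the list of patterns a message matches (empty if none)."""
    reasons = []
    if len(message.strip()) < MIN_LENGTH:
        reasons.append("short")
    if len(re.findall(r"https?://", message, flags=re.IGNORECASE)) > MAX_LINKS:
        reasons.append("links")
    if set(re.findall(r"[a-z]+", message.lower())) & BANNED_WORDS:
        reasons.append("banned word")
    if ip in BLOCKED_IPS:
        reasons.append("blocked ip")
    if too_many_requests(ip, now):
        reasons.append("rate")
    return reasons


if __name__ == "__main__":
    print(spam_reasons("Win at casino http://a.example", "203.0.113.7"))
```

Returning the matched reasons rather than a single yes/no makes it easier to store the message with its verdict and to tune each threshold separately, for example to relax the link filter if it catches genuine visitors.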
By applying all of these patterns you’ll manage to keep most spam messages out. Moreover, if you add one or two questions, it will be almost impossible to penetrate such a defence. Building this kind of anti-spam wall doesn’t take a lot of resources or time: depending on the web engine you are using, it should take from several hours up to three days. Bear in mind that if your engine ships with encrypted source code, you won’t be able to update it.
All in all, spam prevention updates can be easily implemented in any clearly designed code. In my experience, it’s better to build a new portal than to continue using an existing bulletin board system, because the anti-spam policy can be developed from the ground up along with the engine itself.