Posts tagged tip

Should You Use Hotlink Protection?

This is a quick post about something that has troubled and intrigued me over the years: should you use hotlink protection?

What is a Hotlink?

First of all, if you don’t know what I’m talking about, it’s pretty simple: when someone visits your site, their browser downloads all the pictures, text and code needed to make the page show up. But what happens if someone copies the URL of one of your images and puts it on their own page? When someone visits that page, your image shows up on their site, so effectively you are giving away bandwidth to someone else.

Sounds bad, right? That is where hotlink protection comes in. There are different ways of doing it, but it’s basically just a couple of tweaks to the software/code/server to block anyone besides your own site from using your content…
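
For example, a minimal .htaccess sketch of this (just a rough idea, assuming an Apache server with mod_rewrite, and with example.com standing in for your own domain) could look something like:

# Minimal hotlink protection sketch (assumes Apache + mod_rewrite; example.com is a placeholder)
RewriteEngine On
# Let requests with an empty referer through (direct visits, some proxies and feed readers)
RewriteCond %{HTTP_REFERER} !^$
# Let requests coming from your own pages through
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Everyone else asking for an image gets a 403 instead
RewriteRule \.(jpe?g|png|gif)$ - [NC,F,L]

In plain words: if an image request doesn’t come from one of your own pages (or from an empty referer), it gets a 403 instead of the image.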

The Rise of Social and CDNs

There was a time when bandwidth and hosting were expensive (like in Australia heheheh), when hosting an image or video was costly, but nowadays we live with a different set of rules: hosting and bandwidth are pretty cheap. Not only that, but a lot of the social websites depend on and interconnect with the content on the web (Facebook, Twitter, Pinterest, etc. are very popular because they provide a platform for sharing content, and sometimes that content is on your site).

I kinda think that what’s important now is to have good, popular content, so something like someone hotlinking is no longer an issue of cost, but one of opportunity. It’s also the type of problem that nowadays can easily be fixed with a CDN or cheap hosting.

Hotlink protection is also another form of walled garden, and although it suits some types of sites, most would gain more from being open and easily accessible; search engines will appreciate it, and users even more. So… should you use hotlink protection? I’m kinda inclined to say no! Not anymore ^_^.

Why use a 3 Tier Backup System?

We all know backups are important: if shit can happen, shit will happen. They matter both online and offline, but if you are hosting a site or several sites, it’s especially important to have a backup strategy. That’s why S2R created the 3 Tier Backup System (sounds way more exciting than what it is):

1) Hosting Provider Backup System

  • Choose a hosting provider that makes backups. Having RAID, high availability and failover hardware is all cool for performance and redundancy, but backups should be expected, and they should not sit on the same hardware as the server (offsite storage or high-end backup software is a plus). Having backups from your host eases most problems and makes most disasters easier to handle, so this is the first backup tier.

2) Offsite Backup System

  • Have a cheap VPS or a backup account from another hosting provider (if they also provide backups, that’s a plus), then use your hosting panel, rsync or whatever backup system you prefer to make and transfer backups to this box. We normally choose a weekly schedule for this (running on weekends when traffic is low); there is no need for daily copies, because the goal is to have a clean weekly backup. Of course we store 3 backups, so 3 weekly backups are always available on the server. The idea is to use this in case your hosting provider goes bankrupt or closes your account for some reason (nowadays that’s more likely than you think) and you get cut off from your first tier backups, so this is the second backup tier.

3) Local Backup System

  • This one is also easy to understand: it’s a local backup of the accounts. In my case it goes to my custom made backup system (2TB mirrored to another 2TB, similar to RAID, way more than enough for my sites and personal files, plus a Mozy backup of all of this). This is also done weekly (it could be done only monthly), for the simple reason of peace of mind and safety. I never needed it, but there is no such thing as too many backups, and having one locally guarantees that whatever happens to your sites, they will always be able to come back from any disaster. So this is the final, third backup tier.

A three tier backup system might look a little paranoid, and it takes some time and money to build, but now that it’s done, it’s easy to add new sites, and the peace of mind it gives is priceless, and now I can eat right in front of the computer hahahah ^_^

Moving to Asynchronous Tracking

So in this move to… well, move faster hehehe, I’m pushing Google Analytics Asynchronous Tracking on all S2R sites that use Google Analytics (almost all of them; the ones that don’t use Reinvigorate ^_^ ). Although I’ve tested the speed difference between the normal code and the asynchronous code over this past week, I can say I don’t see much of a change, and that could be because, well, my sites already run fast. But anyway, when you join all of those little tweaks that speed sites up, it makes a big difference, especially if you get bursts of traffic ^_^

How to Protect your Sites

Well, one of my sites was taken down for a couple of hours after it was completely screwed by a hack (well, by script kiddies, but still) that deleted admin accounts and posts and added redirects and other nasty stuff. Cleaning it up by hand would mean several hours, and some things might be lost forever anyway. So what should you do before this happens, while it is happening, and afterwards to fix it? Here is what I do to keep my sites online and protected, separated into 3 major points:

Preventive Protection (before any problem)

  • Always have the latest updates to your online software. Yes, I know updates sometimes bring new bugs, but most of the time it’s better to take the time to find workarounds and still update to the latest version than to leave yourself open to an attack;
  • Always have multiple backups. All my hosts have backups, but I also make my own to other servers (weekly) as well as to my own computers (monthly); this ensures that even if there is a catastrophically bad failure (your host dies on you or deletes your account) you are still able to bounce back pretty quickly;
  • Make sure your hosting is separate from your domains, since keeping those 2 together means that if you need to jump to another host, you will always have problems (also, always have a backup host that you like and trust, to jump to quickly if need be);
  • Use popular software. Yes, it might be a bigger target for hacks and security issues, but the chance of getting updates and fixes is also much larger;
  • Resilient hosting. It doesn’t need to be cloud hosting or some strange arrangement, it just needs to come from good hosting companies with good track records; they ensure that most hardware/server failures will never happen, and if they do, that a fix will be applied quickly and efficiently.

Immediate Protection (when you first detect the problem)

  1. Put the site offline. If you are on an Apache server this normally means an update to the .htaccess/.htpasswd (see the sketch after this list); you don’t want your users getting affected by your compromised site;
  2. Check how the site was compromised. Was it the server, a bad admin, a software flaw? Try to find out how this happened;
  3. After you find the flaw, see if there is a fix for it (a server/software update, banning an admin, whatever it is), because once you fix the site you need to make sure it doesn’t happen again.
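
As a rough idea of that first step, a minimal “site offline” sketch for the .htaccess (just an example, assuming Apache with mod_rewrite; 203.0.113.10 stands in for your own IP and maintenance.html for a page you create) could be something like:

# Minimal "site offline" sketch (assumes Apache + mod_rewrite; the IP and page name are placeholders)
ErrorDocument 503 /maintenance.html
RewriteEngine On
# Let your own IP through so you can inspect and fix things
RewriteCond %{REMOTE_ADDR} !^203\.0\.113\.10$
# Don't block the maintenance page itself (avoids a loop)
RewriteCond %{REQUEST_URI} !^/maintenance\.html$
# Everyone else gets a 503 Service Unavailable with the maintenance page as the body
RewriteRule ^ - [R=503,L]

This way you can keep poking at the site from your own IP while everyone else gets a proper 503, which also tells search engines the downtime is temporary.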

Reactive Protection (how to fix the problem)

  • The best way is always to just delete the whole site and restore the latest stable backup. Sure, you will lose some content or news, but you have a guarantee that your site comes back crisp and clean; fixing it by hand means you can miss something and still keep your site compromised;
  • Make a test run and check if everything is all right; make sure to make the necessary adjustments before bringing the site back online;
  • Fix the security issue. If you found out what the problem was, go ahead and apply the updates or workarounds so this doesn’t happen again;
  • Make a brand new backup immediately before bringing the site back online; this ensures that if the site is still vulnerable, you can bring it back up again quickly, without much loss.

So that’s it. Yes, I know it’s basically just using backups, and yes, there are other ways, but this is the easiest, most efficient way to protect your site from premature death ^_^

Alternative to All in One SEO Pack

Well, we do have a lot of WordPress installs (not only for blogging; WordPress has come a long way and is almost at the point of becoming a full fledged CMS like Drupal or Joomla), so having a good SEO plugin to take care of the little tidbits of SEO is almost mandatory, and using the 2nd most popular WordPress plugin, “All in One SEO Pack”, seems like a no-brainer… that is, it used to be. Now more than ever it’s a nuisance, and it has moved from a very simple, helpful plugin to a major bloated beast filled with plenty of idiotic choices, and I’ll gladly name a few:

1. Constant updates. Ohhh, I’m all for updates, but come on, put it all in a bigger update instead of pushing the equivalent of nightly builds; unless it’s a security risk (and I have my doubts with this kind of plugin), changes should be pushed in big updates. I guess they do this because of my second gripe…

2. When updating they deactivate the plugin. Yep, you activate it on the plugins page, but then you always have to go into the plugin itself to activate it again. That’s just moronic and abusive, pushing ads/promotion/donations, jeez…

3. Bloated or useless functions. Stuff like pushing added keywords on single posts/pages is not only pretty much useless these days, it also kinda promotes keyword stuffing; and using excerpts as descriptions, talk about added bloat…

4. Adding idiotic stuff to the code in the head, like…

<!-- All in One SEO Pack 1.6.7 by Michael Torbert of Semper Fi Web Design -->

Not only is this bloat in the code, it also announces the freaking version, so if there is a security problem… yayyy

So as of now we are moving all of our WordPress installs from “All in One SEO Pack” to HeadSpace 2 and Platinum SEO Pack ^_^.oO( to see which one performs better)

UPDATE: After a year I’ve done a more thorough Comparison of WordPress SEO Plugins heheheh, and yes, we do have a winner ^_^

How to Deal with Web Scraping

Hummmm, since I have several galleries, one thing I encounter often is web scrapers. Personally, I don’t mind if anybody takes an entire gallery home or even re-posts it somewhere else; that is fine with me, this is the web. If I wanted the gallery to be private I would have made it so; if it’s public and free, then go right ahead…

However, the content itself is not the problem. The problem is that the vast majority of web scrapers have bad default settings, or the users running them choose settings that are too aggressive; it’s not uncommon for the load on the server to go from 0.10 to 1 in a heartbeat, or for the server to go down entirely. I know it’s partly my fault, as I personally like to restrict the server and the software as little as possible (I could use several methods to restrict connections or to ban IPs when there are too many connections), but because I don’t, sometimes I get into trouble. So this is what I normally do.

First of all, I have this in the .htaccess (mod_rewrite), which helps block most scraping software (unless it spoofs itself as a browser hehehe):

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]

RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]

RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]

RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]

RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]

RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]

RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]

RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]

RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]

RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]

RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]

RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]

RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]

RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]

RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]

RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]

RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]

RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]

RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]

RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]

RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]

RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]

RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]

RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]

RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]

RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]

RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]

RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]

RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]

RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]

RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]

RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]

RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]

RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]

RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]

RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]

RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]

RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]

RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]

RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]

RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]

RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]

RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]

RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]

RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]

RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]

RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]

RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]

RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]

RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]

RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]

RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]

RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]

RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]

RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]

RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]

RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]

RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]

RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]

RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]

RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]

RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]

RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]

RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]

RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]

RewriteCond %{HTTP_USER_AGENT} ^Zeus

RewriteRule ^.* – [F,L]

I monitor the load on the server; if the load spikes for more than a couple of minutes, I check the Apache log, and if there are lots of connections to the same site from the same IP, I ban that IP with .htaccess by adding these lines:

Order allow,deny
Deny from 100.100.100.100
Allow from all

(100.100.100.100 being the IP from the logs) and then I check the load again after a couple of minutes. If it’s down, fine; if they jumped IPs, I’ll do one of two things. If they stay within the same IP range, I’ll just block that whole range, like so:

Order allow,deny
Deny from 100.100.100.
Allow from all
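
Just a note: these Order/Deny lines are Apache 2.2 syntax. If the server runs Apache 2.4 or newer (without mod_access_compat), a rough equivalent of the range block above, using the newer Require syntax, would be something like:

# Rough Apache 2.4+ equivalent of the Deny rules above (mod_authz_core syntax);
# 100.100.100 is the same example IP range
<RequireAll>
    Require all granted
    Require not ip 100.100.100
</RequireAll>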

If they aren’t, I temporarily limit the number of simultaneous connections to 10; I know it will hurt all users, but it’s better than nothing. Note that MaxClients is a server-wide directive that has to go in the main Apache configuration (not in a per-directory .htaccess), so this throttles the whole server, not just the image directory:

MaxClients 10

If it still persists, I’ll just close the image directory by adding this line to the .htaccess of the cache or image directory (depending on the type of image gallery software you are using):

deny from all

So the site stays up, as do the thumbnails; only the full images won’t be accessible for a while. All of these are temporary measures, but for now they do the trick for me. Most of the time banning the IP is enough of a cure, and those bans I always leave in the .htaccess; the other options I normally remove the next day, after the connection storm has passed. Bottom line: if you want to scrape, instead of bombing the server for an hour, make your scraper download slowly over a couple of hours; it makes a big difference and everyone gets what they want.

How to Change Hosts Files

The hosts file is a computer file used to store information on where to find a node on a computer network. It maps hostnames to IP addresses (for example, mapping www.google.com to 10.0.0.1). The hosts file is used as a supplement to (or a replacement of) the Domain Name System (DNS) on networks of varying sizes. Unlike DNS, the hosts file is under the control of the local computer’s administrator (as in you). The hosts file has no extension and can be edited with most text editors.

The hosts file is loaded into memory (cache) at startup, and Windows checks the hosts file before it queries any DNS servers, which enables it to override addresses in the DNS. This can be used to prevent access to listed sites by redirecting any connection attempts back to the local (your) machine (IP address 127.0.0.1). Another feature of the hosts file is its ability to block other applications from connecting to the Internet, provided an entry for them exists.

So you can use a hosts file to block ads, banners, 3rd party cookies, 3rd party page counters, web bugs, and even most hijackers. Here are some instructions on how to do it, plus some sites with ready-made hosts files (you just overwrite your own hosts file with them):

The hosts file location is:
Windows XP at c:\Windows\System32\Drivers\etc
Windows Vista at c:\Windows\System32\Drivers\etc
Windows 2000 at c:\Winnt\System32\Drivers\etc

There you will find a file named hosts (no extension). Like we said above, you can edit it with any text editor, and its function is simple: you map IP addresses to hostnames, so the file will look mostly like this…

127.0.0.1    localhost
127.0.0.1    www.bad-spyware-site.com
127.0.0.1    www.site-with-virus.com
127.0.0.1    www.publicity-ads-site.com

If you want to block a domain, just add a new line pointing it to 127.0.0.1, the localhost (this way, when that domain comes up in the browser, the browser will look for it on your own computer and not online, because the hosts file told it to), so for example:

127.0.0.1    localhost
127.0.0.1    www.bad-spyware-site.com
127.0.0.1    www.site-with-virus.com
127.0.0.1    www.publicity-ads-site.com
127.0.0.1    google.com

So now if I put google.com in the address bar of the browser, it will give me a blank page and google.com won’t work anymore. If you want to remove an entry, just delete the line or put a # in front of it:

127.0.0.1    localhost
127.0.0.1    www.bad-spyware-site.com
127.0.0.1    www.site-with-virus.com
127.0.0.1    www.publicity-ads-site.com
#127.0.0.1    google.com (google.com will work now)

So the idea is to use the hosts file to block unwanted or bad sites ^-^ clean and easy hehehe

Here are some sites that provide awesome hosts files ^_^ .oO (choose one of them)

Hostman : Automated hosts file updating software
Host File : A pretty cool and clean hosts file
Someone Who Cares : A comprehensive hosts file
MVPS : A hosts file geared towards blocking unwanted stuff