FYI: This is one of the posts that I have imported from my old blog. Even though this tutorial has screenshots from the old post, it still works!
As a blogger, I’m sure you’ve been tremendously frustrated with referral spam showing up in your Google Analytics reports – it’s a real pain in the ass! I’ve been researching this topic for a while now and have been able to successfully remove it all from my reports – Yay!
I just started this blog around November 1st – it is not my first blog, so I did have some experience dealing with referral spam. I’ll try to break it down as simply as I can for you.
Here’s a screenshot of what my Google Analytics looked like before I got rid of the spam…like I said, I just started this blog about 4 weeks ago and there’s no way that I had 294 visitors in a month! I didn’t even start promoting my blog until just recently. The bulk of these “visitors” are spam referrals, spam crawlers and visits from me during the design phase of the blog (you should also exclude your own IP address).
Let’s Talk About Spam
There seems to be a lot of confusion within the blogging community on how to handle referral spam – most bloggers are not aware of the fact that there are actually two types of spam in Google Analytics: Ghosts and Crawlers.
The bulk of the spam that is affecting your reports is called “Ghost spam” and luckily it’s actually quite easy to get rid of it – this one hardly requires any maintenance at all.
Ghost spam doesn’t actually even access your site (which is why it’s actually easier to control than crawlers). Basically spammers randomly send “fake data” to Google Analytics servers which may or may not record a “visit” on your site.
Contrary to popular belief, you CAN NOT BLOCK ghost spam from your .htaccess file. The reason for this? The purpose of your .htaccess file is to allow or block access to your site. As I mentioned earlier, ghost spam doesn’t actually access your site so including them in your .htaccess file is pointless.
To see how much ghost spam you’ve had recorded on your site, go to Audience => Technology => Network. Next click on “Hostname”. You can see below that other than my domain name, there is a lot of junk there! Those are all ghost visits!
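If you'd rather count the junk than eyeball it, here's a rough sketch of the same check in Python. It assumes you've copied the Hostname report into a list of (hostname, sessions) pairs. The rows and the domain `myblog.example` are made up for illustration, so swap in your own.

```python
from collections import Counter

# Hypothetical copy of the Hostname report: (hostname, sessions).
# These rows are made-up illustration data, not real report output.
rows = [
    ("myblog.example", 41),
    ("www.myblog.example", 12),
    ("(not set)", 97),
    ("free-traffic.example", 88),
]

VALID_DOMAIN = "myblog.example"  # assumption: replace with your own domain


def is_valid_host(hostname: str) -> bool:
    """True if the hostname is your domain or one of its subdomains."""
    host = hostname.lower()
    return host == VALID_DOMAIN or host.endswith("." + VALID_DOMAIN)


def split_sessions(report_rows):
    """Add up sessions on valid hostnames versus everything else (ghosts)."""
    totals = Counter()
    for hostname, sessions in report_rows:
        key = "valid" if is_valid_host(hostname) else "ghost"
        totals[key] += sessions
    return totals


totals = split_sessions(rows)
print("valid sessions:", totals["valid"])
print("ghost sessions:", totals["ghost"])
```

Anything that lands in the "ghost" bucket is traffic recorded under a hostname that isn't yours, which is exactly what the hostname filter in Step #1 is meant to keep out.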
Unlike ghost spam, crawler spam actually does access your site by crawling your pages. When this happens, it records a visit on your reports. This type of spam requires a bit more maintenance because new ones always seem to pop up so you have to keep checking your reports for new ones and adding them to your filters.
To check your crawler spam visits, go to Acquisition => All Traffic. Next click on “Referrals”.
Here you see the list of my crawler spam:
Common Misconceptions About Referral Spam
- Bounce rates affect search rankings: while it’s true that referral spam brings up your bounce rate, this DOES NOT affect your search rankings.
- Use the referral exclusion filter to stop spam: I’ve seen so many bloggers recommend the “referral exclusion” filter to others, but this is not the correct way. The name is confusing (after all, we do want to exclude referrals from our reports), but this filter actually has other purposes.
My Process to Remove Google Referral Spam
This is an effective solution for managing referral spam, but this tutorial only covers simple blogs. It consists of four steps:
- Step #1: Eliminate Ghost Spam by implementing a hostname filter.
- Step #2: Eliminate Crawler Spam by using a filter.
- Step #3: Create a custom segment with both these filters so you can view your reports without spam.
- Step #4: Activate Google Bot + Spider.
Let’s Get to Work!
Step #1: Eliminate Ghost Spam by implementing a hostname filter
As I mentioned earlier, Ghost spam doesn’t actually visit your site, but it can post fake hits to your reports, which is really annoying. Because it doesn’t visit, you can’t block it via your .htaccess file (can’t block something that isn’t even there, right?).
In order to get rid of Ghost spam, you may think that creating a filter to exclude it would work, right? WRONG! Ghost spam uses fake hostnames, so we can eliminate all of it by creating a filter that includes only the valid hostname (your own domain).
Here’s how I can explain this in layman’s terms: let’s say you don’t get along with your mother-in-law…she keeps calling you, taking cheap shots at you…You decide that enough is enough – you’re having a party because you’re sick of all the bullshit but you only want your friends there. You only send invitations to your buddies and don’t invite your mother-in-law…see where I’m going with this?
Using the method above, you’re only including the traffic that you want (not the yucky, horrible traffic).
- Once you’re logged into Google Analytics, click Admin at the top of your screen (if you have multiple websites, make sure you have the right one selected).
- On the left, click “Filters”.
- Click on “Add Filter”.
- Enter a filter name such as “Valid Hostname”. Click “Custom”, then “Include”. From the drop-down menu, choose “Hostname”. Under “Filter Pattern”, enter your domain name. Click “Save” at the bottom.
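Filter patterns are treated as regular expressions, so before saving you may want to test what your pattern actually matches. Here's a minimal sketch of that. The helper `hostname_pattern` and the domain `myblog.example` are my own placeholders, and the anchored pattern is a stricter variation on simply typing in your domain name, not something Google Analytics requires.

```python
import re


def hostname_pattern(domain: str) -> str:
    """Build a regex matching the domain and its subdomains, nothing else.

    The dots are escaped so "." only matches a literal dot, and the
    ^ ... $ anchors stop look-alikes such as "myblog.example.spam.test".
    """
    return r"^([a-z0-9-]+\.)*" + re.escape(domain) + "$"


pattern = hostname_pattern("myblog.example")  # assumption: your own domain

# Made-up hostnames, similar to what shows up in the Hostname report.
for host in ["myblog.example", "www.myblog.example",
             "myblog.example.spam.test", "(not set)"]:
    verdict = "keep" if re.search(pattern, host) else "drop"
    print(f"{host:28} {verdict}")
```

Print the result of `hostname_pattern(...)` for your own domain and paste it into the “Filter Pattern” box if you want the stricter version. Entering the plain domain name, as described above, works too.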
As I mentioned earlier in the post, this will work if you have a blog and you don’t have your Google Analytics code integrated with a 3rd party service such as an ecommerce platform or YouTube (those are just examples). If you have your GA code integrated on other sites, you will need to add each of those sites to your list of “Included Hostnames” (not covered by this tutorial).
I have a simple blog and this method worked perfectly for me – I even use WooCommerce as an ecommerce platform, but it still works because I know for a fact that I didn’t integrate my GA code on any site other than my own.
Step #2: Eliminate Crawler Spam by using a filter
Ok, so next on the list is Crawler spam – because the “Hostname” filter that we created in the previous step only covers Ghost spam, we need a different approach for this type of spam. Most of this spam can be eliminated by creating a filter that excludes the “campaign source”. Most bloggers create a filter that excludes the referral, but that isn’t the right way to handle this situation.
- To view the spammers you need to exclude, go to Acquisition => Overview and click on “Referral”.
- Next you’ll see the list of all the spam websites that you want to exclude. You’ll have to sift through and determine which are good and which are bad. In my list, I see bloglovin.com and in.search.yahoo.com – I don’t want to exclude these ones. It’s pretty easy to determine which ones are garbage. If you are in doubt, just do a simple search on Google.
- Next step is to create the exclusion filter. Go to Admin => Filters => Add Filter. Name your filter whatever you want. Click “Custom”, then “Exclude”. From the drop-down menu, choose “Campaign Source”. Next add your filter pattern – this is basically the list of the garbage domains that you want to exclude. Just include the main domain (no sub-domains) and separate each with a vertical bar. Do not end the expression with a vertical bar. Also there is a character limit so if you have a lot to exclude, you may have to create an extra filter.
So in my case my list looks like:
- Click Save! So from this point forward, if you see any new ones popping up in your Analytics report, you will have to add these to your existing filter (or create a new one if needed).
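If your list of garbage domains keeps growing, building the expression by hand gets tedious. Here's a small sketch that joins the domains with vertical bars, never leaves a trailing bar, and starts a new expression when the current one would get too long. The domains are made-up placeholders, and the 255-character limit is my assumption about the filter pattern box, so check it against what your account actually accepts.

```python
import re

MAX_PATTERN_LEN = 255  # assumption: character limit of the filter pattern box


def build_exclusions(domains, max_len=MAX_PATTERN_LEN):
    """Join escaped domains with '|' into one or more filter expressions."""
    cleaned = sorted({d.strip().lower() for d in domains if d.strip()})
    patterns, current = [], ""
    for domain in cleaned:
        escaped = re.escape(domain)  # so "." only matches a literal dot
        candidate = escaped if not current else current + "|" + escaped
        if current and len(candidate) > max_len:
            patterns.append(current)  # current expression is full, start a new one
            current = escaped
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns


# Placeholder spam domains for illustration only.
spam_domains = ["junkreferrer.example", "spamcrawler.example", "fakebuttons.example"]

for number, expression in enumerate(build_exclusions(spam_domains), start=1):
    print(f"Filter {number}: {expression}")
```

Each printed line is one filter pattern. If you get more than one line, create one exclusion filter per line.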
Step #3: Create a Custom Segment
Creating a custom segment is important because it instantly removes spam from your reports and even gives you access to historical data without the spam.
- Click on the “Reporting” tab at the top and then click on “Add Segment”.
- Click New Segment
- In the top left, name the segment whatever you’d like with “No Spam” in brackets. Under “Advanced”, click “Conditions”. This is where we will put the 2 “rules” we created – the first to include your hostname and the second to exclude the spam domains.
- Sessions -> Include
- Hostname -> matches regex -> choose your valid hostname (that you created in Step #1).
Click “Add Filter” to add another filter.
- Sessions -> Exclude
- Source -> matches regex -> choose your spam crawler expression (that you created in Step #2).
NOTE: if you created more than one spam crawler expression because of lack of room, you will need to add an additional filter to add the second expression by clicking “OR”. This will automatically open up a new filter for your second crawler expression:
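To make the logic of the segment a bit more concrete, here's a rough Python sketch of what the two conditions do to each session: keep it only if the hostname matches, then throw it out if the source matches any of your spam expressions. The patterns and sessions below are invented placeholders, and this only imitates the idea, not how Google Analytics works internally.

```python
import re

# Hypothetical patterns standing in for the ones from Step #1 and Step #2.
HOSTNAME_RE = re.compile(r"myblog\.example")
SPAM_SOURCE_RES = [
    re.compile(r"spamcrawler\.example|junkreferrer\.example"),
    re.compile(r"fakebuttons\.example"),  # second expression, joined with OR
]


def in_no_spam_segment(session: dict) -> bool:
    """Include sessions on a valid hostname, then exclude spam sources."""
    if not HOSTNAME_RE.search(session["hostname"]):
        return False  # fails the "Include" condition (ghost spam)
    # Fails the "Exclude" condition if ANY spam expression matches the source.
    return not any(rx.search(session["source"]) for rx in SPAM_SOURCE_RES)


# Made-up sessions for illustration.
sessions = [
    {"hostname": "myblog.example", "source": "bloglovin.com"},
    {"hostname": "myblog.example", "source": "spamcrawler.example"},
    {"hostname": "(not set)", "source": "junkreferrer.example"},
    {"hostname": "myblog.example", "source": "fakebuttons.example"},
]

clean = [s for s in sessions if in_no_spam_segment(s)]
print(len(clean), "of", len(sessions), "sessions kept")
```

Only the bloglovin.com session survives here, which is the kind of result you want to see once the segment is applied to your own reports.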
Step #4: Activate Google Bot + Spider Option
- Google Analytics has a built-in option to exclude bots and spiders. Click Admin => View => View Settings.
- Check off “Exclude all hits from known bots and spiders” and click Save.
See it in action!
Want to see your reports without all the garbage spam? Click on “Reporting”. You should see the name of the custom segment you created.
Don’t See Your Custom Segment?
If you don’t see your custom segment, don’t panic! Simply click “New Segment” and from the Segment Name List, you should see the one you created. Check it off and click “Apply”. If you want to compare the old stats to the new, keep “All Sessions” checked off…this is when you can really see the difference.
Here’s a comparison of my stats with and without the spam…the blue line represents the spam stats and the yellow line represents my stats without spam:
So that’s it! I don’t know about you but when I figured this out, I felt like celebrating! It’s so awesome to now get reports without all the crap that was there before! I hope you found this tutorial helpful, and if you did please share! Let me know how you made out in a comment below!