Data Collection for Machine Learning in SEM

Published by Patrick Mebus on

‘Data is the new gold’ is a common saying nowadays. There is so much data out there that we have to deal with every single day. The crucial questions are therefore: ‘Which data is most relevant for me and my campaign?’, ‘Which data, on the other hand, is useless?’ and finally, ‘How can we ensure proper, high-quality data collection?’

There are several ways to gather data in Search Engine Marketing:
  • Website tracking and measuring on-page actions
  • Search engine measurements (e.g. impressions and clicks)
  • Import CRM data
  • Import third-party data
  • Use public datasets

A smooth data collection: foundation of a proper Machine Learning model

Even though this is just the first step, data collection is one of the most sensitive moments in the whole process. If we start digging in the wrong location, we’ll find everything but gold. Or worse: something that looks like gold but is something completely different. Working with wrong data in Machine Learning leads to wrong assumptions. Nobody wants to make decisions based on crappy numbers and measurements. The impact of data quality on the results is massive.

How to ensure high-quality data collection?
  1. Make sure your tracking pixels are implemented properly and fire correctly on events and on-site actions.
  2. To ensure a constant flow of data, don’t let your campaigns run out of budget and get disabled.
  3. If you’re planning to bring third-party tracking data into your tracking solution, double-check whether the particular provider is compatible. Google, for example, has a lot of ad serving requirements and maintains a whitelist for external vendors.
  4. Make sure that you have the right filters enabled in your conversion-pixel setup. To capture just the relevant data, you could exclude your own IP address, particular hostnames or irrelevant data sources.
  5. Always keep an eye on your company’s or client’s overall marketing activities. This is extremely useful for detecting and explaining unexpected peaks and drops in your traffic, and it has become a standard part of my own approach over the last couple of years.
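
The filters from point 4 normally live in the tracking tool itself, but the same logic can also be applied to exported raw data before it reaches a model. The following is only a minimal sketch under assumptions: the `Event` fields, the office network `203.0.113.0/24`, the hostnames and the source name are all hypothetical placeholders, not values from any real setup.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Event:
    """One tracked on-site action (hypothetical export format)."""
    ip: str
    hostname: str
    source: str
    action: str


# Hypothetical filter settings; replace with your own values.
INTERNAL_NETWORKS = [ip_network("203.0.113.0/24")]   # e.g. your office IP range
ALLOWED_HOSTNAMES = {"www.example.com", "shop.example.com"}
EXCLUDED_SOURCES = {"internal-newsletter-test"}


def is_relevant(event: Event) -> bool:
    """Return True if the event should be kept for analysis."""
    addr = ip_address(event.ip)
    if any(addr in net for net in INTERNAL_NETWORKS):
        return False  # traffic from your own network
    if event.hostname not in ALLOWED_HOSTNAMES:
        return False  # e.g. staging or test hostnames
    if event.source in EXCLUDED_SOURCES:
        return False  # data sources known to be irrelevant
    return True


def filter_events(events: list[Event]) -> list[Event]:
    """Keep only the events that pass all filters, in original order."""
    return [e for e in events if is_relevant(e)]
```

Called on a list of exported events, `filter_events` drops internal visits, hits on unknown hostnames and excluded sources, so that only the remaining rows are used for training or reporting.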
