If We Can’t Get Socio-Demo Targeting Right, Forget AI and Machine Learning
Socio-demographic targeting should be a breeze for the digital ad industry by now — but it’s still way off base in many cases. Most large advertisers are using readily available socio-demo data, but the problem is that the backbone of that data is broken.
If we can’t solve something as basic as socio-demo targeting, we certainly won’t be able to take it to the next level of AI and Machine Learning!
Determining Socio-Demographic Accuracy
One key indicator of success for a digital campaign that aims to build brand awareness is On-Target Percentage (OTP): the percentage of impressions delivered to the intended audience (based on gender and/or age). How do we determine whether a brand has achieved its OTP? Because verified data isn’t available at sufficient scale, data partners have to model it, and the quality of the base data is often questionable. Lookalike models, which may have worked for interest targeting, have proven ineffective for socio-demographic data.
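To make the metric concrete, here is a minimal sketch of how OTP could be computed once impressions have been verified against some truth source. It assumes every impression in the list already carries a verified gender and age (for example, from a panel match); the `Impression` and `Target` types and the `on_target_percentage` function are hypothetical names for illustration, not any vendor’s API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Impression:
    """An impression whose viewer's demographics were verified by a truth set (assumed)."""
    gender: str
    age: int


@dataclass(frozen=True)
class Target:
    """Campaign target; None means 'no constraint' on that dimension."""
    gender: Optional[str] = None
    min_age: Optional[int] = None
    max_age: Optional[int] = None

    def matches(self, imp: Impression) -> bool:
        if self.gender is not None and imp.gender != self.gender:
            return False
        if self.min_age is not None and imp.age < self.min_age:
            return False
        if self.max_age is not None and imp.age > self.max_age:
            return False
        return True


def on_target_percentage(impressions: Iterable[Impression], target: Target) -> float:
    """Share of verified impressions that reached the target audience, in percent."""
    total = 0
    hits = 0
    for imp in impressions:
        total += 1
        if target.matches(imp):
            hits += 1
    if total == 0:
        raise ValueError("no verified impressions to measure")
    return 100.0 * hits / total


# Hypothetical campaign aimed at women aged 25-54.
sample = [
    Impression("female", 30),
    Impression("female", 60),
    Impression("male", 40),
    Impression("female", 45),
]
print(on_target_percentage(sample, Target(gender="female", min_age=25, max_age=54)))
```

Only two of the four sample impressions fall inside the target, so this campaign would report an OTP of 50 percent. In practice the hard part is not this arithmetic but knowing each viewer’s true gender and age in the first place.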
The Problem with Socio-Demographic Data
There’s no stereotypical behavior online
Gender is difficult to predict from browsing history, even on “typical” male or female sites. Someone who visits BabyCenter.com could easily be a mom or a dad. Anecdotally, a well-known male-oriented publisher I spoke with recently admitted that 30-40 percent of its visits come from women.
Gender is hard to predict, but age is even harder. After all, millennials and Baby Boomers read the same news sites, and people of all ages visit e-commerce sites. It seems we are more alike than different.
Devices are frequently used by more than one person
Has your child ever played on your smartphone? Does your husband sometimes borrow your laptop? Enough said.
Exaggerated stats abound
Plenty of players can’t resist the temptation to “rip off” brands by claiming they can deliver, for example, 15,000,000 tech enthusiasts when they don’t actually have an audience that size. It happens all the time, because otherwise they would have no business.
It’s time to build your own truth.
There is More than One Truth
Another industry-wide problem is that advertisers are content to use the same methodology for digital as they did for television: comparing their data against traditional measurement panels and recalibrating accordingly. After all, advertisers and agencies understand those panels; they’ve been around for decades. And with everyone overwhelmed by the many technology choices that might improve the situation, few look for a new direction.
Don’t get me wrong: those panels are a great starting point for checking your data. However, when a brand is trying to target, say, 10 million users, the actual overlap with those panels may be too small to support reliable conclusions. And their results often vary.
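Why does a small overlap matter? A quick way to see it is the sampling error of an OTP measured on only the panel members who happened to see the campaign. The sketch below uses the standard normal approximation to the binomial; `otp_margin_of_error` is a hypothetical helper, and the overlap sizes are illustrative, not real panel figures.

```python
import math


def otp_margin_of_error(otp: float, overlap: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error, in percentage points, for an OTP
    measured on `overlap` panel members (normal approximation to the binomial)."""
    if overlap <= 0:
        raise ValueError("overlap must be positive")
    p = otp / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / overlap)


# Illustrative overlaps between a large campaign and a measurement panel.
for n in (100, 1_000, 10_000):
    print(n, round(otp_margin_of_error(50.0, n), 1))
```

At a true OTP of 50 percent, 100 overlapping panelists leave roughly ±9.8 points of uncertainty, while 10,000 bring it down to about ±1 point, which is one reason results from small overlaps can vary so much. The approximation itself becomes unreliable for very small overlaps or OTPs near 0 or 100 percent.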
Yes, some measurement companies use larger datasets to try to calibrate better, but those datasets do not become part of the panel and therefore do not make it bigger. So while these measurement companies remain an important part of the mix, there needs to be greater scale and more checks and balances, achieved by comparing different data sources, including a truth set you piece together yourself.
Time To Geek Out: The Right Data Science
Next, if you want real accuracy, look to Bayesian statistical models, which can take scores from multiple sources and combine them into one consistent score with an associated confidence. It then makes sense to build a “quality priority matrix” at the data-partner level that ranks data sources solely on their quality, or OTP score. This approach lets advertisers choose whether they even want their data segments optimized against measurement panels or against other truth sets.
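As one possible illustration, here is a minimal sketch assuming a Beta-Binomial model, which is a common Bayesian choice for pooling success rates but is my assumption, not a method the approach above prescribes. Each check of a partner’s segment against a truth source (a panel, your own truth set, and so on) contributes its on-target and off-target counts to a Beta posterior; the posterior mean becomes the pooled OTP score, and its standard deviation expresses confidence. The ranking then forms one column of a priority matrix (one segment); a full matrix would repeat it per segment. All names (`Check`, `pooled_otp_score`, `quality_priority_matrix`, the partner and source labels) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Check:
    """One validation of a partner's segment against one truth source."""
    source: str     # hypothetical label, e.g. "panel_a" or "own_truth_set"
    on_target: int  # verified impressions that matched the target audience
    total: int      # verified impressions checked against this source


def pooled_otp_score(checks: List[Check], prior_a: float = 1.0,
                     prior_b: float = 1.0) -> Tuple[float, float]:
    """Pool several checks with a Beta-Binomial model.

    Starts from a Beta(prior_a, prior_b) prior on the true OTP (uniform by
    default) and adds every source's hits and misses. Returns the posterior
    mean and standard deviation; a smaller sd means more confidence.
    """
    a = prior_a + sum(c.on_target for c in checks)
    b = prior_b + sum(c.total - c.on_target for c in checks)
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


def quality_priority_matrix(partners: Dict[str, List[Check]],
                            k: float = 2.0) -> List[Tuple[str, float, float, float]]:
    """Rank partners for one segment by a conservative score, mean - k * sd.

    k = 0 ranks purely on the pooled OTP; larger k penalizes partners whose
    score rests on little evidence.
    """
    rows = []
    for name, checks in partners.items():
        mean, sd = pooled_otp_score(checks)
        rows.append((name, mean, sd, mean - k * sd))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical validation results for a "female 25-54" segment.
partners = {
    "partner_x": [Check("panel_a", 620, 1000), Check("own_truth_set", 1300, 2000)],
    "partner_y": [Check("panel_a", 70, 100)],
}
for name, mean, sd, score in quality_priority_matrix(partners):
    print(f"{name}: OTP {mean:.1%} +/- {sd:.1%}, priority score {score:.3f}")
```

Here partner_y has the higher pooled OTP (about 69.6 percent versus 64.0 percent), but it rests on only 100 checks, so with `k=2` partner_x ranks first; with `k=0`, partner_y would lead. This simple pooling treats every truth source as equally trustworthy and independent; weighting sources by their own reliability would be a natural refinement.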
When you want to add new external data to this system and incorporate it into the priority matrix, you need a different method to score it: the new attributes have to be compared against the ones already benchmarked. Here, Markov chain Monte Carlo (MCMC) methods can be used to estimate the distribution of the new data’s quality before Bayesian statistical models are applied again to assess the new, larger dataset.
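Here is a minimal sketch of that idea, under assumptions I am introducing for illustration: the attribute is binary (such as gender), the benchmarked segment’s accuracy `q` is already known, and the new source’s errors are independent of the benchmark’s. Under those assumptions, the two sources agree on a user with probability `p*q + (1-p)*(1-q)`, where `p` is the new source’s unknown accuracy. A random-walk Metropolis sampler (one simple MCMC algorithm) then draws from the posterior distribution of `p` given the observed agreement count. Function names and numbers are hypothetical.

```python
import math
import random
import statistics
from typing import List


def agreement_prob(p: float, q: float) -> float:
    """Probability that the new source (accuracy p) and a benchmarked source
    (accuracy q) give the same value for a binary attribute such as gender,
    assuming their errors are independent."""
    return p * q + (1.0 - p) * (1.0 - q)


def log_likelihood(p: float, agree: int, n: int, q: float) -> float:
    """Binomial log-likelihood (up to a constant) of `agree` agreements among `n` users."""
    a = agreement_prob(p, q)
    return agree * math.log(a) + (n - agree) * math.log(1.0 - a)


def sample_new_source_accuracy(agree: int, n: int, q: float, steps: int = 20_000,
                               burn_in: int = 2_000, step_size: float = 0.05,
                               seed: int = 0) -> List[float]:
    """Random-walk Metropolis sampler for the new source's accuracy p.

    Uses a uniform prior on (0, 1), so the posterior is proportional to the
    likelihood inside that interval and zero outside it.
    """
    if q == 0.5:
        raise ValueError("a benchmark at 50% accuracy carries no information about p")
    rng = random.Random(seed)
    p = 0.5
    current = log_likelihood(p, agree, n, q)
    samples = []
    for i in range(steps):
        proposal = p + rng.gauss(0.0, step_size)
        if 0.0 < proposal < 1.0:  # proposals outside the prior's support are rejected
            candidate = log_likelihood(proposal, agree, n, q)
            # Metropolis rule: accept with probability min(1, posterior ratio).
            if rng.random() < math.exp(min(0.0, candidate - current)):
                p, current = proposal, candidate
        if i >= burn_in:
            samples.append(p)
    return samples


# Hypothetical check: a new gender source agrees with a segment benchmarked
# at 90% accuracy on 820 of 1,000 shared users.
samples = sorted(sample_new_source_accuracy(agree=820, n=1000, q=0.9))
low = samples[int(0.025 * len(samples))]
high = samples[int(0.975 * len(samples))]
print(f"posterior mean {statistics.mean(samples):.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

With `q = 0.9`, the agreement probability is `0.1 + 0.8p`, so 82 percent agreement corresponds to `p` of about 0.9, and the samples should center near that value. The spread of the samples is the “data quality distribution”; its counts or summary could then feed a Bayesian score like the pooled OTP above. This one-parameter model has a closed-form posterior shape, so MCMC is not strictly required here, but the same sampler extends to richer models (several attributes, correlated errors) where no closed form exists.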
Let’s face it: if the industry can’t even get socio-demographic data right, we might as well forget about all the fancy things we imagine doing with AI and machine learning in the future. Data science will fail to impress if the base data is flawed, but when you improve the input, it can be very powerful.