How accurate is site visitor identification?
Results from benchmarking multiple de-anonymization services
We recently launched a benchmarking tool to compare the accuracy of website visitor de-anonymization services. In this post, we share some preliminary benchmark results from the data collected from folks who used the tool.
tl;dr: Accuracy rates for company identification remain low across services. Clearbit has higher identification and accuracy rates than the others¹ but is limited to company-level identification.
Methodology
We took a sample of data from global visitors who used the tool. The FAQ on the whoami page explains how the tool works. But as a quick recap, the tool identifies a visitor with multiple de-anonymization services. An identification is considered accurate if the company identified by a service matches the work email that the visitor submits. This is not always perfect - some folks work at multiple companies - but it is close enough.
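To make the accuracy check concrete, here is a minimal sketch of that comparison. It assumes the match is done on the company's domain vs. the domain of the submitted work email; the real tool may match differently, and this ignores corner cases like corporate email aliases or shared domains.

```python
def email_domain(work_email: str) -> str:
    """Extract the domain from a submitted work email, e.g. 'jane@acme.com' -> 'acme.com'."""
    return work_email.split("@")[-1].strip().lower()

def is_accurate(identified_domain: str | None, work_email: str) -> bool:
    """An identification counts as accurate if the company domain returned by a
    service matches the domain of the visitor's work email."""
    if identified_domain is None:
        return False
    return identified_domain.strip().lower() == email_domain(work_email)
```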
There is another caveat: a service like RB2B only works in the US, so its metrics may be diluted since the data covers global visitors. We felt global data was more representative, given that every other service works globally.
🥁🥁 The Results
The chart below shows the two major metrics we used to measure the coverage and accuracy of de-anonymization services.
Total Identified: This simply measures the percentage of visitors who were identified by a service.
Accurately Identified: This is the percentage of visitors whose company was accurately identified by a service across *all* visitors (identified or unidentified).
We can, however, get more precise. What really matters when it comes to accuracy is the precision (the complement of the false-positive rate), i.e. what fraction of "identified" visitors are correct. Here's the money chart:
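For clarity, here is a minimal sketch of how the three metrics relate, computed from labeled results. The field names are illustrative, not from our dataset.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Visit:
    identified_company: Optional[str]  # company returned by the service, None if unidentified
    actual_company: str                # company derived from the visitor's work email

def compute_metrics(visits: list[Visit]) -> dict[str, float]:
    total = len(visits)
    identified = [v for v in visits if v.identified_company is not None]
    accurate = [v for v in identified if v.identified_company == v.actual_company]
    return {
        # share of all visitors the service identified at all
        "total_identified": len(identified) / total,
        # share of all visitors identified *correctly*
        "accurately_identified": len(accurate) / total,
        # precision: of the visitors the service claims to identify, how many are correct
        "precision": len(accurate) / len(identified) if identified else 0.0,
    }
```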
As you can see, Clearbit and Snitcher do quite well in terms of coverage, but Clearbit is clearly superior in terms of precision, along with Dealfront. Apollo's and People Data Labs' offerings are quite new, so it doesn't surprise us that they still have a way to go.
We will have more data to share in a follow-up post, with more cuts showing how the services stack up in terms of overlap and coverage across different geographies, e.g. US vs. non-US. Please sign up here to get it.
🔑 Key Takeaways
Given the mixed results here, you might be tempted to give up on de-anonymization. After all, <50% coverage and precision doesn't look good! But there is another way to look at this: every percentage point of coverage gives you visibility into traffic you otherwise wouldn't have.
There are also techniques you can use to improve precision as well as get more signal out of this noisy data. Here are four concrete ideas we recommend (a rough code sketch follows the list):
1️⃣ Use confidence scores. Vendors like Clearbit and PDL offer confidence scores (High/Medium/Low). We strongly recommend using these to filter out Low-confidence visits.
2️⃣ Use multiple visitors as a signal. A single visitor is typically a weak signal by itself, and with inaccuracy issues it becomes even weaker. Multiple visitors from the same company in a short timeframe provide stronger evidence, especially in the work-from-home era (where they don't share an office IP).
3️⃣ Combine multiple providers. Using multiple data providers can increase not only coverage but also accuracy when they agree on an identification. When providers that use different technologies, e.g. company-level vs. person-level, match each other, the result tends to be quite accurate. With a solution like Syft, you automatically get access to a world-class de-anonymization/enrichment waterfall that takes care of this for you.
4️⃣ Combine with first-party signals. If you identify a company on your website shortly after emailing a prospect there, you have a strong first-party signal (the prospect's company and location) plus a geographic signal (the visitor's location). Combining them builds stronger confidence in the match.
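As a rough illustration of ideas 1-3, here is a minimal sketch. The event shape, provider names, and thresholds are hypothetical, not any vendor's actual API: it filters out low-confidence identifications, checks whether independent providers agree on a visitor, and flags companies with multiple distinct visitors in a short window.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical shape of an identification event; real vendor payloads will differ.
@dataclass
class Identification:
    provider: str          # e.g. "clearbit", "pdl"
    visitor_id: str        # anonymous visitor/session id
    company_domain: str    # company the provider resolved the visitor to
    confidence: str        # "high" | "medium" | "low"
    seen_at: datetime

def filter_low_confidence(events: list[Identification]) -> list[Identification]:
    """Idea 1: drop low-confidence identifications before acting on them."""
    return [e for e in events if e.confidence in ("high", "medium")]

def providers_agree(events: list[Identification], visitor_id: str) -> bool:
    """Idea 3: trust an identification more when two or more providers
    independently resolve the same visitor to the same company."""
    matching = [e for e in events if e.visitor_id == visitor_id]
    providers = {e.provider for e in matching}
    domains = {e.company_domain for e in matching}
    return len(providers) >= 2 and len(domains) == 1

def hot_accounts(events: list[Identification],
                 min_visitors: int = 2,
                 window: timedelta = timedelta(days=7)) -> set[str]:
    """Idea 2: flag companies with multiple distinct visitors in a short timeframe."""
    now = datetime.utcnow()
    visitors_by_company: dict[str, set[str]] = defaultdict(set)
    for e in events:
        if now - e.seen_at <= window:
            visitors_by_company[e.company_domain].add(e.visitor_id)
    return {domain for domain, visitors in visitors_by_company.items()
            if len(visitors) >= min_visitors}
```

In practice you would run something like `hot_accounts(filter_low_confidence(events))` over your recent identification events, and route only the agreeing, multi-visitor accounts to sales.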
If you have any questions, please let us know in the comments below.
¹ Which is why they are one of our partners!