Trestle’s Data Coverage and Accuracy
One of the first questions we get from prospects and customers alike is, “What are your coverage and accuracy, and how do they compare to vendor X or vendor Y?” As I mentioned in one of the previous posts on data sourcing, this is one of the most important aspects of choosing a data provider, so these questions are expected of us as well.
Previously, when I sourced data and asked these questions, the typical answers I received from other vendors were “our coverage is great,” “accuracy is above par,” etc. Basically, they were templated answers that could not be quantified, so they gave me no confidence as a buyer.
At Trestle, we want to be as transparent as possible about these numbers because we believe our customers should know that information from any vendor, as it expedites a lot of the decision-making process on both sides. So here we go:
Name Coverage (Percentage of times we will have a name for a phone number queried): ~90%
Address Coverage (Percentage of times we will have an address for a phone number queried): ~87%
Accuracy (Percentage of times Trestle has the right name associated with the phone number): ~93%
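To make these three definitions concrete, here is a minimal sketch of how they could be computed over a test sample. It is an illustration only, not our production methodology: the `Lookup` record, the `normalize` helper, and the choice to score accuracy only on lookups where a name was returned and a verified name exists are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lookup:
    """One phone query plus the verified answer, if we have one (hypothetical shape)."""
    phone: str
    returned_name: Optional[str]
    returned_address: Optional[str]
    true_name: Optional[str] = None  # from a truth set; None if unverified


def normalize(name: str) -> str:
    # Assumption: a simple case- and whitespace-insensitive comparison.
    return " ".join(name.lower().split())


def coverage(lookups: List[Lookup], field: str) -> float:
    """Share of queried phones for which `field` came back non-empty."""
    if not lookups:
        return 0.0
    hits = sum(1 for l in lookups if getattr(l, field))
    return hits / len(lookups)


def name_accuracy(lookups: List[Lookup]) -> float:
    """Share of returned names that match the verified name.

    Assumption: only lookups with both a returned name and a verified
    name are scored; missing names count against coverage, not accuracy.
    """
    scored = [l for l in lookups if l.returned_name and l.true_name]
    if not scored:
        return 0.0
    correct = sum(
        1 for l in scored if normalize(l.returned_name) == normalize(l.true_name)
    )
    return correct / len(scored)


sample = [
    Lookup("555-0001", "Jane Doe", "1 Main St", "jane doe"),
    Lookup("555-0002", "John Roe", None, "Jon Roe"),
    Lookup("555-0003", None, "9 Elm St", "Amy Poe"),
    Lookup("555-0004", "Sam Lee", None, "Sam  Lee"),
]

name_cov = coverage(sample, "returned_name")
address_cov = coverage(sample, "returned_address")
accuracy = name_accuracy(sample)
```

Keeping coverage and accuracy as separate measurements matters: a provider can raise one while lowering the other, which is why both numbers are quoted side by side.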
Going head to head against any other identity data provider, we will come out on top with these numbers. We are extremely confident about it, but of course, we want our customers to test and confirm as well. So how did we achieve this level of breakthrough coverage while keeping a high accuracy bar?
It starts with redundancy: having multiple data providers for each data attribute we serve. This makes our data COGS quite high, but we believe that to serve enterprise customers and their mission-critical workloads, we need multiple data providers to add redundancy, improve coverage, and choose the best provider for each attribute in our response set.
These data providers are merged together with sophisticated waterfalls. Knowing which data provider to use for an attribute and in what order is our secret sauce. It involves a lot of analytics and data science coming together to determine the weight of every data provider for every attribute to optimize on both axes – coverage and accuracy.
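As a rough illustration of the waterfall idea, the sketch below orders providers per attribute by a weight and takes the first non-empty value. The provider names, the weights, and the "first non-empty wins" rule are hypothetical and chosen for the example; the real ordering logic is more involved and is not described here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-attribute provider weights (higher = tried first).
WEIGHTS: Dict[str, Dict[str, float]] = {
    "name": {"provider_a": 0.9, "provider_b": 0.7, "provider_c": 0.4},
    "address": {"provider_b": 0.8, "provider_a": 0.6},
}

# responses[provider][attribute] -> value, or None when the provider has no data
Responses = Dict[str, Dict[str, Optional[str]]]


def waterfall(
    attribute: str, responses: Responses, weights: Dict[str, Dict[str, float]] = WEIGHTS
) -> Tuple[Optional[str], Optional[str]]:
    """Return (value, provider) from the highest-weighted provider that has a value."""
    ranked = sorted(
        weights.get(attribute, {}).items(), key=lambda kv: kv[1], reverse=True
    )
    for provider, _weight in ranked:
        value = responses.get(provider, {}).get(attribute)
        if value:
            return value, provider
    return None, None  # no provider covered this attribute


def merge_record(
    responses: Responses, weights: Dict[str, Dict[str, float]] = WEIGHTS
) -> Dict[str, Optional[str]]:
    """Build one merged record by running the waterfall for every attribute."""
    return {attr: waterfall(attr, responses, weights)[0] for attr in weights}


responses = {
    "provider_a": {"name": None, "address": "1 Main St"},
    "provider_b": {"name": "Jane Doe", "address": "2 Oak Ave"},
}
merged = merge_record(responses)
```

In this example, `provider_a` is ranked first for names but has none, so the name falls through to `provider_b`; for addresses, `provider_b` is ranked first and wins outright. Redundancy lifts coverage, and the ordering is what protects accuracy.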
These huge data sets and waterfalls only become a reality because of the strong data team we have built. Our analytics team comes up with the right hypotheses, and our data science team tests them at scale and deploys them in our data ingestion infrastructure. Our experience over the last decade doing this work has taught us which personnel and skill sets we need to assess datasets both quantitatively and qualitatively, and then to maximize their value.
All of this comes to life with a scalable data ingestion and delivery technology stack and infrastructure. We talked about our delivery architecture before, but in addition to that, we have invested heavily in our ingestion and merge pipelines to ensure we can deliver accurate data with a high signal-to-noise ratio at scale. This becomes an extremely complex problem when there is no single unique identifier like SSN to merge records, a lot of which we will cover in a subsequent post.
A robust truth set matters just as much as comprehensive datasets. Too often, we see customers saying they tried a couple of phone or people searches and the specific data attribute we returned was incorrect. While we are always striving to improve in that respect, we encourage our users to test with a broader data sample. The problem is that while coverage is straightforward to measure, accuracy is not: our customers typically do not have a big enough truth set beyond their immediate family and friends. This is where we invest quite a bit, and ultimately, it has become a differentiator for us. Building PII truth sets is a massive undertaking that requires hundreds of thousands of call downs and lookups in public sources across various states and counties. In fact, we started building these truth sets even before finding the data sources. That makes it much simpler to quickly evaluate new providers and see how much value they add overall. It is one of the reasons we had the speed and agility to come to market with such a strong offering right out of the gate.
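To show why a truth set makes provider evaluation quick, here is a small sketch that scores a candidate provider against verified names and against what is already being served. The function name, the dictionary shapes, and the metric names are assumptions made for the example, not a description of our internal tooling.

```python
from typing import Dict, Optional


def _same(a: str, b: str) -> bool:
    # Assumption: case- and whitespace-insensitive name comparison.
    return " ".join(a.lower().split()) == " ".join(b.lower().split())


def evaluate_provider(
    truth: Dict[str, str],
    baseline: Dict[str, Optional[str]],
    candidate: Dict[str, Optional[str]],
) -> Dict[str, float]:
    """Score a candidate provider on the phones in the truth set.

    truth:     phone -> verified name
    baseline:  phone -> name currently served (None or missing if no name)
    candidate: phone -> name from the provider under evaluation
    """
    total = len(truth)
    if total == 0:
        return {"coverage": 0.0, "accuracy": 0.0, "incremental_coverage": 0.0}

    covered = [p for p in truth if candidate.get(p)]
    correct = [p for p in covered if _same(candidate[p], truth[p])]
    # Phones the candidate fills in where nothing is served today.
    new_fills = [p for p in covered if not baseline.get(p)]

    return {
        "coverage": len(covered) / total,
        "accuracy": len(correct) / len(covered) if covered else 0.0,
        "incremental_coverage": len(new_fills) / total,
    }


truth = {"p1": "Ann Lee", "p2": "Bob Ray", "p3": "Cy Dunn", "p4": "Di Fox"}
baseline = {"p1": "Ann Lee", "p2": None, "p3": "Cy Dunn"}
candidate = {"p1": "ann lee", "p2": "Bob Ray", "p3": "Wrong Name", "p4": None}

report = evaluate_provider(truth, baseline, candidate)
```

With only a handful of friends-and-family numbers, a single wrong answer swings the accuracy figure dramatically; the same calculation over a large truth set gives a number that is stable enough to base a sourcing decision on.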
Embracing the grind, we are improving coverage and accuracy daily. After the initial 1-2 providers were added, it was all about that “1% better every day” motto. There is no magic formula, and we’re constantly exploring and testing to keep pushing that delicate needle of coverage and accuracy up and to the right.
Does it mean we are done here? Absolutely not. These are the two areas we are focusing on most right now:
On coverage, we’re focusing on new attributes that add value for the call tracking and marketing use cases. The goal is to add attributes that move the needle in terms of figuring out the propensity to buy and helping businesses to improve their conversions.
On accuracy, we are looking to improve our signal-to-noise ratio further. For example, if we have multiple names associated with a phone number, which is the most relevant name or business that we should surface at the top? We have already identified areas of improvement there, so we are in execution mode. However, it is also something we are working on with our customers because we know that, at times, the correct answer depends on the use case. For example, if we have a business LLC and an individual both associated with a phone, some of our customers might want just the individual name, while others do not really care in what order we return the two names. Hence, giving customers the right knobs is another way we can help some of our more sophisticated customers on the accuracy front.
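As a sketch of what such knobs could look like, the example below ranks the owners associated with a phone by a relevance score and lets the caller prefer, or keep only, one kind of owner. The `Owner` type, the `score` field, and the `prefer` and `only_preferred` parameters are hypothetical names invented for this illustration; they are not parameters of our API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Owner:
    name: str
    kind: str     # "person" or "business"
    score: float  # hypothetical relevance score, higher = more relevant


def rank_owners(
    owners: List[Owner],
    prefer: Optional[str] = None,
    only_preferred: bool = False,
) -> List[Owner]:
    """Order owners for a phone number according to the caller's preferences.

    prefer:         kind to surface first ("person" or "business"), or None
    only_preferred: if True (and prefer is set), drop all other kinds
    """
    if prefer and only_preferred:
        owners = [o for o in owners if o.kind == prefer]
    # Preferred kind first (False sorts before True), then by descending score.
    return sorted(
        owners,
        key=lambda o: ((o.kind != prefer) if prefer else False, -o.score),
    )


owners = [
    Owner("Acme LLC", "business", 0.9),
    Owner("Jane Doe", "person", 0.8),
]

default_order = [o.name for o in rank_owners(owners)]
person_first = [o.name for o in rank_owners(owners, prefer="person")]
person_only = [o.name for o in rank_owners(owners, prefer="person", only_preferred=True)]
```

The same underlying data yields different "correct" answers here: a customer that needs the individual gets Jane Doe first or alone, while a customer with no preference simply gets the highest-scoring owner at the top.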
We hope this helps and gives a sense of where we are on the coverage and accuracy curve and how the team is focusing on this “1% better every day” journey.