A Data Engineer's guide to planning a Single Customer View project.
Having a single view of your customer data – a single version of the truth – is widely understood as a prerequisite for true data-driven marketing. But if you’re considering building a Single Customer View, where should you start? How do you plan such an important, often sizeable, project?
At Optima Connect, connecting siloed data sources is our bread and butter. So as a Data Engineer with over a decade of experience – much of it building and maintaining Single Customer Views for our clients – here are the first 11 things I always ask when kicking off a new SCV project.
It might sound obvious, but the more systems you need to connect to create your Single Customer View, the more complex, time-consuming and potentially risky the project will be. So a good starting point is to figure out how many data sources you need to bring in. From there, you can find out exactly where all of that data lives.
You need to understand how your data is stored at source. For example, some of your source systems may run on SQL Server, while others may be flat files or Access databases. Understanding your various data technologies up-front will help you work out which types of data integration your Single Customer View solution will need to cater for. It will also help you assess whether you need to set up a new environment, or whether one of your existing source systems is capable of becoming your SCV.
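As an illustration of what different types of data integration mean in practice, here is a minimal Python sketch that maps a flat-file export and a database table into one common record shape. It assumes a hypothetical set of field names (STANDARD_FIELDS) and uses the standard library's sqlite3 as a stand-in for a source such as SQL Server; a real pipeline would use the appropriate driver for each system.

```python
import csv
import io
import sqlite3

# Hypothetical common record shape that every source is mapped into.
STANDARD_FIELDS = ("source", "source_id", "first_name", "last_name", "email", "postcode")


def to_standard(record):
    """Keep only the standard fields, filling any that a source lacks with ''."""
    return {field: record.get(field, "") for field in STANDARD_FIELDS}


def load_csv_source(file_obj, source_name, column_map):
    """Read a flat-file export; column_map maps standard field -> column in this file."""
    records = []
    for row in csv.DictReader(file_obj):
        record = {field: row.get(column, "") for field, column in column_map.items()}
        record["source"] = source_name
        records.append(to_standard(record))
    return records


def load_sql_source(conn, query, source_name):
    """Run a query whose column aliases already match the standard field names."""
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    records = []
    for row in cursor:
        record = dict(zip(columns, row))
        record["source"] = source_name
        records.append(to_standard(record))
    return records


# Usage with small in-memory stand-ins for a CRM export and a web database.
crm_export = io.StringIO(
    "CustID,First,Last,Email,PostCode\n"
    "17,Ann,Lee,ann@example.com,AB1 2CD\n"
)
web_db = sqlite3.connect(":memory:")
web_db.execute("CREATE TABLE users (id INTEGER, fname TEXT, lname TEXT, email TEXT)")
web_db.execute("INSERT INTO users VALUES (5, 'Ann', 'Lee', 'ann@example.com')")

customers = load_csv_source(
    crm_export,
    "crm",
    {"source_id": "CustID", "first_name": "First", "last_name": "Last",
     "email": "Email", "postcode": "PostCode"},
)
customers += load_sql_source(
    web_db,
    "SELECT id AS source_id, fname AS first_name, lname AS last_name, email FROM users",
    "web",
)
```

Each new source type only needs its own small loader; everything downstream (quality checks, matching) can then work on one record shape.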
The amount of data in each source system is often overlooked, but it’s important to consider data volumes, at least at a high level. Do you have 10,000 customer records or 10,000,000? What about transactions? How much other data are you dealing with? It’s a good idea to consider not just the current volume of data, but also how your databases might grow over the medium and long term. You don’t want to build pipelines into your SCV solution that work with your current data but fall over in a few months’ time when you have a spike in customers or transactions.
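A quick way to put numbers on this is to project each source's volume forward with a compound growth rate and compare it with the largest volume your pipelines have been tested against. The sketch below assumes a constant annual growth rate, which real data rarely follows, so treat the output as a rough planning figure. The function names and the 30% rate are hypothetical.

```python
def project_volume(current_rows, annual_growth_rate, years):
    """Compound growth: current_rows * (1 + rate) ** years."""
    return current_rows * (1 + annual_growth_rate) ** years


def exceeds_tested_capacity(current_rows, annual_growth_rate, years, tested_rows):
    """True if the projected volume is larger than anything the pipeline has been tested with."""
    return project_volume(current_rows, annual_growth_rate, years) > tested_rows


# 10,000,000 customer records growing at a hypothetical 30% a year for 3 years.
projected = round(project_volume(10_000_000, 0.30, 3))
print(projected)  # 21970000
```

In this example, a pipeline only ever tested with 20 million rows would be running beyond its tested volume within three years.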
Nothing undermines the value of a Single Customer View like poor data quality, and looking at how data is entered into each source system is a good place to start. One warning sign is data that has been keyed in manually (for example by call centre staff), because it is continually at risk of human error. Also find out what validation or enhancement processes, if any, have been applied to the data in each system. Knowing these things will help you begin to quantify how much additional work may be required before you get to a complete, clean SCV.
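A simple profile of each source is often enough to start quantifying that work. This minimal Python sketch counts blank required fields and malformed email addresses across a list of records; the field names are hypothetical, and the email pattern is deliberately loose (it only checks for something@something.something), so treat it as a first-pass signal rather than real validation.

```python
import re

# Deliberately loose email check; proper validation is considerably more involved.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def profile_quality(records, required_fields=("first_name", "last_name", "email", "postcode")):
    """Count blank required fields and malformed emails across a list of dict records."""
    report = {"total": len(records), "blank": {f: 0 for f in required_fields}, "bad_email": 0}
    for record in records:
        for field in required_fields:
            # None, empty strings and whitespace-only values all count as blank.
            if not str(record.get(field) or "").strip():
                report["blank"][field] += 1
        email = str(record.get("email") or "").strip()
        if email and not EMAIL_PATTERN.match(email):
            report["bad_email"] += 1
    return report


sample = [
    {"first_name": "Ann", "last_name": "Lee", "email": "ann@example.com", "postcode": "AB1 2CD"},
    {"first_name": "bob", "last_name": "", "email": "bob at example", "postcode": "AB1 2CD"},
    {"first_name": "Cara", "last_name": "Ng", "email": None, "postcode": " "},
]
report = profile_quality(sample)
print(report)
```

Running the same profile against every source makes it easy to see which systems will need the most cleansing before matching.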
A key task in building a Single Customer View is joining your data sets together. Known as ‘data matching’, this process relies on having a reliable means of identifying customer information as belonging to the same person. It also means removing duplicates (the same customer appearing multiple times in your database). This is straightforward if your customers have unique identifiers that are consistently applied in each of your data sources. If not, you’ll need to define a suitable matching process and agree your matching rules. In the majority of cases, working with a partner like Optima Connect will make building and tuning your matching process much faster and more cost-effective than building your own solution. Beware of trying to use email addresses as unique identifiers! A single address is often shared by a couple or a household, and one person may use several addresses or change them over time.
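To show the idea behind agreed matching rules, here is a deliberately simple Python sketch of deterministic matching: each record is normalised and reduced to a match key (surname, first initial, date of birth and postcode), and records sharing a key are grouped as one candidate customer. The rule and field names are hypothetical; production matching usually adds fuzzy comparisons, scoring and manual review, which this sketch does not attempt. Records with an incomplete key are left unmatched rather than risk a false merge.

```python
import re
import unicodedata
from collections import defaultdict


def normalise_text(value):
    """Lower-case, strip accents and remove anything that is not a letter or digit."""
    value = unicodedata.normalize("NFKD", str(value or ""))
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]", "", value.lower())


def match_key(record):
    """Hypothetical rule: surname + first initial + date of birth + postcode."""
    return (
        normalise_text(record.get("last_name")),
        normalise_text(record.get("first_name"))[:1],
        str(record.get("date_of_birth") or ""),
        normalise_text(record.get("postcode")),
    )


def group_matches(records):
    """Group records sharing a complete match key; each group is one candidate customer."""
    groups = defaultdict(list)
    unmatched = []
    for record in records:
        key = match_key(record)
        if all(key):
            groups[key].append(record)
        else:
            # An incomplete key could wrongly merge unrelated people, so keep it separate.
            unmatched.append([record])
    return list(groups.values()) + unmatched


records = [
    {"source": "crm", "first_name": "Zoë", "last_name": "O'Neil", "date_of_birth": "1980-02-01",
     "postcode": "AB1 2CD", "email": "family@example.com"},
    {"source": "web", "first_name": "zoe", "last_name": "ONeil", "date_of_birth": "1980-02-01",
     "postcode": "ab12cd", "email": "zoe@example.com"},
    {"source": "web", "first_name": "Liam", "last_name": "O'Neil", "date_of_birth": "1978-06-30",
     "postcode": "AB1 2CD", "email": "family@example.com"},
]
groups = group_matches(records)
```

In the sample data, the two Zoë O'Neil records from different sources match despite differences in accents, punctuation and postcode formatting, while Liam stays separate. Matching on email alone would have merged Zoë's CRM record with Liam's, because they share a family address.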
It’s always a good idea to understand how the data will be used when designing a Single Customer View. The use cases will evolve over time as more stakeholders become aware of the excellent foundations you’ve laid by connecting your data sources. Even so, understanding how your stakeholders in analytics, data science, operations and marketing plan to interrogate and use the data will help inform your solution design phase. Related to this, it’s also worth understanding how quickly the business wants to start using the SCV. In many cases an iterative approach may deliver more value to the business: you start with a few key data sources and build up your SCV by adding more sources over time.
These days, IT teams’ default might be to go for a cloud-based solution, be it Microsoft Azure, Google Cloud Platform or Amazon Web Services. However, some businesses have information security policies that require data to be stored on-premises. Additionally, depending on the volume of data you have and what you need the SCV for, cloud-based storage may not be the most cost-effective solution for you. It’s always best to check up-front how your new Single Customer View fits with your wider cloud strategy, if you have one.
This overlaps with the earlier point about how the data will be used, but experience tells us you should pay particular attention to the types of reports that will be required from the data. Who will be creating reports from the Single Customer View? What do they want the reports to show? How often will those reports be required, and when are they likely to run them? If there is high demand for reporting in your organisation, make sure you account for the additional computing power this will require, and structure your SCV to cope with competing demands on the same resources (known as contention).
Again, it’s important to know who will be using the new Single Customer View that you have invested weeks or months in building. What kind of access to the data will each group need? Will they be querying the data themselves? Will they be building audiences? Will they be running machine learning or data science models? Often, if you have a large in-house analytics team, it will have a high demand for access to the data, and this may warrant a separate analytics environment to reduce contention and minimise the risk of your data scientists ‘breaking’ your SCV.
Privacy by design is not new. Under the GDPR, however, it is now a legal requirement. So when creating a new Single Customer View database, it's important to plan ahead so that you can build good data privacy practices into the code. This means understanding your company's data protection policies so that you can classify the data in your SCV, implement data retention rules and set up procedures for handling GDPR requests, such as subject access and erasure requests.
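Retention rules are one place where privacy by design turns directly into code. The sketch below flags records whose last activity falls outside a retention period. The six-year period and the last_activity field are hypothetical placeholders: the GDPR does not prescribe a single retention period, and the right value comes from your own data protection policy.

```python
from datetime import date, timedelta

# Hypothetical policy value (roughly six years, ignoring leap days); take the real one from your policy.
RETENTION_DAYS = 6 * 365


def records_past_retention(records, today, retention_days=RETENTION_DAYS):
    """Return records whose last activity is older than the retention period."""
    cutoff = today - timedelta(days=retention_days)
    return [r for r in records if r["last_activity"] < cutoff]


customers = [
    {"id": 1, "last_activity": date(2017, 1, 1)},
    {"id": 2, "last_activity": date(2023, 6, 1)},
]
expired = records_past_retention(customers, today=date(2024, 1, 1))
print([r["id"] for r in expired])  # [1]
```

Running a check like this on a schedule, and logging what was deleted or anonymised, gives you evidence that the retention rules are actually being applied.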
Data is not a static resource, and every SCV needs ongoing maintenance at various levels. At a server level, you need to agree who will be responsible for performing server maintenance, applying patches, updating the servers and so on. This is usually managed by IT departments. There are also tasks best suited to a Database Administrator, such as routinely rebuilding indexes or managing the back-up schedule. (If you choose a serverless database solution, most of this maintenance is handled by the provider.) There are also vital tasks that are best suited to data engineers, such as monitoring the data loads, matching and dedupe processes, and handling errors and failures. Best practice dictates that you should review your matching process at least annually, and certainly whenever things change in the source data.
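Monitoring data loads can start very simply. This sketch compares each load's row count with the average of recent loads and flags anything that deviates by more than a tolerance. The 50% tolerance and the status strings are hypothetical choices; a real setup would also alert someone and record the result.

```python
from statistics import mean


def check_load(row_count, recent_counts, tolerance=0.5):
    """Compare a load's row count with the recent average; tolerance is a fraction of that average."""
    if not recent_counts:
        return "no history"
    if row_count == 0:
        return "empty load"
    baseline = mean(recent_counts)
    if abs(row_count - baseline) > tolerance * baseline:
        return "investigate"
    return "ok"


history = [980, 1010, 1005]
print(check_load(1000, history))  # ok
print(check_load(120, history))   # investigate
```

A sudden drop like the second example often means an upstream extract failed partway, which is exactly the kind of issue you want to catch before it reaches the matching process.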
Of course, there are many other things to consider when planning a Single Customer View project. As a starting point, however, you could do a lot worse than getting answers to these 11 questions.