Exceptional data engineers are a rare breed. In this article, we take a look at what separates the best from the rest.
Data engineering is like most things. It’s easy to pick up the basics. But it takes a long time – and, more importantly, the right mindset – to master.
Over my 20+ year career in data engineering, which includes over a decade of contracting, I’ve worked with a few shockingly bad data engineers, a lot of good ones, and a relatively small number that I would call truly exceptional.
And it’s this last category – the truly exceptional data engineers – that I’m most interested in.
What is it about the best data engineers that sets them apart? Do they have certain traits and habits that separate them from those who are simply ‘good’?
The short answer is yes.
Time and time again, I’ve observed a consistent set of qualities in the best data engineers that allows them to perform to the highest standards, day in, day out, no matter what project they’re working on or what team they’re a part of.
Let me start by saying that you won’t find experience on my list. Experience helps, of course, but it’s far from a guarantee of quality. I’ve seen data engineers who have a great CV, talk a good game, and work at lightning speed. To the untrained eye, they may even appear to do a good job. In fact, they can leave projects riddled with technical debt, bugs and catastrophes waiting to happen. Meanwhile, I’ve seen junior data engineers who have all of the raw ingredients and the right attitude. Give me a nascent data engineer with the right qualities any day. They’ll overtake their more experienced counterparts, and they’ll do it surprisingly quickly.
So, let’s get into it.
All good data engineers have a solid grasp of best practice, including in-depth knowledge of database design and how to get the best performance from different types of database.
Being a great data engineer is about more than just knowing best practices. It’s about applying them, day in, day out.
Consistency is vital. It’s easy to be perfect on a good day. But every day, for your entire career? That requires discipline and for good practices to be so deeply internalised that they become second nature.
The very best data engineers also inherently get the ‘repeatable automated’ aspect of solutions. The best data solutions work without manual intervention. This means designing for failure. When things fail, the system has to be in a place where it’s easy to pick up the pieces.
The best data engineers understand that even their best solutions can fail. What’s critical is that they don’t allow their solutions to fail in an intermediate state. The processes in their solutions either succeed or roll back to a consistent state.
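As a minimal sketch of that "succeed or roll back" behaviour, the example below uses Python's built-in sqlite3 module. The `orders` table and the `load_batch` function are hypothetical names chosen for illustration, not part of any particular solution. The idea is simply that a whole batch is written inside one transaction, so a failure halfway through leaves the table exactly as it was.

```python
import sqlite3


def load_batch(conn, rows):
    """Insert every row in the batch, or none of them."""
    try:
        # Using the connection as a context manager commits on success
        # and rolls back the open transaction if an exception is raised.
        with conn:
            conn.executemany(
                "INSERT INTO orders (id, amount) VALUES (?, ?)", rows
            )
    except sqlite3.Error:
        return False
    return True


# Hypothetical in-memory table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
)

ok = load_batch(conn, [(1, 10.0), (2, 20.0)])
# The second row repeats the primary key, so this batch fails part-way through.
bad = load_batch(conn, [(3, 30.0), (3, 99.0)])
count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Here the second batch fails on its duplicate key, and the row with id 3 that was inserted just before the failure is rolled back with it, so only the two rows from the first batch remain. A real pipeline would also log or alert on the failure rather than just returning `False`.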
Working with data in business is never straightforward, particularly at scale.
And data engineering is like any other form of engineering – there are nearly always multiple ways to solve any problem. Solutions can be lean and elegant, or clunky and unnecessarily complex.
The best data engineers all have the ability to view a problem from multiple angles, think creatively about different ways to solve it, and assess the relative merits of each solution before choosing the best one. This requires a combination of mental agility and experience. Experience will, of course, be gained over time. The ability to think laterally, however, is the more important determining factor as to whether someone will ever become a world-class data engineer.
That’s why our interview process for graduate data engineers includes a thinking styles test. Our test is designed to assess lateral thinking ability. You can't prepare for it.
Even to reach the test, applicants need a good degree. To get the job, you also have to be a natural problem-solver. Sadly, only around one in ten applicants passes our test. This can make it challenging for us to recruit new people, but we don’t cut corners. We know that by only hiring people who can approach new problems with both creativity and logic, every single one of our data engineers has the potential to become one of the best in the industry.
Excellent data engineers plan ahead and always think about scale. Whether they’re designing solutions or writing queries, they think about the nature and size of the data they’re working with. Not just how the data exists today, but also how that data may grow over time.
That means understanding how their solution will be used, and anticipating problems that might occur. Predicting how solutions might break leads to a robust, considered construction in the first place.
Take the example of building a data pipeline. A good data engineer can build a good pipeline. It’ll work. But what about when someone goes to investigate it in a year’s time? Have they properly considered archiving? Can the pipeline handle 10x or 100x the original data volume? When they’re adding a new data feed, the best data engineers are also thinking about the volume it might gather over time, the sensitivity of the data and its retention. How does this solution scale for 100, 1,000, 100k or 100m records? The best engineers will anticipate all of these factors and design their solution appropriately.
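One small, hedged illustration of designing for volume: process records in fixed-size batches instead of loading everything into memory at once. The `batched` and `total_amount` helpers and the `amount` field below are hypothetical, and a real pipeline would have many more concerns (archiving, retention, sensitivity), but the shape of the code is the same whether the feed holds 100 records or 100m.

```python
from itertools import islice


def batched(records, size):
    """Yield lists of at most `size` records from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def total_amount(records, batch_size=1000):
    """Sum the hypothetical 'amount' field, holding one batch in memory at a time."""
    total = 0
    for chunk in batched(records, batch_size):
        total += sum(record["amount"] for record in chunk)
    return total


# A generator stands in for a feed that may be too large to hold in memory.
feed = ({"amount": 1} for _ in range(2500))
result = total_amount(feed, batch_size=1000)
```

Because the input is consumed lazily, peak memory depends on `batch_size` rather than on the total size of the feed.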
Anticipating future problems is one trait of high-performing data engineers. But the best data engineers also understand when a future problem is likely enough to be worth coding for pre-emptively.
Good data engineers who become obsessed with future problems become defensive coders. Trying to code for every conceivable problem before it arises will often lead to over-engineered solutions. In most cases, you don’t want your data engineers to spend 80% of their time coding for problems that are extremely unlikely to occur.
The best data engineers take a more balanced, pragmatic approach. They understand what can go wrong and are selective about which risks to defend against. This approach results in solutions that are built quickly and elegantly, yet still provide sufficient defences against the most common issues.
Excellent data engineers take pride in their work. Everyone wants to build a good solution, and in the best data engineering teams you’ll find a collaborative environment with a healthy dose of friendly competition. Central to this is having a growth mindset.
A growth mindset means knowing that there may be other ways to solve a problem that you may not have considered, and allows you to be comfortable sharing ideas with your peers. It means being happy to take feedback on board – seeing it as a learning opportunity, not as criticism.
It means keeping up-to-date with the latest data technologies. In an industry where tools and technologies are always changing, it’s critical that data engineers don’t make assumptions or simply rely on yesterday’s technology.
And it means not making the same mistakes twice. If you want to be an exceptional data engineer, you can’t keep using the same solution. You need to re-evaluate. You need to figure out whether there’s a new technology or a new technique that does the job better. Funnily enough, this is one of the main reasons I stopped contracting – I saw the same patterns, the same corners being cut. And while it's easy to defend this approach by labelling it 'efficient', in many cases, it led to data engineers becoming complacent and simply repeating the same mistakes.
Each individual has to foster their own growth mindset. But much of it comes down to the culture you create for your engineers. As a leader, you have to build a good team environment. You need people who are collaborative and competitive in the right balance.
--
As you might have guessed, this is a topic that’s close to my heart. Not just from a personal development perspective. I'm part of the senior management team at Optima Connect, and data engineering is the biggest department we have. From our newest graduate to our most experienced technical architect, we work hard to be the best in the industry.
Spotting and nurturing the five qualities I’ve listed here is one of the ways we do that. But I’d love to hear what other data engineers would include on the list.