Data Engineering vs. BI: Drawing the Line Before It Breaks

A detailed look at Enterprise Data Teams and a Framework to Succeed

The Data Team Structure

A brilliant former boss and current data mentor once told me that every company’s data environment is a little different. Learning the team structure, roles, tech stacks, and terminology is a normal part of adapting to a new company as a data professional.

However, one pattern that remains consistent in large enterprises has been the blurry line between the responsibilities of the data engineering team vs. the BI & Data Visualization team. If not defined and managed carefully, this can create a lot of political tension between these teams, causing miscommunication and failed deliverables.

Based on my experience, smaller companies don’t have this problem. Data engineers are often rolled up into a BI & Analytics department, and the all-hands-on-deck approach of a start-up or small business naturally removes this divide.

With that said, below I highlight what an enterprise data team structure should theoretically look like.

Product Team

Product Manager

For a given business unit, product managers are the captains of data projects. Although I’ve never held a formal role in product management within a company, I’ve spent enough time with them to understand what they do.

They guide the data project’s strategy, working with stakeholders to define the key objectives mapped to business outcomes. They then take those business requirements and envision a product roadmap. This roadmap serves to tie the data product to the ROI it has on a business’s bottom line.

After the high-level work concludes, they delegate the granular task details, organization, and planning to a project manager.

Project Manager / Scrum Master

The project manager gets tactical with the product strategy roadmap. They build out the granular tasks for developers, architects, and UX designers to work on. Typically, they’re experts in project management tools like Atlassian’s Jira. Project managers are also in charge of hosting daily stand-up meetings, grooming and sizing sessions, and sprint recaps.

Data Engineering Team

The data engineering team handles all movement of data from source to destination. This process is known as ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform), depending on whether the data is transformed before or after it lands in the warehouse. The team is typically made up of Data Engineering Managers, Lead Data Engineers, and Senior and Junior Engineers.
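To make the ETL vs. ELT distinction concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for a real warehouse, and the rows, table names, and function names are all hypothetical. The only point is the order of operations: ETL transforms before loading, while ELT loads the raw data first and transforms it inside the warehouse.

```python
import sqlite3

# Hypothetical rows "extracted" from a source system (amounts arrive as text).
SOURCE_ROWS = [
    ("o-1", "acme", "120.50"),
    ("o-2", "acme", "79.50"),
    ("o-3", "globex", "200.00"),
]


def run_etl(conn):
    """ETL: transform in application code, then load only the finished result."""
    totals = {}
    for _order_id, customer, amount in SOURCE_ROWS:
        totals[customer] = totals.get(customer, 0.0) + float(amount)
    conn.execute("CREATE TABLE etl_customer_revenue (customer TEXT, revenue REAL)")
    conn.executemany(
        "INSERT INTO etl_customer_revenue VALUES (?, ?)", sorted(totals.items())
    )


def run_elt(conn):
    """ELT: load the raw rows untouched, then transform inside the warehouse."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", SOURCE_ROWS)
    conn.execute(
        """
        CREATE TABLE elt_customer_revenue AS
        SELECT customer, SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        GROUP BY customer
        """
    )


conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
run_etl(conn)
run_elt(conn)

etl_result = conn.execute(
    "SELECT customer, revenue FROM etl_customer_revenue ORDER BY customer"
).fetchall()
elt_result = conn.execute(
    "SELECT customer, revenue FROM elt_customer_revenue ORDER BY customer"
).fetchall()
print(etl_result == elt_result)
```

Both paths produce the same curated table. The practical difference is that the ELT path keeps `raw_orders` in the warehouse, so the transformation can be rewritten later without re-extracting from the source.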

Data engineers work through the integration code of source systems so they can connect to those systems and extract the data they need. Truthfully, I’ve always thought of data engineers as the software engineers of the data world.

Additionally, not only are they in charge of extracting and loading the data, but also transforming it (in an ideal world!). Data engineers truly are the engine behind the data models that businesses ultimately consume in a BI tool. They build these data models in database schemas using tools like SQL, Python, or dbt.

The skills needed to be a data engineer include familiarity with the SDLC, the ability to translate technical concepts and jargon so the business can understand them, Python, SQL, and an orchestration or data integration tool like Airflow or Azure Data Factory. dbt is also on this list, but controversially it often belongs to the analytics engineers, who in many cases are part of the data engineering team.

Business Intelligence & Data Visualization Team

According to IBM.com, business intelligence is a set of technological processes for collecting, managing and analyzing organizational data to yield insights that inform business strategies and operations. Typically, BI teams are made up of BI Managers, BI Analysts and BI/Analytics Engineers.

Overall, I agree with this definition. It sticks to a very descriptive framework, and that’s exactly what BI should do, especially at enterprise companies, where business intelligence is typically separate from the Data Science & Advanced Analytics team.

BI teams are very good at explaining the who, what and when of a company’s data. They do this by creating easy-to-digest reports and dashboards in BI tools like Looker or Power BI. It’s also not uncommon for a BI professional to use Excel or Google Sheets for data validation.

It’s important to note that BI teams work with the data curated by data engineers. A data engineer and a BI engineer or BI analyst work extremely closely together. This is where some muddy waters can originate, but I’ll get to that in a bit.

Data Science & Advanced Analytics Team

Moving away from the descriptive side of data, data scientists, data analysts and ML engineers implement predictive and prescriptive solutions on a company’s data. They answer what will happen and what a company should do about it to enact changes that drive profit.

They might build out customer sentiment models to understand how a company is perceived by a target persona. Furthermore, they run churn propensity analyses to predict which customers might churn in a given time period and why. Or they create recommendation engines that help upsell customers who have already bought certain products.

Now that we’ve covered all parts of a data team, it’s time to get into the blurry line between the data engineering and business intelligence teams. I’ll be pulling directly from my own experiences at several F500s, so here we go.

The Data Engineering and BI Divide

In the last decade of BI, the rise of the semantic layer and advancements in modern tools have enabled BI professionals to write a large amount of custom logic and code. Instead of going into the data warehouse to build transformations and data models, BI teams can stand them up through the semantic layer right in the BI tool itself. All they have to do is establish a connection between the data warehouse and the BI tool.
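As a rough illustration of what a semantic layer does, the sketch below maps business-friendly dimension and measure names to SQL expressions and compiles a request into a query. This is a toy model of the idea, not how Looker or any specific BI tool is implemented. The `SEMANTIC_MODEL` structure, the `compile_query` helper, and the `orders` table are all hypothetical, and SQLite again stands in for the warehouse.

```python
import sqlite3

# Hypothetical semantic model: business-friendly names mapped to SQL expressions.
SEMANTIC_MODEL = {
    "table": "orders",
    "dimensions": {
        "region": "region",
        "order_month": "substr(order_date, 1, 7)",
    },
    "measures": {
        "total_revenue": "SUM(amount)",
        "order_count": "COUNT(*)",
    },
}


def compile_query(model, dimensions, measures):
    """Turn a request for named dimensions and measures into a SQL string."""
    dim_exprs = [model["dimensions"][d] for d in dimensions]
    select_parts = [f"{expr} AS {name}" for expr, name in zip(dim_exprs, dimensions)]
    select_parts += [f"{model['measures'][m]} AS {m}" for m in measures]
    sql = "SELECT " + ", ".join(select_parts) + " FROM " + model["table"]
    if dim_exprs:
        grouping = ", ".join(dim_exprs)
        sql += f" GROUP BY {grouping} ORDER BY {grouping}"
    return sql


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse connection
conn.execute("CREATE TABLE orders (region TEXT, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("east", "2024-01-05", 100.0),
        ("east", "2024-01-20", 50.0),
        ("west", "2024-02-01", 75.0),
    ],
)

sql = compile_query(SEMANTIC_MODEL, ["region"], ["total_revenue", "order_count"])
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

Notice that every definition in `SEMANTIC_MODEL` is business logic that lives outside the warehouse. That is the source of both the flexibility and the ownership questions discussed in this section.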

This has brought pros and cons in terms of capabilities and team dynamics. Let’s start by listing off a few of the pros:

Upskilled BI analysts and engineers in SQL and data modeling

I truly think that analysts went from analysts to engineers (rise of the analytics engineer) with the flexibility a tool like Looker gave its developers starting in the 2010s. A precursor to dbt, LookML was the first of its kind. Being able to build advanced SQL transformations on the fly to create reports (without touching the warehouse UI) motivated BI professionals to dive deep into SQL. They didn’t have to lean on data engineers for every transformation, data model and schema creation anymore.

This was the moment a lot of technical upskilling happened. Instead of being limited to basic aggregations, as they were in Tableau before 2020, BI developers could build robust logic and data models to serve their stakeholders.

Added flexibility to prototype advanced data solutions in the BI environment

When a modern BI tool’s capabilities are used correctly, the advanced logic and data models built should exist as prototypes within the BI tool. This is great for getting quick wins and proving whether or not a reporting idea is possible. For any production grade reporting system, BI developers will coordinate with the data engineers to move that logic into the data warehouse layer.

As you can see, these are two huge wins. They allow BI teams to move and deliver faster, and they reduce the BI team’s dependency on data engineers, which frees up everyone’s time to focus on value creation. However, that added flexibility and speed has created many issues. Now let’s talk about the cons…

Lack of definition about ownership

Since modern BI professionals are very savvy in SQL and data modeling (not just data visualization), the tendency for data engineers to shift the burden of greenfield data modeling onto BI analysts and engineers is a real thing. I’ve spent years working on BI teams for/with F500s and this point holds true.

Often, as the BI person, you’re the one working closest to the business. This proximity makes it easy for data engineers to expect you to gather the requirements for a source system’s data model and the transformations needed to build a requested report.

While they sit in the background and manage the orchestration of Airflow jobs and moving the data, they place the burden on the BI person to build the advanced SQL that will produce accurate numbers. This is what creates the tension, and reveals if a data engineer is truly practicing data engineering, or hiding behind the political landscape to avoid heavier lifting.

Wrong numbers? Cue the blame game

The autonomy given in BI tools often creates friction between the data engineers and BI people when numbers are wrong. Is the semantic model in the BI tool set up incorrectly, or is there an error in the backend transformation of the data warehouse table?

When this issue happens, it’s often a back-and-forth between teams. Instead of working together, it becomes a blame game of who is right and wrong.
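One way to take the emotion out of this question is to check the same metric at each layer and see where it first diverges from an agreed figure. The sketch below is a minimal example under assumptions: the tables, the `CHECKPOINTS` list, the `first_divergence` helper, and the expected value are all hypothetical, and the "semantic model" query contains a deliberate bug for demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
conn.executescript(
    """
    CREATE TABLE raw_orders (order_id TEXT, status TEXT, amount REAL);
    INSERT INTO raw_orders VALUES
        ('o-1', 'complete', 100.0),
        ('o-2', 'complete', 50.0),
        ('o-3', 'refunded', 25.0);

    -- Curated table owned by data engineering: refunds are excluded.
    CREATE TABLE curated_orders AS
    SELECT order_id, amount FROM raw_orders WHERE status = 'complete';
    """
)

# Each checkpoint is (layer, owning team, query returning a single number).
# The second query stands in for the SQL a BI tool's semantic model would issue.
# It has a deliberate bug: it reads the raw table instead of the curated one.
CHECKPOINTS = [
    ("curated table", "data engineering", "SELECT SUM(amount) FROM curated_orders"),
    ("semantic model", "BI", "SELECT SUM(amount) FROM raw_orders"),
]


def first_divergence(conn, expected, checkpoints, tolerance=0.01):
    """Return (layer, owner, actual) for the first layer that misses the expected value."""
    for layer, owner, sql in checkpoints:
        actual = conn.execute(sql).fetchone()[0] or 0.0
        if abs(actual - expected) > tolerance:
            return layer, owner, actual
    return None


EXPECTED_REVENUE = 150.0  # agreed figure, e.g. from the source system's own report
finding = first_divergence(conn, EXPECTED_REVENUE, CHECKPOINTS)
print(finding)
```

Here the curated table matches the agreed figure and the semantic model does not, so the conversation can start from evidence rather than from who is right and wrong.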

The Data Team Ownership Framework

A way to mitigate this back and forth is to deploy a documented framework in your organization. Who owns what? And how do teams work together? The framework below helps to do just that.

Data Infrastructure and Raw Data Ingestion (Owned by Data Engineering)
1. Set up and manage ETL/ELT pipelines
2. Maintain data warehouse
3. Load raw and staged data from source systems
4. Security and Access Control management
5. Ensure fault-tolerant data pipelines, reliability and monitoring

Business Logic Layer / Data Modeling (Joint Ownership - led by Data Engineering)
1. Build and maintain core data models (using dbt or SQL)
2. Transformations that align to the business logic
3. Curated tables ready to be ingested by the BI tool

This second layer is clearly where the waters are muddiest. However, I envision Standard Operating Procedures for this part of development as a series of working sessions with BI team members and data engineers.

The actual coding is primarily done by the data engineer on these calls. BI team members may bring some pseudocode or working code to kickstart things, but the end product will be executed and owned by data engineering. What’s happened to me as the BI/Analytics Engineer is a complete handoff, where there is no collaboration and the final code delivered is what I wrote. That’s bad and needs to be avoided!

Visualization & Insights Layer (Owned by the BI team)
1. Build semantic layers within the BI tool (Looker) or externally (cube.js)
2. Define semantic layer-specific measures, dimensions and drill-downs
3. Work with stakeholders on business questions and KPIs to answer those questions
4. Train business users on self-service
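If it helps to keep this framework somewhere more durable than a slide, the three layers above can be captured as a small, version-controlled data structure that anyone can query. This is only a sketch of one way to document it. The `Layer` class, the `FRAMEWORK` tuple, the shortened responsibility keywords, and the `who_owns` helper are my own hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    owner: str               # team accountable for the end product
    collaborators: tuple     # teams that join the working sessions
    responsibilities: tuple  # short lowercase keywords


FRAMEWORK = (
    Layer(
        name="Data Infrastructure and Raw Data Ingestion",
        owner="Data Engineering",
        collaborators=(),
        responsibilities=(
            "etl/elt pipelines",
            "data warehouse",
            "raw and staged data",
            "security and access control",
            "pipeline reliability and monitoring",
        ),
    ),
    Layer(
        name="Business Logic Layer / Data Modeling",
        owner="Data Engineering",
        collaborators=("BI",),
        responsibilities=(
            "core data models",
            "business logic transformations",
            "curated tables",
        ),
    ),
    Layer(
        name="Visualization & Insights Layer",
        owner="BI",
        collaborators=(),
        responsibilities=(
            "semantic layer",
            "measures, dimensions and drill-downs",
            "business questions and kpis",
            "self-service training",
        ),
    ),
)


def who_owns(task):
    """Return the first layer whose responsibilities mention the task, or None."""
    needle = task.lower()
    for layer in FRAMEWORK:
        if any(needle in item for item in layer.responsibilities):
            return layer
    return None


layer = who_owns("curated tables")
print(layer.owner, layer.collaborators)
```

A lookup that returns `None` is useful in its own right: it flags a task nobody has claimed, which is exactly the kind of gap that turns into friction later.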

Set up QA & Testing

Lastly, a great way to reduce friction is to create clear QA processes for both the data engineering and BI teams. In addition to documenting this framework, I’d recommend building a Standard Operating Procedure for the dashboard, report, and/or data product development process.

Setting aside ample time for code reviews, data validation, and testing drastically reduces the risk of project failure and friction between teams. This should already be part of your process if your team follows CI/CD practices.
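For the data validation piece, even a handful of automated checks on curated tables goes a long way. The sketch below is similar in spirit to the `unique` and `not_null` tests that dbt provides, but it is written in plain Python against an in-memory SQLite database so it can run anywhere. The table, the sample rows, and the helper names are hypothetical.

```python
import sqlite3


def count_nulls(conn, table, column):
    """Number of rows where the column is NULL (0 means the check passes)."""
    # Table and column names are interpolated, so they must be trusted identifiers.
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchone()[0]


def count_duplicates(conn, table, column):
    """Number of distinct values that appear more than once (0 means unique)."""
    sql = (
        f"SELECT COUNT(*) FROM (SELECT {column} FROM {table} "
        f"WHERE {column} IS NOT NULL GROUP BY {column} HAVING COUNT(*) > 1)"
    )
    return conn.execute(sql).fetchone()[0]


def run_checks(conn, checks):
    """Run each (label, check, table, column) entry and return only the failures."""
    failures = {}
    for label, check, table, column in checks:
        bad = check(conn, table, column)
        if bad:
            failures[label] = bad
    return failures


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
conn.executescript(
    """
    CREATE TABLE curated_orders (order_id TEXT, customer TEXT, amount REAL);
    INSERT INTO curated_orders VALUES
        ('o-1', 'acme', 100.0),
        ('o-2', NULL, 50.0),
        ('o-2', 'globex', 75.0);
    """
)

CHECKS = [
    ("order_id is unique", count_duplicates, "curated_orders", "order_id"),
    ("customer is not null", count_nulls, "curated_orders", "customer"),
    ("amount is not null", count_nulls, "curated_orders", "amount"),
]

failures = run_checks(conn, CHECKS)
print(failures)
```

Checks like these can run in the same pipeline that deploys the data models, so a bad table is caught before it reaches a dashboard and before anyone has to ask whose fault it is.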

The Executive Influence

There you have it: the unique roles on an enterprise data team and a framework for structuring it. My hope is that what we’ve defined here will help reduce friction between data engineering and BI. Now, how do you go and implement this? Remember, it starts at the top. You, as the data leader reading this, need to take this framework, document it, and present it to your team.

In all the enterprise teams I’ve been a part of, there wasn’t clear documentation, a default process, a point of reference, or expectations set from the get-go between data engineers and BI. I think that leadership is about communication and building the framework to succeed. Designing a data team that can work together efficiently is no different.