What is a Data Engineer? A guide to the differences between the data disciplines

Feb. 2, 2021

Data Engineer. Data Scientist. Data Analyst. Business Analyst. Business Intelligence Analyst. Big Data Engineer. Software Engineer - Data. Unless you work with data, you may think these job titles and their responsibilities are interchangeable; you may see the keywords "data" or "engineer" and assume any title containing them means the same thing. I've lost count of how many recruiters have reached out to me about data science or software engineering positions. What are the key differences between these positions, and how can you make sure you're aligned with a potential employer on responsibilities?

Data analyst: the historical trend seeker

Data analysts are tasked with collecting, cleaning, and synthesizing historical data into a compelling story for stakeholders. Since data across a business is disparate and does not typically have a common identifier, the data analyst must fit these pieces of the puzzle together in a succinct manner. They straddle the line between quantitative analysis and business context; they manipulate data to look for historical patterns that result from shifts in business strategy. Analysts may touch data science for predictive modeling or design data models for easier analysis, but their main function is to help non-technical people understand what happened in the business and why. They are experts in data visualization and know how to manipulate data into the right format for BI tools, such as Looker and Tableau, to ingest.
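To make the "no common identifier" problem concrete, here is a minimal sketch of one way an analyst might stitch two sources together. Everything in it is hypothetical: the records, the field names, and the choice of a normalized email address as the matching key are assumptions for illustration, not a description of any particular company's data.

```python
# Hypothetical records from two systems that do not share an ID column.
crm_contacts = [
    {"crm_id": 101, "email": "Ana.Lopez@Example.com ", "plan": "pro"},
    {"crm_id": 102, "email": "raj@example.com", "plan": "free"},
]
billing_rows = [
    {"invoice": "INV-1", "customer_email": "ana.lopez@example.com", "amount": 120.0},
    {"invoice": "INV-2", "customer_email": "ana.lopez@example.com", "amount": 80.0},
    {"invoice": "INV-3", "customer_email": "unknown@example.com", "amount": 15.0},
]


def normalize_email(raw):
    """Trim whitespace and lowercase so the same address matches across systems."""
    return raw.strip().lower()


def revenue_by_plan(contacts, invoices):
    """Sum invoice amounts per plan, matching the two sources on normalized email."""
    plan_by_email = {normalize_email(c["email"]): c["plan"] for c in contacts}
    totals = {}
    for row in invoices:
        # Invoices with no matching contact are kept visible rather than dropped.
        plan = plan_by_email.get(normalize_email(row["customer_email"]), "unmatched")
        totals[plan] = totals.get(plan, 0.0) + row["amount"]
    return totals


print(revenue_by_plan(crm_contacts, billing_rows))
```

Keeping an "unmatched" bucket is a deliberate choice in this sketch: it shows stakeholders how much data could not be tied together instead of silently hiding it.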

Business intelligence analysts can fall into this bucket, but they may do more modeling and infrastructure tasks depending on the structure of their team. Business analysts are typically more junior data analysts who use Excel, Google Sheets, and maybe a little SQL to manipulate and draw insights from data.

Key concepts: query building, KPIs (Key Performance Indicators), visualization, statistical significance, data modeling
Example tools and languages: Excel, SQL, Tableau, Looker

Data scientist: the future predictor

A data scientist uses data to predict the likelihood of future outcomes. They regularly use A/B testing, regression analysis, and other statistical models to help steer the direction of new initiatives. They often perform the same functions as a data analyst, but they take these findings and extrapolate potential scenarios using heavily quantitative algorithms. An engineering team then deploys these models in production to officially incorporate changes into the product. Since humans are naturally risk-averse creatures, data science professionals help build confidence in a completely new business strategy or build a case to stop an existing process if it isn't performing as expected.
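As a rough illustration of the A/B testing mentioned above, the sketch below runs a two-proportion z-test using only the Python standard library. The conversion counts are made up, and a z-test is just one common way to compare two conversion rates; a real data scientist would choose the test based on the experiment design.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 1,000 users per variant.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be noise, which is the kind of evidence that helps a team commit to a new strategy or retire an old one.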

Data science and machine learning are hot topics in the industry right now because companies are always looking to optimize strategies to stay ahead of the competition.

Key concepts: hypothesis testing, machine learning, statistical analysis, algorithms, ETL
Example tools and languages: Spark, Hadoop, Jupyter, Scala, R, Python

Data engineer: the backbone of data infrastructure

Interfacing primarily with internal teams, data engineers set data analysts and data scientists up for success by building scalable infrastructure. They do the heavy lifting of cleaning, modeling, and ETL (extract, transform, load) to ensure data is trustworthy, accurate, and delivered on time. While this may seem trivial from an outsider's perspective, if a business isn't investing in data engineering, decisions and deployments based on data will be slower.

For example, an analyst cannot build confidence in their analysis if a pipeline breaks without an alert and the underlying data is not up to date. While data engineers perform exploratory data analysis, it is not their primary function to be subject matter experts on the data, and they rarely perform statistical analysis. Their skillset is rooted in engineering principles, but it is varied enough to design solutions around the needs of the business.
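The "pipeline breaks without an alert" scenario can be guarded against with a simple freshness check. The sketch below is a minimal version under stated assumptions: the function name, the table name, and the 24-hour threshold are all hypothetical, and in practice the last-load timestamp would come from the warehouse and the message would go to a paging or chat tool rather than being printed.

```python
from datetime import datetime, timedelta, timezone


def freshness_alert(table_name, last_loaded_at, max_age, now=None):
    """Return an alert message if the table is stale, otherwise None."""
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - last_loaded_at
    if age > max_age:
        hours = age.total_seconds() / 3600
        return f"ALERT: {table_name} was last loaded {hours:.1f} hours ago"
    return None


# Fixed timestamps keep the example reproducible.
now = datetime(2021, 2, 2, 12, 0, tzinfo=timezone.utc)
last_load = datetime(2021, 2, 1, 6, 0, tzinfo=timezone.utc)
print(freshness_alert("orders", last_load, timedelta(hours=24), now=now))
```

Passing `now` in as a parameter is a small design choice that makes the check easy to test without waiting for real time to pass.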

Key concepts: data warehouses, ETL, data lakes, APIs, integrations, storage, compute
Example tools and languages: SQL, Python, Snowflake, Redshift, Talend, Fivetran, AWS, Java, Looker

Software engineer: the app builder

Full stack software engineers build features and functionality for mobile and web apps. Logs and app data are typically inserted into a PostgreSQL or MySQL database upstream during development, so software engineers can perform data engineering tasks if the company is resource-strapped. However, since their main responsibility is to build product features that meet end user needs, a software engineer's highest priority isn't to ensure data quality or produce an efficient data model for downstream reporting and analysis. Data platform teams are more aligned with software engineering skills, as they skew heavily toward scalable infrastructure and are far removed from downstream reporting needs.

Key concepts: client/server, uptime, web development
Example tools and languages: JavaScript, CSS, HTML, Ruby, React, Java, Node.js

I've outlined what makes these jobs distinct, but there is plenty of overlap in the skills needed to perform each role. At larger companies, teams are separated by specialization in each area of the data ecosystem. However, startups may have only a handful of data professionals, which means they are required to perform multiple or all functions. This makes navigating job titles and descriptions very tricky for data professionals. If it's not clear in the job listing, make sure you ask specific questions during the interview process to drill into what type of data position the company is looking to fill. You don't want to waste your time on an interview if they're looking for a data analyst when you want an engineering position.

If you are interested in learning more about how to become a data engineer, read my first blog post about how I switched from a career in public relations to data engineering, or purchase The Data Warehouse Toolkit to get started.

About Me

James Roselle is a data engineer based in Boston.
