Last Updated on May 23, 2024 by somnath796
Since the Harvard Business Review published its article “Data Scientist: The Sexiest Job of the 21st Century” in 2012, the world has changed dramatically, and more than a decade has now passed. So how exactly has the landscape changed? In this post, I compare and contrast two of the most in-demand roles in the data field: Data Scientist vs. Data Engineer.
If these two roles sound like jargon to you, here is a brief explanation of each:
Data Engineer Job Responsibilities–
- Building and Maintaining Data Pipelines: Data engineers design, develop, and manage the systems that collect, transform, and load data from various sources (databases, APIs, etc.) into usable formats for analysis. This ensures a smooth flow of data for data scientists, analysts, and other stakeholders.
- Ensuring Data Quality and Security: Data engineers implement processes to clean, validate, and ensure the accuracy and consistency of data. They also implement security measures to protect sensitive data throughout the data pipeline.
- Data Storage and Infrastructure Management: Data engineers choose and manage the appropriate storage solutions (data warehouses, data lakes) based on the type and volume of data. They also configure and maintain the infrastructure required for data processing, ensuring scalability and efficiency.
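To make these pipeline and data-quality responsibilities more concrete, here is a minimal sketch of an extract-transform-load (ETL) job with basic validation, written in Python using only the standard library. The raw CSV data, the `orders` table, and the helper names `extract`, `transform`, and `load` are all hypothetical; a real pipeline would typically read from production databases or APIs and load into a data warehouse or data lake rather than an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (illustrative data only).
RAW_CSV = """order_id,customer,amount
1,alice,120.50
2,bob,not_a_number
3,,75.00
1,alice,120.50
4,carol,42.10
"""

def extract(raw: str) -> list[dict]:
    """Extract: read raw rows from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple[int, str, float]]:
    """Transform and validate: cast types, then drop invalid and duplicate rows."""
    seen = set()
    clean = []
    for row in rows:
        try:
            order_id = int(row["order_id"])
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # reject rows whose fields cannot be parsed
        customer = (row["customer"] or "").strip()
        if not customer or order_id in seen:
            continue  # reject rows with no customer, and repeated order IDs
        seen.add(order_id)
        clean.append((order_id, customer, amount))
    return clean

def load(conn: sqlite3.Connection, rows: list[tuple[int, str, float]]) -> None:
    """Load: write the validated rows into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(conn, transform(extract(RAW_CSV)))
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
```

Keeping the three stages as separate functions makes each one easier to test and to swap out independently, for example replacing the SQLite target with a cloud warehouse without touching the validation logic.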
Data Scientist Job Responsibilities–
- Data Analysis and Modeling: Data Scientists delve into data to uncover patterns, trends, and insights. They use statistical analysis, machine learning algorithms, and other techniques to build models that can predict future outcomes, classify data points, or make recommendations.
- Data Communication and Visualization: Effective communication of insights is crucial. Data Scientists create clear and compelling visualizations (charts, graphs) to present their findings to both technical and non-technical audiences. They also write reports and recommendations to ensure stakeholders understand the implications of their analysis.
- Data Experimentation and Iteration: The scientific method is key. Data Scientists design and conduct experiments to test hypotheses and refine their models. They iterate on their approach continuously, staying up-to-date with new tools and techniques to ensure the best possible results.
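As an illustration of the experimentation described above, the following is a minimal sketch of a two-sided permutation test that checks whether a change (the “treatment”) shifted a metric relative to a control group. The sample numbers and the `permutation_test` function are hypothetical, and a permutation test is just one of many ways a data scientist might test such a hypothesis.

```python
import random
import statistics

def permutation_test(control, treatment, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, approximate_p_value), where the
    difference is mean(treatment) - mean(control).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = statistics.fmean(treatment) - statistics.fmean(control)
    pooled = list(control) + list(treatment)
    n_treat = len(treatment)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled values and re-split them into two groups.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_treat]) - statistics.fmean(pooled[n_treat:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value estimate above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical metric values (e.g. minutes per session) for two user groups.
control = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
treatment = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0]

diff, p = permutation_test(control, treatment)
print(f"difference in means: {diff:.3f}, p-value: {p:.4f}")
```

A small p-value suggests the observed difference is unlikely to arise from random group assignment alone; the iteration step would then refine the hypothesis, gather more data, or adjust the model.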
Let’s analyze the two roles based on the following attributes:
- Barrier to entry and learning curve
- Competition in the job market
- Pay Scale
- Job satisfaction
- Skill Transferability between companies and domains
Barrier to entry and Learning curve–
Becoming either a Data Scientist (DS) or a Data Engineer (DE) typically requires full-time education. Graduates with majors in Statistics, Mathematics, Computer Science, and even Economics are becoming data scientists, so it follows that you need a good command of mathematics and statistics to enter the DS field. You should also know a scripting language such as Python or R, as well as SQL, for data wrangling, transformation, and modeling. In contrast, DE is essentially a subfield of software engineering that demands excellent programming and problem-solving skills. You also need to understand distributed architectures and cloud computing concepts to excel in your day-to-day responsibilities.
In short, if you do not come from a computer science background, breaking into a DE role is considerably harder than breaking into a DS role.
Competition in the job market–
Based on what I have seen on various job portals and in my own research, the DS market is already saturated. Practically anyone who can spin up a Jupyter notebook and copy-paste some Python code from the web wants to become a data scientist. The main theme of the past 10 years is that people love the word “science” and the idea of being a scientist, despite not having a science background at all.
DE, by contrast, requires an engineering mindset: the ability to stitch systems together and keep data pipelines running even during traffic spikes. DE has less competition, is a more mature discipline, and its tasks are at least consistent across employers. Compare the membership of two subreddits, r/Dataengineering (181K) vs. r/Datascience (1.6M), and the difference is easy to spot. Everybody is selling DS online; they are selling both the shovel and the dream, and the field is heavily overhyped.
As for the number of openings, a quick LinkedIn search shows at least twice as many DE jobs as DS jobs. Moreover, almost all DS jobs require either a Master’s degree or substantial experience as an analyst.
Pay Scale–
Typical salaries for DE and DS are quite similar. According to Payscale.com, the average pay for DS is 9.72 LPA (lakhs per annum), whereas for DE it is 8.72 LPA. For both roles, top pay goes up to 40+ LPA. The key point, however, is that excelling in DE takes roughly half the time and effort needed to excel in DS. Glassdoor and AmbitionBox report similar salary estimates for both roles.
Job satisfaction–
Before deciding which path to take, consider what the work feels like day to day. When Data Engineers finish a project, they have something tangible to show for it, such as an ETL or data pipeline built to meet the requirements of analysts and data scientists. For Data Scientists, problem-solving is an ongoing process with no definite end, because there are always more questions to explore and algorithms to improve. On the other hand, a data engineer’s daily tasks can become repetitive, whereas a data scientist regularly gets to work on new problem statements.
Skill Transferability between companies and domains–
For Data Engineers, the role does not change much across companies and domains. The tech stack may vary, but the core of building data pipelines remains the same, so the skills transfer well across organisations. For Data Scientists, hard skills make up only about 50% of the work; the rest depends on business understanding of the particular domain, whether it is healthcare, banking, or marketing. This knowledge is often specific to a particular company and does not transfer easily. As a result, switching jobs tends to be less cumbersome in DE than in DS.