June 15, 2021
My younger brother recently started coding, and it brought me back to when I was first starting my data journey. The number of coding tools on the market is overwhelming, but when you contribute to production code, some tools are more versatile than others and make your work more efficient. Here are some of the tools I'm familiar with and how you should structure your learning:
Pros: Quickest way to write, test, and share code locally. Atom has packages you can add on to run code inline, which is helpful.
Cons: One-off files are hard to manage and integrate at scale.
Many analysts spend most of their time inside visualization tools. When I worked as a data analyst, 90% of my work was done in one-off SQL scripts in the Periscope Data UI to generate visualizations for stakeholders; each query was written individually and materialized as its own chart or table. While this is a step above text editors for testing code, visualization tools are notorious for slow loading times when crunching large amounts of data.
Pros: Visualizations are easily integrated with your data sources.
Cons: One-off SQL queries are hard to scale unless you get the analytical query in production upstream or the tool has functionality for more complex data models.
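One way to picture the jump from a one-off query to something reusable is to wrap the SQL in a parameterized function. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database, and the orders table, its columns, and the revenue_by_region function are all made-up names, not anything from Periscope Data or a real warehouse.

```python
import sqlite3

# Hypothetical sample data standing in for a real warehouse table.
SAMPLE_ORDERS = [
    ("east", "2021-05-01", 120.0),
    ("east", "2021-05-02", 80.0),
    ("west", "2021-05-01", 200.0),
    ("west", "2021-04-15", 50.0),
]


def build_demo_db():
    """Create an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, order_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", SAMPLE_ORDERS)
    conn.commit()
    return conn


def revenue_by_region(conn, start_date):
    """Return total revenue per region for orders on or after start_date.

    The date is passed as a query parameter, so the same query can be
    reused for any date instead of being copied and edited by hand.
    """
    query = """
        SELECT region, SUM(amount) AS revenue
        FROM orders
        WHERE order_date >= ?
        GROUP BY region
        ORDER BY region
    """
    return conn.execute(query, (start_date,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(revenue_by_region(conn, "2021-05-01"))
    conn.close()
```

Once the query lives in a function like this, it can be versioned, tested, and called from more than one chart, which is much harder to do with a query pasted into a single dashboard tile.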
Notebooks are similar to text editors, except you can save multiple scripts that execute code and generate visualizations line by line. Popular with data scientists, notebooks are best put to use when hosted in live environments so code can be tested on the most up-to-date data.
Pros: Can run code inline step by step on production data.
Cons: The notebook UI can be painfully slow when handling large queries, and notebooks are typically one-off scripts for testing that don't need to be merged into production code.
The command line is probably the most powerful tool a developer needs to master. An entirely different set of commands is required to navigate this black screen with a flashing cursor, and users who are more comfortable with point-and-click navigation may be intimidated. Editing scripts in the native text editors, such as vi and vim, is cumbersome, but the benefits of learning the command line include running tests and executing API calls on live data with proper credentials.
Pros: Can test with live data/code. Every developer needs to be familiar with this anyway.
Cons: Clunky to save, update, and run code.
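To make "running tests" from the command line a little more concrete, here is a minimal sketch using Python's built-in unittest module. The percent_change function and the file name test_metrics.py are hypothetical and exist only for this example.

```python
import unittest


def percent_change(old, new):
    """Return the percent change from old to new."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100


class PercentChangeTest(unittest.TestCase):
    def test_increase(self):
        # Going from 50 to 75 is a 50% increase.
        self.assertAlmostEqual(percent_change(50, 75), 50.0)

    def test_zero_baseline_is_rejected(self):
        # A zero baseline would divide by zero, so the function refuses it.
        with self.assertRaises(ValueError):
            percent_change(0, 10)


if __name__ == "__main__":
    # Lets the file be run directly from a terminal.
    unittest.main()
```

Assuming the file is saved as test_metrics.py, running `python test_metrics.py` or `python -m unittest test_metrics` from the terminal executes both tests and reports any failures.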
The most versatile tool for writing code is an IDE because it combines multiple tools in one. The UI is similar to a text editor, but many IDEs have native command line functionality and keyboard hotkeys, which make editing code more efficient. Integrations with live data sources make local testing easy, and I highly recommend these tools for seamless version control.
Pros: Extremely versatile and customizable. Integrated with the command line and can sync with a git repo.
Cons: Might be too powerful for your coding needs at first. Not all programs are free.
There is no one-size-fits-all solution for coding tools. Each program is usually customizable to suit your needs depending on your use case. As a data engineer, most of my time is split between my IDE and the Snowflake UI to write and test my code. It takes a lot of trial and error with different tools to see which works best for you. If you're looking to start programming in Python, I highly recommend Python Crash Course by Eric Matthes. It goes over the basics and even has three highly relevant projects at the end!
James Roselle is a data engineer based in Boston.