黑料门

Effective habits for data scientists

If you've tried machine learning or data science, you know that code gets complex, quickly. With every experiment, we might write code that adds another moving part. Every new moving part increases complexity and adds one more thing to hold in our head.

While we cannot (and should not try to) escape from the essential complexity of a problem, we often add unnecessary accidental complexity and unnecessary cognitive load through poor coding habits such as:

  • Not having abstractions. When we write all code in a single Python notebook or script without abstracting it into functions or classes, we force the reader to read many lines of code and figure out the "how" to find out what the code is doing.

  • Long functions that do multiple things. These force us to hold every intermediate data transformation in our head while working on one part of the function.

  • Not having unit tests. When we refactor, the only way to ensure that we haven't broken anything is to restart the kernel and run the entire notebook(s). We're forced to take on the complexity of the whole codebase even though we just want to work on one small part of it.
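
To make these points concrete, here is a minimal Python sketch of what the alternative might look like: two small, single-purpose functions with a unit test each. The function names (`fill_missing_with_mean`, `min_max_scale`) and the sample values are hypothetical and chosen only for illustration; they are not taken from any particular project.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Unit tests: each one checks a single function in isolation, so a
# refactor can be verified without rerunning a whole notebook.
def test_fill_missing_with_mean():
    assert fill_missing_with_mean([1.0, None, 3.0]) == [1.0, 2.0, 3.0]


def test_min_max_scale():
    assert min_max_scale([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
```

With the logic named and tested this way, a reader can see what each step does from its name, and a test runner such as pytest can confirm in seconds that a change to one function has not broken it.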


In this article, we share four habits and one refactoring technique from the software development world that have helped us manage complexity in our data science projects. In our experience, they can lead to great delivery outcomes, as depicted here:

Figure 1: These coding habits are a means to the ultimate goal of making work fulfilling, by empowering data scientists to work effectively and shipping value to customers more reliably.

If you're interested in how these practices can be applied in your machine learning or data science projects, get in touch and we're happy to chat further.

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of 黑料门.
