Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable

6 min read · Jan 22, 2021


Motivation

Have you ever looked at a function you wrote a month earlier and struggled to understand it within 3 minutes? If so, it is time to refactor your code. If it takes you more than 3 minutes to understand your own code, imagine how long it would take your teammates.

If you want your code to be reusable, it needs to be readable. Writing clean code is especially important for data scientists, who collaborate with team members in many different roles.

You want your Python function to:

be small
do one thing
contain code with the same level of abstraction
have fewer than 4 arguments
have no duplication
use descriptive names

These practices make your functions more readable and make errors easier to detect.

Inspired by Robert C. Martin's book Clean Code: A Handbook of Agile Software Craftsmanship, whose code examples are written in Java, I decided to write an article on how data scientists can write clean code in Python.

In this article, I will show you how to utilize the 6 practices mentioned above to write better Python functions.

Get Started

Let’s start by taking a look at a function called load_data.
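As a hypothetical stand-in for the kind of function described here, load_data might look something like the sketch below. It assumes the URL can be fetched directly with urllib.request.urlretrieve, that the archive unzips into the folders path_train and path_test, that every .xml file in those folders is one document, and that each document's text records sit under the root's first child element. None of these details come from the original code.

```python
import os
import urllib.request
import xml.etree.ElementTree as ET
import zipfile
from os.path import join


def load_data(url, output_path, path_train, path_test):
    # Download the zip file (assumption: url is a direct-download link)
    urllib.request.urlretrieve(url, output_path)

    # Unzip the archive into the folder that holds output_path
    with zipfile.ZipFile(output_path) as zip_file:
        zip_file.extractall(os.path.dirname(output_path))

    # Get the names of the files that contain train and test data
    train_files = sorted(f for f in os.listdir(path_train) if f.endswith(".xml"))
    test_files = sorted(f for f in os.listdir(path_test) if f.endswith(".xml"))

    # Extract the texts from each training file
    train_texts = []
    for file_name in train_files:
        list_of_text_in_one_file = [r.text for r in ET.parse(join(path_train, file_name)).getroot()[0]]
        train_texts.append(list_of_text_in_one_file)

    # Extract the texts from each test file (almost the same code again)
    test_texts = []
    for file_name in test_files:
        list_of_text_in_one_file = [r.text for r in ET.parse(join(path_test, file_name)).getroot()[0]]
        test_texts.append(list_of_text_in_one_file)

    return train_texts, test_texts
```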

The function load_data downloads data from Google Drive and extracts it. Even though the function contains many comments, it is hard to understand what it does within 3 minutes. This is because:

The function is awfully long
The function tries to do multiple things
The code within the function is at multiple levels of abstraction
The function has more than 3 arguments
There are multiple duplications
The function’s name is not descriptive

We will refactor this code using the 6 practices mentioned above.

Small

A function should be small because a small function makes it easier to see what it does. How small is small? A function should rarely contain more than 20 lines of code, and it can be as short as a few lines. Its indentation level should not go beyond one or two.
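As a hypothetical illustration, a function that does one small job needs only a few lines and one level of indentation. The name unzip_data appears later in this article; the body below is an assumption.

```python
import zipfile


def unzip_data(zip_path: str, output_dir: str) -> None:
    """Extract every member of the zip archive at zip_path into output_dir."""
    with zipfile.ZipFile(zip_path) as zip_file:
        zip_file.extractall(output_dir)
```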

Do One Task

A function should complete only one task, not several. The function load_data tries to do multiple tasks: downloading the data, unzipping it, getting the names of the files that contain the train and test data, and extracting the texts from each file.

Thus, it should be split into multiple functions.
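A possible split, sketched under stated assumptions (the URL is directly downloadable with urllib.request.urlretrieve, every .xml file in a folder is one document, and each document's text records sit under the root's first child element), turns load_data into a short sequence of calls. The function names come from this article; their bodies are hypothetical.

```python
import os
import urllib.request
import xml.etree.ElementTree as ET
import zipfile
from os.path import join


def download_zip_data_from_google_drive(url, output_path):
    # Assumption: url is a direct-download link that urllib can fetch
    urllib.request.urlretrieve(url, output_path)


def unzip_data(zip_path, output_dir):
    with zipfile.ZipFile(zip_path) as zip_file:
        zip_file.extractall(output_dir)


def get_train_test_docs(path_train, path_test):
    # Assumption: every .xml file in a folder is one document
    train_docs = sorted(f for f in os.listdir(path_train) if f.endswith(".xml"))
    test_docs = sorted(f for f in os.listdir(path_test) if f.endswith(".xml"))
    return train_docs, test_docs


def extract_texts_from_multiple_files(path_to_file, files):
    # Assumption: each file's text records sit under the root's first child
    return [[r.text for r in ET.parse(join(path_to_file, f)).getroot()[0]] for f in files]


def load_data(url, output_path, path_train, path_test):
    # Each line now delegates one task to a dedicated function
    download_zip_data_from_google_drive(url, output_path)
    unzip_data(output_path, os.path.dirname(output_path))
    train_docs, test_docs = get_train_test_docs(path_train, path_test)
    train_texts = extract_texts_from_multiple_files(path_train, train_docs)
    test_texts = extract_texts_from_multiple_files(path_test, test_docs)
    return train_texts, test_texts
```

At this stage load_data still takes 4 arguments; the section Have Fewer than 4 Arguments deals with that.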

Each of these functions should do only one thing.
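For example, here is a minimal sketch of download_zip_data_from_google_drive. It assumes the URL is a direct-download link that urllib.request.urlretrieve from Python's standard library can fetch. That is an assumption; the original implementation may use a different download library.

```python
import urllib.request


def download_zip_data_from_google_drive(url: str, output_path: str) -> None:
    """Download the file at url and save it to output_path; nothing else."""
    # Assumption: url points directly at the file's bytes
    urllib.request.urlretrieve(url, output_path)
```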

The function download_zip_data_from_google_drive only downloads a zip file from Google Drive and does nothing else.

One Level of Abstraction

The code within the function extract_texts_from_multiple_files is at a different level of abstraction from the function.
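Here is a hypothetical version of extract_texts_from_multiple_files that shows the problem. Only the list_of_text_in_one_file line is quoted in this article. The loop, the files parameter, and the return value (one list of texts per file) are assumptions.

```python
import xml.etree.ElementTree as ET
from os.path import join


def extract_texts_from_multiple_files(path_to_file, files):
    all_docs = []
    for file_name in files:
        # Low-level detail mixed into a high-level loop: parse the XML,
        # take the root's first child, and read .text from each record
        list_of_text_in_one_file = [r.text for r in ET.parse(join(path_to_file, file_name)).getroot()[0]]
        all_docs.append(list_of_text_in_one_file)
    return all_docs
```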

The level of abstraction is the amount of complexity by which a system is viewed or programmed. The higher the level, the less detail. The lower the level, the more detail. — PCMag

By that definition:

The function extract_texts_from_multiple_files is at a high level of abstraction.
The code list_of_text_in_one_file = [r.text for r in ET.parse(join(path_to_file, file_name)).getroot()[0]] is at a low level of abstraction.

To keep all the code within the function at the same level of abstraction, we can move the low-level code into another function.
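Here is a sketch of that refactoring. It assumes each XML file's text records sit under the root's first child element, and that extract_texts_from_multiple_files receives a folder plus a list of file names. Both are assumptions. The XML parsing moves into extract_texts_from_each_file, which leaves only high-level calls in the outer function.

```python
import xml.etree.ElementTree as ET
from os.path import join


def extract_texts_from_each_file(path_to_file, file_name):
    """Return the text of every record under the root's first child element."""
    return [r.text for r in ET.parse(join(path_to_file, file_name)).getroot()[0]]


def extract_texts_from_multiple_files(path_to_file, files):
    all_docs = []
    for file in files:
        # Same level of abstraction as the enclosing function
        list_of_text_in_one_file = extract_texts_from_each_file(path_to_file, file)
        all_docs.append(list_of_text_in_one_file)
    return all_docs
```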

Now, the code extract_texts_from_each_file(path_to_file, file) is at a high level of abstraction, the same level as the function extract_texts_from_multiple_files.

Duplication

There is duplication in the code that loads the data: the part that gets the training data is very similar to the part that gets the test data.
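Here is a hypothetical example of this kind of duplication. The function name get_train_test_texts is an assumption. So is the file layout: .xml documents whose text records sit under the root's first child element. The training block and the test block differ only in the folder they read from.

```python
import os
import xml.etree.ElementTree as ET
from os.path import join


def get_train_test_texts(path_train, path_test):  # hypothetical name
    # Training data
    train_files = sorted(f for f in os.listdir(path_train) if f.endswith(".xml"))
    train_texts = []
    for file_name in train_files:
        train_texts.append([r.text for r in ET.parse(join(path_train, file_name)).getroot()[0]])

    # Test data: the same steps again, with path_test instead of path_train
    test_files = sorted(f for f in os.listdir(path_test) if f.endswith(".xml"))
    test_texts = []
    for file_name in test_files:
        test_texts.append([r.text for r in ET.parse(join(path_test, file_name)).getroot()[0]])

    return train_texts, test_texts
```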

We should avoid duplication because:

It is redundant
If we make a change to one piece of code, we need to remember to make the same change to another piece of code. If we forget to do so, we will introduce bugs into our code.

We can eliminate duplication by putting the duplicated code into a function.

Since the code that extracts texts from the training files and the code that extracts texts from the test files are similar, we put the repeated code into the function extract_texts_from_multiple_files. This function can extract texts from either training or test files.
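Here is a sketch of the deduplicated version. The names list_xml_files and get_train_test_texts are hypothetical, and each .xml file's text records are assumed to sit under the root's first child element. One function now serves both the training and the test files, so a future change only has to be made in one place.

```python
import os
import xml.etree.ElementTree as ET
from os.path import join


def extract_texts_from_each_file(path_to_file, file_name):
    return [r.text for r in ET.parse(join(path_to_file, file_name)).getroot()[0]]


def extract_texts_from_multiple_files(path_to_file, files):
    # Works for either training or test files
    return [extract_texts_from_each_file(path_to_file, file) for file in files]


def list_xml_files(path):  # hypothetical helper
    return sorted(f for f in os.listdir(path) if f.endswith(".xml"))


def get_train_test_texts(path_train, path_test):  # hypothetical name
    train_texts = extract_texts_from_multiple_files(path_train, list_xml_files(path_train))
    test_texts = extract_texts_from_multiple_files(path_test, list_xml_files(path_test))
    return train_texts, test_texts
```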

Descriptive Names

A long descriptive name is better than a short enigmatic name. A long descriptive name is better than a long descriptive comment. — Clean Code by Robert C. Martin

Users can understand what the function extract_texts_from_multiple_files does by looking at its name.

Don’t be afraid to write long names; a long name is better than a vague one. If you try to shorten your code with something like get_texts, it would be difficult for others to understand exactly what the function does without reading its source code.

If a descriptive name becomes too long, such as download_file_from_google_drive_and_extract_text_from_that_file, it is a good sign that your function is doing multiple things and should be split into smaller functions.

Have Fewer than 4 Arguments

A function should not have more than 3 arguments, since having more is often a sign that the function is performing multiple tasks. More arguments also make a function harder to test, because there are more combinations of argument values to cover.

For example, the function load_data has 4 arguments: url, output_path, path_train, and path_test. So we might guess that it tries to do multiple things at once:

Use url to download data
Save it at output_path
Extract the train and test files in output_path and save them to path_train and path_test

If a function has more than 3 arguments, consider turning it into a class.

For example, we could split load_data into 3 different functions.

Since the functions download_zip_data_from_google_drive, unzip_data, and get_train_test_docs all serve one goal, getting data, we could put them into one class called DataGetter.
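Here is a hypothetical sketch of DataGetter. It assumes a direct-download URL fetched with urllib.request.urlretrieve and treats every .xml file in a folder as one document. Storing url and output_path as attributes keeps every method at 3 parameters or fewer, counting self. The method names follow this article; their bodies and the _list_xml_files helper are assumptions.

```python
import os
import urllib.request
import zipfile


class DataGetter:
    """Groups the steps whose shared goal is getting the data (a sketch)."""

    def __init__(self, url, output_path):
        self.url = url
        self.output_path = output_path
        self.output_dir = os.path.dirname(output_path)

    def download_zip_data_from_google_drive(self):
        # Assumption: self.url is a direct-download link
        urllib.request.urlretrieve(self.url, self.output_path)

    def unzip_data(self):
        with zipfile.ZipFile(self.output_path) as zip_file:
            zip_file.extractall(self.output_dir)

    @staticmethod
    def _list_xml_files(path):  # hypothetical helper
        return sorted(f for f in os.listdir(path) if f.endswith(".xml"))

    def get_train_test_docs(self, path_train, path_test):
        return self._list_xml_files(path_train), self._list_xml_files(path_test)
```

A caller would create data_getter = DataGetter(url, output_path) and then call data_getter.download_zip_data_from_google_drive(), data_getter.unzip_data(), and data_getter.get_train_test_docs(path_train, path_test) in turn. Each call reads as one step of the data-loading process.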

With this structure, none of the functions takes more than 3 arguments. Even though the class-based code is longer than the single function, it is much more readable, and we know exactly what each piece of code does.

How do I write a function like this?

Don’t try to be perfect when you start writing code. Start by writing down code that matches your thoughts, even if it is complicated. Then, as your code grows, ask yourself whether your function violates any of the practices mentioned above. If it does, refactor it, test it, and then move on to the next function.

Conclusion

Congratulations! You have just learned 6 best practices for writing readable and testable functions. Because each function does one task, it is easier to test your functions and to make sure they still pass their unit tests after a change.
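To illustrate, a small single-purpose function can be covered by a short unit test. The sketch below uses Python's built-in unittest module. The body of extract_texts_from_each_file and the XML layout it expects (text records under the root's first child) are assumptions.

```python
import tempfile
import unittest
import xml.etree.ElementTree as ET
from os.path import join


def extract_texts_from_each_file(path_to_file, file_name):
    return [r.text for r in ET.parse(join(path_to_file, file_name)).getroot()[0]]


class TestExtractTextsFromEachFile(unittest.TestCase):
    def test_returns_text_of_each_record(self):
        with tempfile.TemporaryDirectory() as tmp:
            # Write a tiny XML file that matches the assumed layout
            with open(join(tmp, "doc.xml"), "w", encoding="utf-8") as f:
                f.write("<root><docs><doc>first</doc><doc>second</doc></docs></root>")
            self.assertEqual(extract_texts_from_each_file(tmp, "doc.xml"), ["first", "second"])
```

If you save it in a test module, you can run it with python -m unittest.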

If you make it effortless for your teammates to understand your code, they will be happy to reuse your code for other tasks.

The source code of this article can be found here.

I like to write about basic data science concepts and play with different algorithms and data science tools. You can connect with me on LinkedIn and Twitter.
