Week 2 - BALT 4396 - Handling and Cleaning Data with Python Libraries

Python as a Tool

Python is one of the most useful programming languages, yet many people are unfamiliar with how to use it. It is well suited to reading and manipulating large sets of data, and its open-source nature adds to that strength. Python has many libraries; Pandas and NumPy in particular make importing, manipulating, and cleaning data much easier. Those in analytic roles understand the value of clean data, and Python can help significantly in this area.


Pandas has two main data structures: Series and DataFrame. Each is useful in its own way, though one may suit a given end goal better than the other. A Series is a one-dimensional labeled array capable of holding any data type, while a DataFrame is a two-dimensional labeled data structure whose columns can hold different types. Just as useful, Pandas can import data from CSV, Excel, JSON, and SQL sources.
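To make the two structures concrete, here is a minimal sketch. The respondent IDs, column names, and CSV text are made-up sample data (not from the course material), and the CSV is read from an in-memory string so the example runs without a file on disk.

```python
import io

import pandas as pd

# A Series: one-dimensional, labeled values.
# The labels "r1".."r3" are hypothetical respondent IDs.
ages = pd.Series([34, 27, 45], index=["r1", "r2", "r3"], name="age")

# A DataFrame: two-dimensional, with columns of different types.
# This CSV text is invented sample data standing in for a real file.
csv_text = "respondent,age,region\nr1,34,West\nr2,27,South\nr3,45,West\n"
survey = pd.read_csv(io.StringIO(csv_text))

print(ages["r2"])     # look up one value by its label
print(survey.shape)   # (rows, columns)
print(survey.dtypes)  # each column keeps its own type
```

With a real file, the same `pd.read_csv` call would take a file path instead of the in-memory string.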

NumPy, short for Numerical Python, supports large arrays that are often multi-dimensional. Beyond processing matrices, it provides math functions that operate on those arrays efficiently. Cleaning data is important in many fields because it helps ensure that results accurately reflect what was collected, and Python libraries offer many ways to do it. Missing values and duplicate records can be removed with a few commands, which makes data manipulation something anyone can do with proper understanding.
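As a rough illustration of those cleaning commands, the sketch below removes a duplicate row and a row with a missing value, then hands the cleaned column to NumPy. The survey scores are invented for the example, and `drop_duplicates` and `dropna` are only one of several ways Pandas can handle these cases.

```python
import numpy as np
import pandas as pd

# Invented survey data: one exact duplicate row (r2) and one missing score (r3).
raw = pd.DataFrame({
    "respondent": ["r1", "r2", "r2", "r3", "r4"],
    "score": [4.0, 5.0, 5.0, np.nan, 3.0],
})

# Remove rows that are exact duplicates of an earlier row.
deduped = raw.drop_duplicates()

# Remove rows where the score is missing.
cleaned = deduped.dropna(subset=["score"])

# Pass the cleaned column to NumPy for math on the whole array at once.
scores = cleaned["score"].to_numpy()
mean_score = np.mean(scores)

# NumPy also handles multi-dimensional arrays such as matrices.
matrix = np.array([[1, 2], [3, 4]])
doubled = matrix * 2  # element-wise multiplication

print(len(raw), len(cleaned))
print(mean_score)
print(doubled)
```

Dropping rows is the simplest option; depending on the analysis, filling missing values (for example with `fillna`) may be a better fit than removing them.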

I can see many companies benefiting from Python because it adapts to almost any use case. It can replace traditional Excel spreadsheets in many workflows and offers more flexibility because of the range of tools available. Whether the data set is small or extremely large, automation can reduce human error. In my current market research career, we collect a lot of survey data, and Python could be a difference-maker because it would allow us to interpret and transform our data in ways that support our objectives.

Source: Kelsey, T. (2023). Data Toolkit: Python + Hands-On Math.
