CST383 Learning Log #2
This week focused heavily on using Pandas and NumPy for data analysis and aggregation. I practiced working with Pandas Series and DataFrames: creating and renaming columns, filtering rows with boolean masks, and indexing with .loc (by label) and .iloc (by position). I also used aggregation functions such as mean(), median(), value_counts(), and groupby() to analyze several datasets, including heart disease data, census data, penguin body mass data, and Lyft bike-sharing data.

One important idea I learned is that Pandas automatically aligns data by index labels rather than by position. This is very powerful, but it can also be confusing: when two indexes do not match, the unmatched labels end up as missing values (NaN).

A concept I am still working to fully understand is when to use groupby() versus value_counts(). I understand that value_counts() is a quick way to count how many rows fall into each category, while groupby() is more flexible because it can compute other statistics, such as a mean or median, for each group. Still, the two approaches sometimes seem similar. Another topic that took practice was combining boolean masks wi...
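To pin down the ideas from this week, here is a small sketch I put together. It uses a hypothetical toy DataFrame (made-up numbers, not the course's penguin dataset) and shows four things: index alignment producing NaN, value_counts() and groupby().size() giving the same counts, groupby() computing a statistic that value_counts() cannot, and combining boolean masks with & before checking .loc versus .iloc on the filtered result.

```python
import pandas as pd

# Hypothetical toy data (NOT the course dataset), loosely shaped like penguin records.
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo"],
    "body_mass_g": [3700, 3800, 5000, 3500, 5200],
})

# 1. Index alignment: arithmetic matches index labels, not positions.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])
aligned_sum = a + b                    # "x" and "w" appear in only one Series -> NaN
filled_sum = a.add(b, fill_value=0)    # treat a missing label as 0 instead of NaN

# 2. Counting categories two ways: both give rows per species.
counts_vc = df["species"].value_counts()      # sorted by count, descending
counts_gb = df.groupby("species").size()      # sorted by species label

# 3. groupby() can compute statistics other than counts.
mean_mass = df.groupby("species")["body_mass_g"].mean()

# 4. Combining boolean masks: use & (and) or | (or), with parentheses
#    around each condition because & binds more tightly than ==.
mask = (df["species"] == "Gentoo") & (df["body_mass_g"] > 5100)
heavy_gentoo = df[mask]

# After filtering, the surviving row keeps its original label (4),
# so .loc (label-based) and .iloc (position-based) need different arguments.
row_by_label = heavy_gentoo.loc[4]
row_by_position = heavy_gentoo.iloc[0]

print(aligned_sum)
print(counts_vc)
print(counts_gb)
print(mean_mass)
print(heavy_gentoo)
```

In this sketch, value_counts() and groupby().size() hold the same counts and differ only in ordering, which suggests one way to choose: value_counts() when I just need category counts, groupby() when I need a statistic on another column.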