CST338 Learning Log #3

This week in class I spent a lot of time working with NumPy, Pandas, and data visualization in Python. Early in the week I focused on NumPy arrays, indexing, boolean masks, and vectorized operations. I also practiced using list comprehensions and learned more about the differences between Python lists and NumPy arrays. One thing I noticed is that NumPy operations become much cleaner and more efficient once you stop thinking in terms of loops and start thinking in terms of whole-array operations.
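
A minimal sketch of that shift from loops to whole-array operations. The temperature readings and variable names are made up for illustration and are not from the labs:

```python
import numpy as np

# Hypothetical data: a few temperature readings in Celsius.
temps_c = np.array([12.5, 18.0, 25.3, 30.1, 8.4])

# Loop-style thinking: convert one element at a time with a list comprehension.
temps_f_list = [c * 9 / 5 + 32 for c in temps_c.tolist()]

# Whole-array thinking: one expression applied to every element at once.
temps_f = temps_c * 9 / 5 + 32

# A boolean mask is an array of True/False values, one per element.
warm = temps_c > 20

# Indexing with the mask keeps only the elements where the mask is True.
warm_temps = temps_c[warm]

print(temps_f)
print(warm_temps)
```

Both conversions give the same numbers, but the vectorized version reads like the formula itself, and the mask replaces an explicit `if` inside a loop.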

Later in the week we moved into Pandas Series and DataFrames. I learned how to select columns, filter rows with boolean conditions, group data, compute statistics, and rename columns. I also became more comfortable reading DataFrame summaries with methods like info() and describe(). At first I mixed up which operations return a Series and which return a DataFrame, but after working through the labs I feel much more confident about it.
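
The Series-versus-DataFrame distinction can be sketched with a tiny table. The column names and values here are hypothetical stand-ins, not the lab dataset:

```python
import pandas as pd

# Hypothetical data: four schools with an application count each.
df = pd.DataFrame({
    "school": ["A", "B", "C", "D"],
    "type": ["public", "private", "public", "private"],
    "apps": [1200, 450, 3100, 800],
})

# Single brackets with one column name return a Series.
apps_series = df["apps"]

# Double brackets (a list of column names) return a DataFrame.
apps_frame = df[["apps"]]

# Filtering rows with a boolean condition returns a DataFrame.
large = df[df["apps"] > 1000]

# Grouping and then aggregating one column returns a Series indexed by group.
mean_by_type = df.groupby("type")["apps"].mean()

# rename returns a new DataFrame; the original is left unchanged.
renamed = df.rename(columns={"apps": "applications"})

print(type(apps_series).__name__, type(apps_frame).__name__)
print(mean_by_type)
```

Checking `type(...)` on an intermediate result is a quick way to see which of the two objects an operation produced.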

The visualization part of the labs was especially interesting. We created density plots, histograms, cumulative distribution plots, and box plots using both Matplotlib and Seaborn. I learned how choices like bin width and KDE bandwidth can significantly change the appearance and interpretation of a graph. It was also interesting to compare histograms and density plots side by side to understand how they represent distributions differently.
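
The effect of bin width and bandwidth can also be seen without drawing anything. The sketch below uses only NumPy on synthetic data; the hand-written `gaussian_kde` and `roughness` helpers are assumptions for illustration, not the functions Matplotlib or Seaborn use internally:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=200)  # synthetic sample

# Same data, two different bin counts: few wide bins vs. many narrow bins.
counts_coarse, edges_coarse = np.histogram(data, bins=5)
counts_fine, edges_fine = np.histogram(data, bins=40)

def gaussian_kde(samples, grid, bandwidth):
    """Average of Gaussian bumps of width `bandwidth`, one centered on each sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

def roughness(curve):
    """Total absolute second difference: larger means a more wiggly curve."""
    return np.abs(np.diff(curve, n=2)).sum()

grid = np.linspace(-4, 4, 201)
narrow = gaussian_kde(data, grid, bandwidth=0.1)  # small bandwidth
wide = gaussian_kde(data, grid, bandwidth=1.0)    # large bandwidth

print("coarse bins:", counts_coarse)
print("roughness (narrow, wide):", roughness(narrow), roughness(wide))
```

A small bandwidth follows individual points and produces a bumpy curve, while a large one smooths over real structure, which is the same trade-off as narrow versus wide histogram bins.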

One concept I am still thinking about is the relationship between a sample distribution and a theoretical distribution. In the later problems we generated random samples from a normal distribution and compared the sample density plots to the original data. I understand the basic idea that samples will vary due to randomness, but I still want to develop a better intuition for how sample size affects the smoothness and accuracy of the estimated distribution.
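
One way to build that intuition is to measure how far a histogram-based density estimate sits from the true normal curve as the sample grows. This is a rough sketch under assumed settings (standard normal, 20 fixed bins, an arbitrary seed), not the procedure from the labs:

```python
import numpy as np

rng = np.random.default_rng(42)

# Fixed bins so every sample size is compared on the same grid.
edges = np.linspace(-4, 4, 21)  # 20 equal-width bins
centers = (edges[:-1] + edges[1:]) / 2

# Theoretical standard normal density evaluated at the bin centers.
true_pdf = np.exp(-0.5 * centers**2) / np.sqrt(2 * np.pi)

errors = {}
for n in (50, 500, 50_000):
    sample = rng.normal(size=n)
    # density=True rescales counts so the bars integrate to 1.
    density, _ = np.histogram(sample, bins=edges, density=True)
    # Mean absolute gap between the estimate and the true curve.
    errors[n] = np.abs(density - true_pdf).mean()

for n, err in errors.items():
    print(f"n={n:>6}: mean abs error = {err:.4f}")
```

With a small sample the estimate is noisy from bin to bin; with a very large one the remaining error comes mostly from the bin width itself rather than from randomness.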

I also found the discussions about skewness and transformations useful. Seeing how applying log10 to the Apps variable changed the shape of the histogram made it clearer why transformations are important in data analysis. Before this week I understood skewness mostly as a definition, but now I can actually see how it affects plots and statistical interpretation.
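
The effect of a log transform on skewness can be checked numerically as well as visually. The data below is a synthetic right-skewed stand-in for the Apps variable (drawn from a lognormal distribution with assumed parameters), not the actual lab data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic, strictly positive, right-skewed values standing in for Apps.
apps = pd.Series(rng.lognormal(mean=7.0, sigma=1.0, size=1000), name="Apps")

# log10 compresses large values far more than small ones.
log_apps = np.log10(apps)

# Sample skewness: near 0 for a symmetric shape, positive for a long right tail.
raw_skew = apps.skew()
log_skew = log_apps.skew()

print(f"skew before log10: {raw_skew:.2f}")
print(f"skew after log10:  {log_skew:.2f}")
```

Because the log of a lognormal variable is normal, the transformed values here come out close to symmetric; real data will usually improve less cleanly.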

Overall, I feel like I made good progress in week three. The labs were long, but they helped reinforce the ideas through repetition and visualization. I am becoming much more comfortable reading and writing Pandas plotting code, and I feel more confident interpreting the output rather than just generating it.
