Posts

Showing posts from May, 2026

CST383 Learning Log #4

This week we focused heavily on data visualization and exploratory data analysis using Pandas, Matplotlib, and Seaborn. We worked with datasets involving campaign contributions and US Census information, and I learned that choosing the correct visualization is just as important as creating the graph itself. Different types of variables require different approaches. For example, histograms worked well for continuous variables like contribution amounts or hours worked per week, while grouped and stacked bar charts were better for comparing categories such as occupations, employment status, sex, and income level. One thing I improved on this week was using Pandas methods to summarize and prepare data before plotting. We used functions like groupby(), value_counts(), and crosstab() repeatedly. For example, this line was useful for comparing contribution amounts across categories: df.groupby('candidate')['contb_receipt_amt'].median().plot.barh() I also learned how normalizat...
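Here is a minimal sketch of the summarize-then-plot workflow described above, built on a tiny made-up DataFrame. Only the column names candidate and contb_receipt_amt come from the post. The occupation column and all values are hypothetical. The sketch draws a horizontal bar chart of per-candidate medians, a stacked bar chart built from pd.crosstab(), and a histogram of the continuous contribution amounts.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data: only 'candidate' and 'contb_receipt_amt' are
# column names from the post; the 'occupation' column and values are made up.
df = pd.DataFrame({
    "candidate": ["A", "A", "A", "B", "B", "C"],
    "contb_receipt_amt": [25.0, 50.0, 250.0, 10.0, 40.0, 100.0],
    "occupation": ["RETIRED", "TEACHER", "RETIRED", "ENGINEER", "TEACHER", "RETIRED"],
})

# Summarize first: median contribution per candidate (a Series).
medians = df.groupby("candidate")["contb_receipt_amt"].median()

# Count how often each category appears in one column.
occ_counts = df["occupation"].value_counts()

# Cross-tabulate two categorical columns: rows = occupation, columns = candidate.
table = pd.crosstab(df["occupation"], df["candidate"])

# Plot each summary on its own explicit axes.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
medians.plot.barh(ax=axes[0])
axes[0].set_xlabel("Median contribution ($)")
table.plot.bar(stacked=True, ax=axes[1])                   # categorical vs categorical
df["contb_receipt_amt"].plot.hist(bins=5, ax=axes[2])  # continuous variable
plt.close(fig)
```

Passing ax= explicitly keeps the three charts separate. Without it, a Series plot reuses whatever axes are currently active, which can stack unrelated plots on top of each other.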

CST383 Learning Log #3

This week in class I spent a lot of time working with NumPy, Pandas, and data visualization in Python. Early in the week I focused on NumPy arrays, indexing, boolean masks, and vectorized operations. I also practiced using list comprehensions and learned more about the differences between Python lists and NumPy arrays. One thing I noticed is that NumPy code becomes much cleaner and more efficient once you stop thinking in terms of loops and start thinking in terms of whole-array operations. Later in the week we moved into Pandas Series and DataFrames. I learned how to select columns, filter rows with boolean conditions, group data, compute statistics, and rename columns. I also became more comfortable reading DataFrame summaries with functions like info() and describe(). At first I mixed up when operations returned a Series versus a DataFrame, but after working through the labs I feel much more confident about it. The visualization part of the labs was especially interesting. We c...
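The short sketch below illustrates two of the ideas above using made-up numbers. First, the same filter is written once as a list comprehension and once as a boolean mask with a vectorized subtraction. Second, it shows which Pandas selections return a Series, a DataFrame, or a scalar. The DataFrame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Loop style: a list comprehension over a plain Python list.
hours = [38, 45, 20, 50, 40]
overtime_list = [h - 40 for h in hours if h > 40]

# Whole-array style: boolean mask + one vectorized subtraction.
arr = np.array(hours)
overtime_arr = arr[arr > 40] - 40

# Hypothetical DataFrame for the Series-vs-DataFrame distinction.
df = pd.DataFrame({"name": ["a", "b", "c"], "hours": [38, 45, 20]})
col_series = df["hours"]              # single label -> Series
col_frame = df[["hours"]]             # list of labels -> DataFrame
row_filter = df[df["hours"] > 30]     # boolean row filter -> DataFrame
mean_hours = df["hours"].mean()       # reduction on a Series -> scalar
renamed = df.rename(columns={"hours": "hours_per_week"})  # returns a new DataFrame
```

The single-bracket vs. double-bracket difference (df["hours"] vs. df[["hours"]]) is a common source of the Series/DataFrame mix-up.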

CST383 Learning Log #2

This week focused heavily on using Pandas and NumPy for data analysis and aggregation. I practiced working with Pandas Series and DataFrames, including creating columns, renaming columns, filtering rows with boolean masks, and using .loc and .iloc for indexing. I also worked with functions such as mean(), median(), value_counts(), and groupby() to analyze datasets like heart disease data, census data, penguin body mass data, and Lyft bike-sharing data. One important idea I learned is that Pandas automatically aligns data using indexes. This is very powerful, but it can also be confusing when values are missing or indexes do not match, because unmatched labels produce NaN. A concept I am still working to fully understand is when to use groupby() versus value_counts(). I understand that value_counts() is useful for counting categories quickly, while groupby() is more flexible for computing statistics, but sometimes the two approaches seem to give the same result. Another topic that took practice was combining boolean masks wi...
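Here is a minimal sketch of index alignment and of counting with value_counts() versus groupby(). It uses a small hypothetical rides table. The user_type and duration_min columns are invented for illustration and are not taken from the Lyft dataset.

```python
import pandas as pd

# Index alignment: arithmetic matches labels, not positions.
s1 = pd.Series([1, 2], index=["a", "b"])
s2 = pd.Series([10, 20], index=["b", "c"])
total = s1 + s2                      # 'a' and 'c' appear in only one Series -> NaN
filled = s1.add(s2, fill_value=0)    # treat a missing label as 0 instead

# Hypothetical rides table (column names are illustrative only).
rides = pd.DataFrame({
    "user_type": ["member", "casual", "member", "member", "casual"],
    "duration_min": [12, 30, 8, 15, 25],
})

# Counting categories: both give the same numbers.
counts_vc = rides["user_type"].value_counts()   # sorted by count, descending
counts_gb = rides.groupby("user_type").size()   # sorted by group label
# groupby() can also compute any statistic per group.
mean_dur = rides.groupby("user_type")["duration_min"].mean()

# Combining masks: use & or |, with each condition in parentheses.
long_member = rides[(rides["user_type"] == "member") & (rides["duration_min"] > 10)]

# .iloc selects by integer position, .loc by label (or boolean mask).
first_dur = rides.iloc[0, 1]                                    # row 0, column 1
member_durs = rides.loc[rides["user_type"] == "member", "duration_min"]
```

In this example the counts agree. The practical difference is that value_counts() is a quick shortcut for one column, while groupby() generalizes to any per-group calculation, such as the mean duration.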

CST383 Learning Log #1

This was the first week of CST383, and we focused on building a strong foundation in Python for data science, especially working with NumPy and basic scripting tools. I practiced creating and manipulating arrays, including slicing, fancy indexing, and boolean masking. I also learned how vectorized operations allow computations to be performed efficiently across entire arrays without using loops. Working with both 1D and 2D arrays helped me better understand how to access specific rows and columns, as well as how to compute statistics like mean and median along different axes. In addition, I explored filtering data using conditions and using those filters to extract subsets of interest. One concept that I found a bit confusing at first was how boolean masks need to match the shape of the array they are indexing. For example, indexing an array with a mask of a different length raises an IndexError, which made me realize how important array dimensions are in NumPy. This made me ask: ...
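Here is a small sketch of the NumPy operations listed above, applied to made-up arrays. It covers slicing, fancy indexing, boolean masking, axis-wise statistics, and the error raised when a mask's length does not match the array.

```python
import numpy as np

a = np.array([3, 8, 1, 9, 4, 7])
first_three = a[:3]          # slicing -> [3, 8, 1]
picked = a[[0, 2, 5]]        # fancy indexing by position -> [3, 1, 7]
mask = a > 4                 # boolean mask with the same shape as a
big = a[mask]                # -> [8, 9, 7]
doubled = a * 2              # vectorized: no explicit loop

m = np.array([[1, 2, 3],
              [4, 5, 6]])
second_col = m[:, 1]                # all rows, column 1 -> [2, 5]
rows = m[m[:, 0] > 2]               # mask length matches the number of rows
col_means = m.mean(axis=0)          # mean down each column -> [2.5, 3.5, 4.5]
row_medians = np.median(m, axis=1)  # median across each row -> [2.0, 5.0]

# A mask whose length does not match the array raises an IndexError.
try:
    a[np.array([True, False, True])]
    mismatch_error = None
except IndexError as exc:
    mismatch_error = exc
```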