pandas is a Python data analysis library built on NumPy that provides the DataFrame and Series structures for labeled, tabular, and time-series data. The 2.x release line introduced PyArrow-backed string and extension dtypes, copy-on-write semantics (opt-in via pd.options.mode.copy_on_write = True in 2.x, and the default behavior in 3.0), and significant performance improvements for groupby and merge operations. Students meet pandas in data science courses (Harvard CS109, Stanford CS246 Mining Massive Datasets, MIT 6.S897 Machine Learning for Healthcare), in any course that grades exploratory data analysis or feature engineering, and in Kaggle-style competition assignments.
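The copy-on-write behavior mentioned above can be illustrated with a minimal sketch. It assumes pandas 2.x or later is installed; the DataFrame and its column names are made up for illustration, and the version check is just one way to avoid touching the option on 3.0, where copy-on-write is already the default.

```python
import pandas as pd

# Opt in on pandas 2.x; skip on 3.0+, where copy-on-write is the default.
if int(pd.__version__.split(".")[0]) < 3:
    pd.options.mode.copy_on_write = True

# Hypothetical data, for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Selecting a column gives a Series that initially shares data with df.
col = df["a"]

# Under copy-on-write, this write copies col's data first,
# so the change never propagates back to df.
col.iloc[0] = 100

print(col.tolist())
print(df["a"].tolist())
```

With copy-on-write enabled, the modified Series and the parent DataFrame diverge at the first write, which is why code written this way no longer depends on whether a selection happened to be a view or a copy.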
The library splits into IO (read_csv, read_parquet, read_sql, read_json, read_excel), data manipulation (loc and iloc indexing, query, assign, pipe), aggregation (groupby, agg, transform, apply), reshaping (pivot, pivot_table, melt, stack, unstack), merging (merge, join, concat), time series (date_range, resample, rolling, ewm, time zone localization and conversion), and visualization (DataFrame.plot, built on matplotlib, with seaborn as the typical extension). CSHH tutors deliver DataFrame pipelines written as method chains, with .pipe for custom functions and .loc and .iloc for explicit indexing, which avoids the chained indexing that triggers SettingWithCopyWarning. They use groupby with agg and a dict of column-to-function mappings for multi-output aggregation, write merges that explicitly state how (inner, left, right, outer) and on (column name or index), and call .copy() when a slice will be modified independently of the source.
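The pipeline style described above might look like the following sketch. The orders and customers tables, their column names, and the add_amount_share helper are all hypothetical and exist only to make the example self-contained; this is one possible arrangement of the listed techniques, not a prescribed template.

```python
import pandas as pd

# Hypothetical input tables, for illustration only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "customer_id": [10, 10, 20, 30, 30],
    "amount": [25.0, 40.0, 15.0, 60.0, 5.0],
    "status": ["paid", "paid", "refunded", "paid", "paid"],
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 40],
    "region": ["east", "west", "north"],
})


def add_amount_share(df):
    """Hypothetical custom step for .pipe: each row's share of the total amount."""
    return df.assign(share=df["amount"] / df["amount"].sum())


# Explicit .loc selection plus .copy(): the slice will be modified
# independently of the source, so orders must stay untouched.
paid = orders.loc[orders["status"] == "paid", ["customer_id", "amount"]].copy()
paid.loc[paid["amount"] < 10, "amount"] = 10.0  # floor small amounts in the copy only

# Method chain: custom function via .pipe, dict-based agg, explicit merge.
summary = (
    paid
    .pipe(add_amount_share)
    .groupby("customer_id", as_index=False)
    .agg({"amount": "sum", "share": "max"})
    .merge(customers, how="left", on="customer_id")
)

print(summary)
```

Stating how="left" keeps every aggregated customer even when no matching row exists in customers, so customer 30 appears with a missing region rather than silently dropping out, as it would under the default inner join.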