January 15, 2021
When slicing a pandas DataFrame we have position-based selection, data.iloc[row position, column position], and label-based selection, data.loc[row label, column label]. Tutorials abound for both; however, when I have a large dataset with a numeric or time-series index and labeled columns, more often than not I simply want to select a row by its position and a column by its label. This simple selection eluded me for much too long:
data.iloc[0].column_name
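For reference, here is a minimal sketch of a few equivalent ways to write that selection. The DataFrame, its column names ("price", "volume") and its dates are made up for illustration. Attribute access like .column_name only works when the column name is a valid Python identifier and doesn't clash with an existing Series attribute, so the bracket forms are the safer fallback:

```python
import pandas as pd

# Hypothetical data: a time-series index with labeled columns.
data = pd.DataFrame(
    {"price": [10.0, 11.5, 12.25], "volume": [100, 150, 120]},
    index=pd.date_range("2021-01-01", periods=3, freq="D"),
)

# Row by position, then column by label via attribute access.
first_price_attr = data.iloc[0].price

# Same selection with bracket access, which works for any column name.
first_price_item = data.iloc[0]["price"]

# Single indexing call: translate the column label to a position for iloc ...
first_price_iloc = data.iloc[0, data.columns.get_loc("price")]

# ... or translate the row position to an index label for loc.
first_price_loc = data.loc[data.index[0], "price"]
```

The last two forms do the lookup in one indexing call, which is the pattern to prefer if you ever need to assign to the selected cell rather than just read it.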
By default, when creating a new DataFrame from a list of arrays, pandas stacks them vertically (i.e. axis=0: each array becomes a row). It boggles my mind that I couldn't find stacking arrays horizontally in the documentation:
import numpy as np
import pandas as pd
data_one = np.array([1, 2, 3])
data_two = np.array([4,5,6])
pd.DataFrame([data_one, data_two], columns=['col_1', 'col_2', 'col_3'])
#    col_1  col_2  col_3
# 0      1      2      3
# 1      4      5      6
df = pd.DataFrame(np.column_stack((data_one, data_two)), columns=['col_1', 'col_2'])
#    col_1  col_2
# 0      1      4
# 1      2      5
# 2      3      6
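Another way to get the same column-wise layout, sketched here as an alternative rather than a replacement for np.column_stack, is to pass a dict that maps each column label to its array:

```python
import numpy as np
import pandas as pd

data_one = np.array([1, 2, 3])
data_two = np.array([4, 5, 6])

# Each dict key becomes a column label; each array becomes that column.
df_from_dict = pd.DataFrame({"col_1": data_one, "col_2": data_two})

# The column_stack version from above, for comparison.
df_stacked = pd.DataFrame(
    np.column_stack((data_one, data_two)), columns=["col_1", "col_2"]
)
```

One practical difference: np.column_stack builds a single NumPy array first, so arrays of different types are coerced to one common dtype, while the dict form keeps a separate dtype per column.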
And finally, let's avoid some "SettingWithCopyWarning" warnings: when adding new columns to a DataFrame, use the assign method, which returns a new DataFrame instead of modifying the existing one in place:
df = pd.DataFrame([1, 2, 3], columns=['col_1'])
new_column = [4, 5, 6]
df = df.assign(col_2=new_column)
#   col_1  col_2
# 0   1     4
# 1   2     5
# 2   3     6
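The warning typically shows up when a column is added to a DataFrame that was itself sliced or filtered out of another one. Below is a minimal sketch of that situation using assign; the filter condition and the col_3 column are invented for illustration, and whether the warning appears at all with plain assignment depends on your pandas version and settings:

```python
import pandas as pd

df = pd.DataFrame({"col_1": [1, 2, 3]})

# Filtering produces a new object derived from df.
subset = df[df["col_1"] > 1]

# assign returns a new DataFrame rather than modifying subset in place,
# so there is no ambiguity about whether df is being written to.
subset = subset.assign(col_2=[5, 6])

# A callable receives the DataFrame being assigned to, which is handy
# when the new column depends on existing ones.
subset = subset.assign(col_3=lambda d: d["col_1"] + d["col_2"])
```

After this runs, df still has only col_1, and subset holds the two filtered rows with the two extra columns.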