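Merge the train and test datasets so that preprocessing applies to both. Use pd.concat([dataframe_train, dataframe_test]); the older dataframe_train.append(dataframe_test) was deprecated in pandas 1.4 and removed in pandas 2.0.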
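
A minimal sketch of the merge step with pd.concat. The two small frames and the column names (age, gender, target) are made up for illustration, and recording n_train is just one possible way to split the rows apart again afterwards.

```python
import pandas as pd

# Hypothetical toy data standing in for the real train and test sets.
dataframe_train = pd.DataFrame({"age": [22, 35], "gender": ["F", "M"], "target": [0, 1]})
dataframe_test = pd.DataFrame({"age": [41, 19], "gender": ["M", "F"]})

# Remember how many rows belong to the training set.
n_train = len(dataframe_train)

# Stack the rows; ignore_index=True rebuilds a clean 0..n-1 index.
# Columns missing from one frame (here: target) are filled with NaN.
dataframe = pd.concat([dataframe_train, dataframe_test], ignore_index=True)

# After preprocessing, split back by position.
train_part = dataframe.iloc[:n_train]
test_part = dataframe.iloc[n_train:]
```

Because the test rows have no target, that column holds NaN for them after the merge, so it should be left out of any missing-value filling.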

Remove a feature that's not useful

dataframe.drop(col_name, axis=1, inplace=True) (axis=1 means column; inplace=True changes the original instead of returning a copy)
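
As a small illustration of the drop call, assuming a hypothetical user_id column that carries no predictive information:

```python
import pandas as pd

# Hypothetical frame; user_id is assumed to be the useless feature.
dataframe = pd.DataFrame({"user_id": [1, 2], "age": [22, 35]})

# axis=1 targets columns; inplace=True modifies dataframe itself.
dataframe.drop("user_id", axis=1, inplace=True)

# Without inplace=True, drop returns a new frame and leaves the original alone.
without_age = dataframe.drop("age", axis=1)
```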

Convert categorical values to numbers

For nominal and ordinal: dataframe['column name'] = dataframe['column name'].map({'F': 1, 'M': 2})

For interval: convert to groups of intervals, e.g. age (0-18, 19-25, …), and assign values like 1, 2, 3, …
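
Both conversions could look roughly like this. The data is invented, and pd.cut is only one way to do the grouping; the bin edges below (up to 18, 19 to 25, above 25, capped at an assumed maximum of 120) are illustrative.

```python
import pandas as pd

# Hypothetical data for illustration.
dataframe = pd.DataFrame({"gender": ["F", "M", "F"], "age": [12, 22, 40]})

# Nominal/ordinal: map each category to a number.
# Values missing from the dictionary would become NaN.
dataframe["gender"] = dataframe["gender"].map({"F": 1, "M": 2})

# Interval: bins are right-inclusive, i.e. (-1, 18], (18, 25], (25, 120].
dataframe["age_group"] = pd.cut(
    dataframe["age"], bins=[-1, 18, 25, 120], labels=[1, 2, 3]
).astype(int)
```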

Replace missing values with mode

dataframe['column name'] = dataframe['column name'].fillna(dataframe['column name'].mode()[0])

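Use the fillna function of the dataframe column to fill in the missing values

Use the mode function of the dataframe column to get the most frequent value; mode() returns a Series, so take its first element with [0]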
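
A short sketch of the fill step on a made-up city column:

```python
import pandas as pd

# Hypothetical column with one missing value.
dataframe = pd.DataFrame({"city": ["Paris", None, "Paris", "Rome"]})

# mode() ignores missing values by default and returns a Series,
# because several values can tie for most frequent; [0] takes the first.
most_common = dataframe["city"].mode()[0]

# fillna returns a new Series, so assign it back to the column.
dataframe["city"] = dataframe["city"].fillna(most_common)
```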

Remove characters like + in intervals

e.g. 55+

Use dataframe['column'] = dataframe['column'].str.replace('+', '', regex=False) (regex=False treats + as a literal character rather than a regex operator)
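
For example, on an invented age column that stores intervals as strings:

```python
import pandas as pd

# Hypothetical interval labels stored as strings.
dataframe = pd.DataFrame({"age": ["26-35", "55+", "0-17"]})

# regex=False makes "+" a plain character; as a regex pattern, a lone "+"
# is invalid and raises an error.
dataframe["age"] = dataframe["age"].str.replace("+", "", regex=False)
```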

Check datatype of columns

If the datatype is object, change it to a number, e.g. with pd.to_numeric(dataframe['column'])

To check, use dataframe.info()
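
A sketch of the check and the conversion, assuming a hypothetical years column whose numbers were read in as text. pd.to_numeric is one way to convert; it raises an error if a value cannot be parsed, unless errors="coerce" is passed to turn such values into NaN.

```python
import pandas as pd

# Hypothetical frame: years holds numbers stored as strings.
dataframe = pd.DataFrame({"years": ["1", "2", "4"], "score": [0.5, 0.7, 0.9]})

# Prints each column with its non-null count and dtype;
# text columns typically show up as object.
dataframe.info()

# Convert the text column to a numeric dtype.
dataframe["years"] = pd.to_numeric(dataframe["years"])
```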