API
This page lists all galeritas functions for generating the plots provided by the library, together with their documentation.
-
galeritas.bar_plot_with_population_proportion(df, x, y, func=<function median>, show_error_bar=True, show_na=True, na_label='Null', circle_diameter=150, split_variable=False, colors=None, color_palette=None, x_label=None, y_label=None, show_qty=True, qty_label='Quantity', proportion_label='Percentage', proportion_format='.2f', show_population_func=False, population_format='.0f', population_func_legend='Median population value', population_legend='Population %', up_label='Positive values', down_label='Negative values', plot_title=None, figsize=(16, 7), ax=None, return_fig=False, **legend_kwargs)
Produces a bar plot with an additional dot plot showing the percentage of the dataset population in each category of the bar plot.
Sometimes it is useful to split the numeric variable of interest into positive and negative values and plot it as a function of the categorical variable. This can be controlled with the split_variable parameter.
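As a rough illustration of what split_variable changes, the sketch below aggregates the positive and negative y values separately for each category, using plain Python instead of a DataFrame. It is not the galeritas implementation: the helper name split_aggregate is hypothetical, and ignoring zero values (which are neither positive nor negative) is an assumption.

```python
from statistics import median


def split_aggregate(rows, func=median):
    """Aggregate positive and negative y values separately per category.

    rows: iterable of (category, y) pairs.
    Returns {category: (aggregate of positives, aggregate of negatives)},
    using None when a category has no values of that sign.
    Zeros are ignored here (an assumption, not documented galeritas behavior).
    """
    positives, negatives = {}, {}
    for category, y in rows:
        if y > 0:
            positives.setdefault(category, []).append(y)
        elif y < 0:
            negatives.setdefault(category, []).append(y)
    categories = sorted(set(positives) | set(negatives))
    return {
        c: (
            func(positives[c]) if c in positives else None,
            func(negatives[c]) if c in negatives else None,
        )
        for c in categories
    }


rows = [("A", 10), ("A", -4), ("A", 6), ("B", -2), ("B", -8), ("B", 3)]
print(split_aggregate(rows))  # {'A': (8.0, -4), 'B': (3, -5.0)}
```

In this sketch, the first value of each pair would correspond to an upward bar and the second to a downward bar.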
Parameters: - df (DataFrame) – A dataframe containing the dataset.
- x (str) – Name of the dataframe column holding the x-axis variable. It is treated as a categorical variable.
- y (str) – Name of the dataframe column holding the y-axis variable. It is treated as a numeric variable to which the aggregation function (defined by the func parameter) is applied.
- func (function, optional) – Aggregation function applied to the y-axis variable. The default computes the median, but other functions are accepted (see here). Default: np.median
- show_error_bar (bool, optional) – If True, shows the default confidence intervals estimated by Seaborn (for more information, see Seaborn's barplot documentation). Default: True
- show_na (bool, optional) – If True, shows the missing values of the column passed in x. Default: True
- na_label (str, optional) – Label used to identify the missing values of the column passed in x. Default: 'Null'
- circle_diameter (int, optional) – Base diameter of the percentage circles. You may want to decrease it if a single category of the x-axis variable accounts for a large share of the dataset (e.g. 80%). Default: 150
- split_variable (bool, optional) – If True, splits the y-axis variable into positive and negative values, showing upward bars for positive values and downward bars for negative values. Default: False
- colors (list of str, optional) – A list of hexadecimal colors, one per hue group. The list must have as many elements as there are hue groups. Default: None
- color_palette (str, optional) – If set, assigns a different color of this palette to each hue value. If both colors and color_palette are None, the Galeritas default palette is used. Default: None
- x_label (str, optional) – Text of the x-axis label. If None, the value of x is used. Default: None
- y_label (str, optional) – Text of the y-axis label. If None, the value of y is used. Default: None
- show_qty (bool, optional) – If True, shows the population count of each category below its percentage. Default: True
- qty_label (str, optional) – Label for the counts, shown on the right side of the plot. Default: 'Quantity'
- proportion_label (str, optional) – Label for the percentages, shown on the right side of the plot. Default: 'Percentage'
- proportion_format (str, optional) – Format specification for the population percentages. The default shows two digits after the decimal point. Default: '.2f'
- show_population_func (bool, optional) – If True, shows a dashed line at the value of the aggregation function computed over the entire population. Default: False
- population_func_legend (str, optional) – Legend text describing the dashed line of the entire population. Default: 'Median population value'
- population_format (str, optional) – Format specification for the population aggregate value displayed next to the dashed line. The default shows no digits after the decimal point. Default: '.0f'
- population_legend (str, optional) – Legend text for the circles representing the population percentage. Default: 'Population %'
- up_label (str, optional) – Label for the upward bars. Only shown if split_variable is True. Default: 'Positive values'
- down_label (str, optional) – Label for the downward bars. Only shown if split_variable is True. Default: 'Negative values'
- plot_title (str, optional) – Title of the plot. Default: None
- figsize (tuple, optional) – Figure size as (width, height) in inches. Default: (16, 7)
- ax (matplotlib.axes, optional) – Custom axes on which to draw the plot. Default: None
- return_fig (bool, optional) – If True, returns the figure object. Default: False
- legend_kwargs (key, value mappings) – Legend arguments of matplotlib.pyplot, such as bbox_to_anchor and ncol. See the matplotlib.pyplot.legend documentation for more information.
Returns: The figure object containing the plot (only returned when return_fig is True).
Return type: Figure
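The numbers behind the bars and the percentage circles can be sketched in plain Python as follows. This is an illustrative assumption of how the quantities relate, not galeritas's code: category_summary is a hypothetical helper, None stands in for a missing category, and the percentages are computed over all rows kept after the show_na filter. The proportion_format and population_format strings are applied with Python's built-in format().

```python
from statistics import median


def category_summary(rows, func=median, show_na=True, na_label="Null",
                     proportion_format=".2f", population_format=".0f"):
    """Per-category aggregate, count and formatted population percentage.

    rows: iterable of (category, y) pairs; a None category means "missing".
    Returns (per_category, population_value) where per_category maps
    category -> (aggregate of y, count, formatted percentage).
    """
    groups = {}
    for category, y in rows:
        if category is None:
            if not show_na:
                continue  # drop missing categories entirely
            category = na_label
        groups.setdefault(category, []).append(y)

    total = sum(len(values) for values in groups.values())
    per_category = {
        c: (func(values), len(values),
            format(100 * len(values) / total, proportion_format))
        for c, values in groups.items()
    }
    # Aggregate over the whole population (the dashed line drawn when
    # show_population_func=True), formatted with population_format.
    all_values = [y for values in groups.values() for y in values]
    population_value = format(func(all_values), population_format)
    return per_category, population_value


rows = [("A", 1), ("A", 3), ("B", 10), (None, 5)]
per_category, population_value = category_summary(rows)
print(per_category)      # {'A': (2.0, 2, '50.00'), 'B': (10, 1, '25.00'), 'Null': (5, 1, '25.00')}
print(population_value)  # 4
```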
-
galeritas.plot_calibration_and_distribution(df, target, predictions, n_bins=20, strategy='quantile', x_lim=None, y_lim=None, show_distribution=True, color='#3377bb', return_fig=False, ax=None)
Plots a calibration curve for the predicted values and, optionally, a distribution plot.
Parameters: - df (pd.DataFrame) – A DataFrame containing the target and prediction data.
- target (string) – Name of the target column.
- predictions (string) – Name of the prediction column.
- n_bins (int, optional) – Number of bins used to discretize the [0, x_lim] interval in the calibration curve. Default: 20
- strategy (string, optional) – Strategy used to define the bins of the calibration curve. 'uniform': the bins have identical widths. 'quantile': the bins have the same number of samples and depend on the predicted probabilities. Default: 'quantile'
- x_lim (float, optional) – Width of the x-axis in the calibration and distribution plots. Default: None
- y_lim (float, optional) – Width of the y-axis in the calibration curve. Default: None
- show_distribution (bool, optional) – If True, also plots the distribution graph. Default: True
- color (str, optional) – Custom plot color. Default: '#3377bb'
- return_fig (bool, optional) – If True, returns the figure object. Default: False
- ax (matplotlib.axes, optional) – Custom axes on which to draw the plot. Default: None
Returns: The figure object containing the plot (only returned when return_fig is True).
Return type: Figure
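To make the two strategy options concrete, here is a minimal plain-Python sketch of how calibration points (mean predicted probability versus observed fraction of positives per bin) can be computed. It loosely mirrors scikit-learn's calibration_curve, but whether galeritas delegates to it is an assumption. calibration_points is a hypothetical helper, probabilities are assumed to lie in [0, 1] with labels 0/1, and the quantile bins are built by splitting the sorted predictions into equal-sized chunks rather than from percentile edges.

```python
def calibration_points(y_true, y_prob, n_bins=5, strategy="quantile"):
    """Return (mean predicted probability, fraction of positives) per non-empty bin.

    y_true holds 0/1 labels and y_prob probabilities in [0, 1].
    """
    pairs = sorted(zip(y_prob, y_true))
    if strategy == "uniform":
        # Bins of identical width over [0, 1].
        bins = [[] for _ in range(n_bins)]
        for p, t in pairs:
            bins[min(int(p * n_bins), n_bins - 1)].append((p, t))
    elif strategy == "quantile":
        # Bins holding (roughly) the same number of samples.
        size = len(pairs) / n_bins
        bins = [pairs[round(i * size):round((i + 1) * size)] for i in range(n_bins)]
    else:
        raise ValueError("strategy must be 'uniform' or 'quantile'")

    points = []
    for b in bins:
        if b:
            mean_pred = sum(p for p, _ in b) / len(b)
            frac_pos = sum(t for _, t in b) / len(b)
            points.append((mean_pred, frac_pos))
    return points


y_prob = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
print(calibration_points(y_true, y_prob, n_bins=2, strategy="quantile"))
print(calibration_points(y_true, y_prob, n_bins=4, strategy="uniform"))
```

For a perfectly calibrated model, every point would lie on the diagonal, with the mean prediction equal to the observed fraction of positives.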
-
galeritas.plot_ecdf_curve(df, column_to_plot, drop_na=True, hue=None, hue_labels=None, colors=None, color_palette=None, plot_title=None, percentiles=(25, 50, 75), percentiles_title='Percentiles', mark_percentiles=True, show_percentile_table=False, figsize=(16, 7), ax=None, return_fig=False, **legend_kwargs)
Plots the empirical cumulative distribution function (ECDF) of a column. A theoretical reference can be found here.
Parameters: - df (DataFrame) – A dataframe containing the dataset.
- column_to_plot (str) – Name of the column holding the observed data.
- drop_na (bool, optional) – If True, removes the missing values of the column to be plotted. Otherwise, plots the distribution without removing the missing values, but does not calculate the percentiles. Default: True
- hue (str, optional) – Name of the dataframe column containing categories. If set, the distribution of column_to_plot is plotted separately for each category of this column. Default: None
- hue_labels (dict, optional) – Labels to show for the hue categories instead of the raw values of the hue column. Pass a dictionary mapping each value to be replaced to its new label (e.g. {1: 'True', 0: 'False'}). Default: None
- colors (list of str, optional) – A list of hexadecimal colors, one per hue group. The list must have as many elements as there are hue groups. Default: None
- color_palette (str, optional) – If colors is None, assigns a different color of this palette to each hue value. If both colors and color_palette are None, the library's default palette is used. Default: None
- plot_title (str, optional) – Title of the plot. Default: None
- percentiles (tuple, optional) – Percentiles to compute for each distribution. Default: (25, 50, 75)
- percentiles_title (str, optional) – Title used to label the percentiles. Default: 'Percentiles'
- mark_percentiles (bool, optional) – If True, marks the percentiles defined in percentiles on the plot. Default: True
- show_percentile_table (bool, optional) – If True, shows a table with the value of each percentile for each category. Default: False
- figsize (tuple, optional) – Figure size as (width, height) in inches. Default: (16, 7)
- ax (matplotlib.axes, optional) – Custom axes on which to draw the plot. Default: None
- return_fig (bool, optional) – If True, returns the figure object. Default: False
- legend_kwargs (key, value mappings) – Legend arguments of matplotlib.pyplot, such as bbox_to_anchor and ncol. See the matplotlib.pyplot.legend documentation for more information.
Returns: The figure object containing the plot (only returned when return_fig is True).
Return type: Figure
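The curve and the percentile markers can be sketched in plain Python: the ECDF at x is the share of observations less than or equal to x. This is a minimal sketch, not galeritas's code. ecdf_points and percentiles_of are hypothetical helpers, missing values (None or NaN) are dropped as with drop_na=True, and the percentiles use statistics.quantiles with linear interpolation (the 'inclusive' method), which may differ from the interpolation galeritas uses.

```python
import math
from statistics import quantiles


def _clean(values):
    """Drop missing values (None or NaN), mirroring drop_na=True."""
    return [v for v in values if v is not None and not math.isnan(v)]


def ecdf_points(values):
    """Return (x, F(x)) pairs, where F(x) is the share of observations <= x."""
    data = sorted(_clean(values))
    n = len(data)
    points = []
    for i, x in enumerate(data, start=1):
        # Emit one point per distinct value, at its last occurrence.
        if i == n or data[i] != x:
            points.append((x, i / n))
    return points


def percentiles_of(values, percentiles=(25, 50, 75)):
    """Linear-interpolation percentiles (integers between 1 and 99)."""
    cuts = quantiles(_clean(values), n=100, method="inclusive")
    return {p: cuts[p - 1] for p in percentiles}


values = [3, 1, 2, 2, None, float("nan"), 4]
print(ecdf_points(values))     # [(1, 0.2), (2, 0.6), (3, 0.8), (4, 1.0)]
print(percentiles_of(values))  # {25: 2.0, 50: 2.0, 75: 3.0}
```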
-
galeritas.plot_ks_classification(df, y_pred, y_true, min_max_scale=None, show_p_value=True, pos_value=1, neg_value=0, pos_label='1', neg_label='0', pos_color='#3377bb', neg_color='#b33d3d', figsize=(12, 7), plot_title='Kolmogorov–Smirnov (KS) Metric', x_label='Predicted Probability', ax=None, return_fig=False)
Produces a KS plot for predicted values (or scores) against the true values (0/1), comparing the cumulative score distributions of the positive and negative classes.
Parameters: - df (pd.DataFrame) – A DataFrame containing the y_pred and y_true columns.
- y_pred (str) – Name of the df column containing the predictions (float values).
- y_true (str) – Name of the df column containing the target values (0 or 1).
- min_max_scale (tuple, optional) – Tuple of (min, max) values used to scale y_pred. Default: None
- show_p_value (bool, optional) – If True, shows the p-value of the KS test together with the curves. Default: True
- pos_value (int, optional) – Value (0 or 1) that identifies the positive class in y_true (in some applications 0 may indicate a 'bad behavior', like a default). Default: 1
- neg_value (int, optional) – Value (0 or 1) that identifies the negative class in y_true. Default: 0
- pos_label (str, optional) – Custom label for the positive value. Default: '1'
- neg_label (str, optional) – Custom label for the negative value. Default: '0'
- pos_color (str, optional) – Custom color for the positive value. Default: '#3377bb'
- neg_color (str, optional) – Custom color for the negative value. Default: '#b33d3d'
- figsize (tuple, optional) – Figure size as (width, height) in inches. Default: (12, 7)
- plot_title (str, optional) – Main title of the plot. Default: 'Kolmogorov–Smirnov (KS) Metric'
- x_label (str, optional) – Custom x-axis label. Default: 'Predicted Probability'
- ax (matplotlib.axes, optional) – Custom axes on which to draw the plot. Default: None
- return_fig (bool, optional) – If True, returns the figure object. Default: False
Returns: The figure object containing the plot (only returned when return_fig is True).
Return type: Figure
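In its usual definition, the KS value is the largest vertical gap between the cumulative score distributions of the positive and negative classes. The sketch below computes it in plain Python. ks_statistic is a hypothetical helper, min_max_scale is left out, and it is an assumption that galeritas follows this textbook definition. For the statistic together with a p-value, scipy.stats.ks_2samp implements the same two-sample test.

```python
from bisect import bisect_right


def ks_statistic(scores, labels, pos_value=1, neg_value=0):
    """Return (KS statistic, score at which the maximum gap occurs)."""
    pos = sorted(s for s, y in zip(scores, labels) if y == pos_value)
    neg = sorted(s for s, y in zip(scores, labels) if y == neg_value)
    best, best_at = 0.0, None
    for t in sorted(set(pos) | set(neg)):
        # Empirical CDF of each class evaluated at threshold t.
        cdf_pos = bisect_right(pos, t) / len(pos)
        cdf_neg = bisect_right(neg, t) / len(neg)
        gap = abs(cdf_neg - cdf_pos)
        if gap > best:
            best, best_at = gap, t
    return best, best_at


scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(ks_statistic(scores, labels))  # (0.75, 0.3)
```

Swapping pos_value and neg_value leaves the statistic unchanged, since the gap is taken in absolute value.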
-
galeritas.plot_precision_and_recall_by_probability_threshold(df, prediction_column_name, target_name, target=1, n_trials=50, sample_size_percent=0.5, quantiles=[0.05, 0.5, 0.95], thresholds_to_highlight=None, x_label='Model probability threshold', y_label="Metric's Ratio", plot_title=None, colors=None, color_palette=None, figsize=(16, 7), ax=None, return_fig=False, **legend_kwargs)
Determines precision, recall and support scores at different thresholds for the positive class, using data samples drawn with replacement.
Adapted from Insight Data Science’s post.
Parameters: - df (DataFrame) – Dataframe containing the prediction and target columns.
- prediction_column_name (str) – Name of the column containing the predictions.
- target_name (str) – Name of the target column.
- target (int, optional) – Target class. Default: 1
- n_trials (int, optional) – Number of times to resample the data and compute the scores. Default: 50
- sample_size_percent (float, optional) – Fraction of the dataset drawn in each sample. Default: 0.5
- quantiles (list, optional) – Lower, median and upper quantiles used to plot the graph. Default: [0.05, 0.5, 0.95]
- thresholds_to_highlight (list, optional) – Score(s) at which threshold lines are drawn. Default: None
- x_label (str, optional) – Text of the x-axis label. Default: 'Model probability threshold'
- y_label (str, optional) – Text of the y-axis label. Default: "Metric's Ratio"
- plot_title (str, optional) – Title of the plot. Default: None
- colors (list of str, optional) – A list of hexadecimal colors, one per hue group. The list must have as many elements as there are hue groups. Default: None
- color_palette (str, optional) – If set, assigns a different color of this palette to each hue value. If both colors and color_palette are None, the Galeritas default palette is used. Default: None
- figsize (tuple, optional) – Figure size as (width, height) in inches. Default: (16, 7)
- ax (matplotlib.axes, optional) – Custom axes on which to draw the plot. Default: None
- return_fig (bool, optional) – If True, returns the figure object. Default: False
- legend_kwargs (key, value mappings) – Legend arguments of matplotlib.pyplot, such as bbox_to_anchor and ncol. See the matplotlib.pyplot.legend documentation for more information.
Returns: The figure object containing the plot (only returned when return_fig is True).
Return type: Figure
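The resampling idea behind this plot can be sketched in plain Python: draw n_trials samples with replacement, compute precision and recall at each threshold for every sample, and summarize each metric by the requested quantiles. This is an illustrative assumption rather than galeritas's implementation. precision_recall_bands and its seed argument are hypothetical, the support score is omitted because its exact definition is not stated here, and a score counts as predicted positive when it is greater than or equal to the threshold.

```python
import random


def _quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def precision_recall_bands(y_true, y_prob, thresholds, target=1, n_trials=50,
                           sample_size_percent=0.5, quantiles=(0.05, 0.5, 0.95),
                           seed=0):
    """Return {threshold: {"precision": (...), "recall": (...)}}, one value per quantile."""
    rng = random.Random(seed)
    data = list(zip(y_true, y_prob))
    k = max(1, int(len(data) * sample_size_percent))
    scores = {t: {"precision": [], "recall": []} for t in thresholds}
    for _ in range(n_trials):
        sample = rng.choices(data, k=k)  # sampling with replacement
        for t in thresholds:
            tp = sum(1 for y, p in sample if p >= t and y == target)
            fp = sum(1 for y, p in sample if p >= t and y != target)
            fn = sum(1 for y, p in sample if p < t and y == target)
            # Assumed convention: a metric with an empty denominator is 0.0.
            scores[t]["precision"].append(tp / (tp + fp) if tp + fp else 0.0)
            scores[t]["recall"].append(tp / (tp + fn) if tp + fn else 0.0)
    return {
        t: {name: tuple(_quantile(sorted(vals), q) for q in quantiles)
            for name, vals in metrics.items()}
        for t, metrics in scores.items()
    }


y_true = [0, 0, 1, 1] * 25
y_prob = [0.1, 0.4, 0.6, 0.9] * 25
bands = precision_recall_bands(y_true, y_prob, thresholds=[0.3, 0.5, 0.95])
print(bands[0.3])  # recall stays at 1.0; precision varies across resamples
```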
-
galeritas.stacked_percentage_bar_plot(df, categorical_feature, hue, hue_labels=None, plot_title=None, annotate=False, show_na=True, na_label='Null', colors=None, color_palette=None, figsize=(16, 7), ax=None, return_fig=False, **legend_kwargs)
Generates a stacked percentage bar plot: one bar for each category, with the hue groups stacked on top of each other inside each bar so that each segment shows the group's proportion within that category.
Parameters: - df (DataFrame) – A dataframe containing the dataset.
- categorical_feature (str) – Name of the dataframe column whose categories define the bars of the plot.
- hue (str) – Name of the dataframe column holding the groups stacked within each category.
- hue_labels (dict, optional) – A dictionary with the label of each hue group. If None, uses the hue values in the dataframe. Default: None
- plot_title (str, optional) – Title of the plot. Default: None
- annotate (bool, optional) – If True, shows the number of rows of each hue group inside each category. Default: False
- show_na (bool, optional) – If True, shows the missing values of both the hue groups and the categories. Default: True
- na_label (str, optional) – Label used to identify the missing values. Default: 'Null'
- colors (list of str, optional) – A list of hexadecimal colors, one per hue group. The list must have as many elements as there are hue groups. Default: None
- color_palette (str, optional) – If colors is None, assigns a different color of this palette to each hue value. If both colors and color_palette are None, the library's default palette is used. Default: None
- figsize (tuple, optional) – Figure size as (width, height) in inches. Default: (16, 7)
- ax (matplotlib.axes, optional) – Custom axes on which to draw the plot. Default: None
- return_fig (bool, optional) – If True, returns the figure object. Default: False
- legend_kwargs (key, value mappings) – Legend arguments of matplotlib.pyplot, such as bbox_to_anchor and ncol. See the matplotlib.pyplot.legend documentation for more information.
Returns: The figure object containing the plot (only returned when return_fig is True).
Return type: Figure
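Each bar in this plot shows, for one category, the percentage of its rows that belong to each hue group. A minimal plain-Python sketch of those numbers follows. stacked_percentages is a hypothetical helper, None stands in for a missing value, and the handling of missing values under show_na is an assumption. The raw counts it builds are the kind of per-group row counts that annotate=True displays.

```python
from collections import Counter


def stacked_percentages(rows, show_na=True, na_label="Null"):
    """Return ({category: Counter(group -> count)}, {category: {group: percent}}).

    rows: iterable of (category, hue_group) pairs; None marks a missing value.
    """
    counts = {}
    for category, group in rows:
        if category is None or group is None:
            if not show_na:
                continue  # drop rows with a missing category or group
            category = na_label if category is None else category
            group = na_label if group is None else group
        counts.setdefault(category, Counter())[group] += 1

    percentages = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        percentages[category] = {g: 100 * n / total for g, n in counter.items()}
    return counts, percentages


rows = [("A", "x"), ("A", "x"), ("A", "y"), ("B", "y"), (None, "x")]
counts, percentages = stacked_percentages(rows)
print(percentages["B"])     # {'y': 100.0}
print(percentages["Null"])  # {'x': 100.0}
```

By construction, the segments of every bar add up to 100%.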