trelawney package¶
Submodules¶
trelawney.base_explainer module¶
module that provides the base explainer class from which all future explainers will inherit
-
class
trelawney.base_explainer.
BaseExplainer
[source]¶ Bases:
abc.ABC
the base explainer class. this is an abstract class so you will need to define some behaviors when implementing your new explainer. In order to do so, override:
- the fit method that defines how (if needed) the explainer should be fitted
- the feature_importance method that extracts the relative importance of each feature on a dataset globally
- the explain_local method that extracts the relative impact of each feature on the final decision for every sample in a dataset
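The three overrides listed above can be sketched as follows. This is a minimal stand-in, not trelawney's actual code: SketchBaseExplainer and ConstantExplainer are hypothetical names, and plain lists of dicts replace the pandas inputs so that the example has no dependencies.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class SketchBaseExplainer(ABC):
    """Hypothetical stand-in mirroring the three methods a subclass must override."""

    @abstractmethod
    def fit(self, model, x_train, y_train):
        ...

    @abstractmethod
    def feature_importance(self, x_explain, n_cols: Optional[int] = None) -> Dict[str, float]:
        ...

    @abstractmethod
    def explain_local(self, x_explain, n_cols: Optional[int] = None) -> List[Dict[str, float]]:
        ...


class ConstantExplainer(SketchBaseExplainer):
    """Toy explainer (assumption for illustration): every feature gets the same weight."""

    def fit(self, model, x_train, y_train):
        # remember the feature names seen at training time
        self._features = list(x_train[0].keys())
        return self

    def feature_importance(self, x_explain, n_cols=None):
        weight = 1.0 / len(self._features)
        # slicing with None keeps every feature
        return {name: weight for name in self._features[:n_cols]}

    def explain_local(self, x_explain, n_cols=None):
        # one explanation dict per sample
        return [self.feature_importance(x_explain, n_cols) for _ in x_explain]


rows = [{"age": 31, "income": 2.5}, {"age": 47, "income": 4.0}]
explainer = ConstantExplainer().fit(model=None, x_train=rows, y_train=[0, 1])
local = explainer.explain_local(rows)
```

Because the base class is abstract, instantiating it directly raises a TypeError; only subclasses that define all three methods can be created.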
-
explain_filtered_local
(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], cols: List[str], n_cols: Optional[int] = None) → List[Dict[str, float]][source]¶ same as explain_local, but restricts each explanation to the features listed in cols
-
explain_local
(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → List[Dict[str, float]][source]¶ explains each individual prediction made on x_explain. BEWARE this is usually quite slow on large datasets
Parameters: - x_explain – the samples to explain
- n_cols – the number of columns to limit the explanation to
-
feature_importance
(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → Dict[str, float][source]¶ returns a relative importance of each feature on the predictions of the model (the explainer was fitted on) for x_explain globally. The output will be a dict with the importance for each column/feature in x_explain (limited to n_cols)
if some importances are negative, this means the corresponding features are negatively correlated with the output; the absolute value represents the relative importance
Parameters: - x_explain – the dataset to explain on
- n_cols – the maximum number of features to return (ordered by importance)
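The n_cols limit and the sign convention described above can be illustrated with a small sketch. The helper limit_importance is hypothetical, and ranking by absolute value is an assumption drawn from the description, not a statement about trelawney's internals.

```python
from typing import Dict, Optional


def limit_importance(importance: Dict[str, float], n_cols: Optional[int] = None) -> Dict[str, float]:
    """Keep at most n_cols features, ranked by absolute importance (hypothetical helper)."""
    # the sign is kept in the output; only the ranking uses the magnitude
    ranked = sorted(importance.items(), key=lambda item: abs(item[1]), reverse=True)
    if n_cols is not None:
        ranked = ranked[:n_cols]
    return dict(ranked)


raw = {"age": 0.2, "income": -0.7, "height": 0.1}
top_two = limit_importance(raw, n_cols=2)
```

Here income ranks first even though its value is negative, since its magnitude is the largest.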
-
filtered_feature_importance
(x_explain: pandas.core.frame.DataFrame, cols: Optional[List[str]], n_cols: Optional[int] = None) → Dict[str, float][source]¶ same as feature_importance but applying a filter first (on the name of the column)
-
fit
(model: sklearn.base.BaseEstimator, x_train: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], y_train: pandas.core.frame.DataFrame)[source]¶ prepares the explainer by saving all the information it needs and fitting necessary models
Parameters: - model – the TRAINED model the explainer will need to shed light on
- x_train – the dataset the model was trained on originally
- y_train – the target the model was trained on originally
-
graph_feature_importance
(x_explain: pandas.core.frame.DataFrame, cols: Optional[List[str]] = None, n_cols: Optional[int] = None, irrelevant_cols: Optional[List[str]] = None)[source]¶
-
graph_local_explanation
(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], cols: Optional[List[str]] = None, n_cols: Optional[int] = None, info_values: Union[pandas.core.frame.DataFrame, pandas.core.series.Series, None] = None) → plotly.graph_objs._figure.Figure[source]¶ creates a waterfall plotly figure to represent the influence of each feature on the final decision for a single prediction of the model.
You can filter the columns you want to see in your graph and limit the final number of columns you want to see. If you choose to do so the filter will be applied first and of those filtered columns at most n_cols will be kept
Parameters: - x_explain – the example to explain; this must be a dataframe with a single row
- cols – the columns to keep if you want to filter (if None - default) all the columns will be kept
- n_cols – the number of columns to limit the graph to. (if None - default) all the columns will be kept
Raises: ValueError – if x_explain doesn’t have the right shape
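The order of operations described above (filter on cols first, then keep at most n_cols) can be sketched on a plain explanation dict. Both helpers are hypothetical stand-ins; ranking by absolute value and the exact wording of the error message are assumptions.

```python
from typing import Dict, List, Optional


def select_columns(
    explanation: Dict[str, float],
    cols: Optional[List[str]] = None,
    n_cols: Optional[int] = None,
) -> Dict[str, float]:
    # step 1: apply the name filter (None keeps every column)
    if cols is not None:
        explanation = {name: value for name, value in explanation.items() if name in cols}
    # step 2: among the filtered columns, keep at most n_cols (ranked by magnitude, an assumption)
    ranked = sorted(explanation.items(), key=lambda item: abs(item[1]), reverse=True)
    return dict(ranked if n_cols is None else ranked[:n_cols])


def check_single_row(rows: List[dict]) -> None:
    # stand-in for the documented ValueError on a wrongly shaped x_explain
    if len(rows) != 1:
        raise ValueError("x_explain must contain exactly one row, got {}".format(len(rows)))


explanation = {"age": 0.3, "income": -0.6, "height": 0.05, "zip": 0.9}
picked = select_columns(explanation, cols=["age", "income", "height"], n_cols=2)
```

Note that zip has the largest magnitude but is dropped, because the filter runs before the n_cols limit.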
trelawney.colors module¶
trelawney.lime_explainer module¶
-
class
trelawney.lime_explainer.
LimeExplainer
(class_names: Optional[List[str]] = None, categorical_features: Optional[List[str]] = None)[source]¶ Bases:
trelawney.base_explainer.BaseExplainer
Lime stands for local interpretable model-agnostic explanations and is a package based on this article. Lime will explain a single prediction of your model by creating a local approximation of your model around said prediction.
>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.linear_model import LogisticRegression
>>> from trelawney.lime_explainer import LimeExplainer
>>> X = pd.DataFrame([np.array(range(100)), np.random.normal(size=100).tolist()], index=['real', 'fake']).T
>>> y = np.array(range(100)) > 50
>>> # training the base model
>>> model = LogisticRegression().fit(X, y)
>>> # creating and fitting the explainer
>>> explainer = LimeExplainer()
>>> explainer.fit(model, X, y)
<trelawney.lime_explainer.LimeExplainer object at ...>
>>> # explaining observation
>>> explanation = explainer.explain_local(pd.DataFrame([[5, 0.1]]))[0]
>>> abs(explanation['real']) > abs(explanation['fake'])
True
-
explain_local
(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → List[Dict[str, float]][source]¶ explains each individual prediction made on x_explain. BEWARE this is usually quite slow on large datasets
Parameters: - x_explain – the samples to explain
- n_cols – the number of columns to limit the explanation to
-
feature_importance
(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → Dict[str, float][source]¶ returns a relative importance of each feature on the predictions of the model (the explainer was fitted on) for x_explain globally. The output will be a dict with the importance for each column/feature in x_explain (limited to n_cols)
if some importances are negative, this means the corresponding features are negatively correlated with the output; the absolute value represents the relative importance
Parameters: - x_explain – the dataset to explain on
- n_cols – the maximum number of features to return (ordered by importance)
-
fit
(model: sklearn.base.BaseEstimator, x_train: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], y_train: pandas.core.frame.DataFrame)[source]¶ prepares the explainer by saving all the information it needs and fitting necessary models
Parameters: - model – the TRAINED model the explainer will need to shed light on
- x_train – the dataset the model was trained on originally
- y_train – the target the model was trained on originally
-
trelawney.logreg_explainer module¶
Module that provides the LogRegExplainer class based on the BaseExplainer class
-
class
trelawney.logreg_explainer.
LogRegExplainer
(class_names: Optional[List[str]] = None, categorical_features: Optional[List[str]] = None)[source]¶ Bases:
trelawney.base_explainer.BaseExplainer
The LogRegExplainer class is composed of 3 methods:
- fit: get the right model
- feature_importance (global interpretation)
- graph_odds_ratio (visualisation of the ranking of the features, based on their odds ratio)
-
explain_local
(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → List[Dict[str, float]][source]¶ returns local relative importance of features for a specific observation.
Parameters: - x_explain – the dataset to explain on
- n_cols – the maximum number of features to return
-
feature_importance
(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → Dict[str, float][source]¶ returns the absolute value (i.e. magnitude) of the coefficient of each feature as a dict.
Parameters: - x_explain – the dataset to explain on
- n_cols – the maximum number of features to return
-
fit
(model: sklearn.base.BaseEstimator, x_train: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], y_train: pandas.core.frame.DataFrame)[source]¶ prepares the explainer by saving all the information it needs and fitting necessary models
Parameters: - model – the TRAINED model the explainer will need to shed light on
- x_train – the dataset the model was trained on originally
- y_train – the target the model was trained on originally
-
graph_odds_ratio
(n_cols: Optional[int] = 10, ascending: bool = False, irrelevant_cols: Optional[List[str]] = None) → pandas.core.frame.DataFrame[source]¶ returns a plot of the top n_cols features, based on the magnitude of their odds ratio.
Parameters: - n_cols – number of features to plot
- ascending – order of the ranking of the magnitude of the coefficients
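For a logistic regression, the odds ratio of a feature is the exponential of its coefficient. The ranking described above can be sketched as follows; rank_by_odds_ratio and the coefficient values are hypothetical, and the real method draws a plot, which is not reproduced here.

```python
import math
from typing import Dict, List, Tuple


def rank_by_odds_ratio(
    coefficients: Dict[str, float],
    n_cols: int = 10,
    ascending: bool = False,
) -> List[Tuple[str, float]]:
    """Rank features by coefficient magnitude and report exp(coefficient) (hypothetical helper)."""
    ranked = sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=not ascending)
    # exp(coef) > 1 raises the odds of the positive class, exp(coef) < 1 lowers them
    return [(name, math.exp(coef)) for name, coef in ranked[:n_cols]]


coefs = {"age": 0.0, "income": math.log(2.0), "debt": -math.log(4.0)}
ranking = rank_by_odds_ratio(coefs, n_cols=2)
```

In this toy data, debt comes first with an odds ratio of about 0.25 and income second with about 2.0, while age (odds ratio 1, no effect) is cut by n_cols.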
-
trelawney.shap_explainer module¶
trelawney.surrogate_explainer module¶
-
class
trelawney.surrogate_explainer.
SurrogateExplainer
(surrogate_model: sklearn.base.BaseEstimator, class_names: Optional[List[str]] = None)[source]¶ Bases:
trelawney.base_explainer.BaseExplainer
A surrogate model is a substitution model used to explain the initial model. Therefore, substitution models are generally simpler than the initial ones. Here, we use single trees and logistic regressions as surrogates.
-
adequation_score
(metric: Union[Callable[[numpy.ndarray, numpy.ndarray], float], str] = 'auto')[source]¶ returns an adequation score between the output of the surrogate and the output of the initial model based on the x_train set given.
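One way to read this score is as an agreement measure between the two sets of outputs on x_train. The sketch below uses a plain agreement rate as the metric; that choice, the helper names, and the toy models are all assumptions, since the source does not say what 'auto' selects.

```python
from typing import Callable, Sequence


def agreement_rate(model_output: Sequence[int], surrogate_output: Sequence[int]) -> float:
    """Share of samples on which both outputs coincide (assumed metric)."""
    matches = sum(1 for a, b in zip(model_output, surrogate_output) if a == b)
    return matches / len(model_output)


def adequation(
    model_predict: Callable[[float], int],
    surrogate_predict: Callable[[float], int],
    x_train: Sequence[float],
    metric: Callable[[Sequence[int], Sequence[int]], float] = agreement_rate,
) -> float:
    # compare the surrogate with the initial model, not with the true labels
    model_output = [model_predict(x) for x in x_train]
    surrogate_output = [surrogate_predict(x) for x in x_train]
    return metric(model_output, surrogate_output)


def black_box(x: float) -> int:
    # toy "initial model": non-linear decision rule
    return int(x * x > 10)


def surrogate(x: float) -> int:
    # toy surrogate: a single threshold, simpler than the initial model
    return int(x > 3)


x_train = [-4, 0, 1, 2, 3, 4, 5, 6]
score = adequation(black_box, surrogate, x_train)
```

The surrogate disagrees with the toy model only on x = -4, so the score is 7 out of 8.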
-
explain_local
(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → List[Dict[str, float]][source]¶ returns local relative importance of features for a specific observation.
Parameters: - x_explain – the dataset to explain on
- n_cols – the maximum number of features to return
-
feature_importance
(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → Dict[str, float][source]¶ returns a relative importance of each feature globally as a dict.
Parameters: - x_explain – the dataset to explain on
- n_cols – the maximum number of features to return
-
fit
(model: sklearn.base.BaseEstimator, x_train: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], y_train: pandas.core.frame.DataFrame)[source]¶ prepares the explainer by saving all the information it needs and fitting necessary models
Parameters: - model – the TRAINED model the explainer will need to shed light on
- x_train – the dataset the model was trained on originally
- y_train – the target the model was trained on originally
-
trelawney.tree_explainer module¶
Module that provides the TreeExplainer class based on the BaseExplainer class
-
class
trelawney.tree_explainer.
TreeExplainer
(class_names: Optional[List[str]] = None)[source]¶ Bases:
trelawney.base_explainer.BaseExplainer
The TreeExplainer class is composed of 4 methods:
- fit: get the right model
- feature_importance (global interpretation)
- explain_local (local interpretation, WIP)
- plot_tree (full tree visualisation)
-
explain_local
(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → List[Dict[str, float]][source]¶ returns local relative importance of features for a specific observation.
Parameters: - x_explain – the dataset to explain on
- n_cols – the maximum number of features to return
-
feature_importance
(x_explain: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], n_cols: Optional[int] = None) → Dict[str, float][source]¶ returns a relative importance of each feature globally as a dict.
Parameters: - x_explain – the dataset to explain on
- n_cols – the maximum number of features to return
-
fit
(model: sklearn.base.BaseEstimator, x_train: Union[pandas.core.series.Series, pandas.core.frame.DataFrame, numpy.ndarray], y_train: pandas.core.frame.DataFrame)[source]¶ prepares the explainer by saving all the information it needs and fitting necessary models
Parameters: - model – the TRAINED model the explainer will need to shed light on
- x_train – the dataset the model was trained on originally
- y_train – the target the model was trained on originally
-
Module contents¶
Top-level package for trelawney.