Contribution¶
This project started because we saw people rewrite the same transformers and estimators at clients over and over again. Our goal is to have a place where more experimental building blocks for scikit-learn pipelines might exist. This means we're usually open to ideas to add here, but there are a few things to keep in mind.
Before You Make a New Feature¶
- Discuss the feature and implementation you want to add on GitHub before you write a PR for it.
- Features need a somewhat general use case. If the use case is very niche, it will be hard for us to consider maintaining it.
- If you're going to add a feature, consider whether you could help out with its maintenance.
When Writing a New Feature¶
When writing a new feature, there are some more details to keep in mind regarding how scikit-learn likes to have its parts implemented. We will display a sample implementation of the ColumnSelector below. Please review all comments marked as Important.
There are a few good practices we observe here that we'd appreciate seeing in pull requests. We want to re-use features from sklearn as much as possible.
In particular, for this example:
- We inherit from the mixins found in sklearn.
- We use the validation utils from sklearn in our object to confirm that the model is fitted, that the array going into the model is of the correct type, and that the random state is appropriate.
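The two practices listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual ColumnSelector: the class name `RandomColumnSubset` and its `n_columns` parameter are hypothetical and were chosen only so that all three validation utilities appear in one small transformer.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_array, check_is_fitted, check_random_state


class RandomColumnSubset(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that keeps a random subset of columns."""

    def __init__(self, n_columns=1, random_state=None):
        # Only store the parameters here; validation happens in fit.
        self.n_columns = n_columns
        self.random_state = random_state

    def fit(self, X, y=None):
        # Confirm the input is a numeric 2D array.
        X = check_array(X)
        if not 1 <= self.n_columns <= X.shape[1]:
            raise ValueError("n_columns must be between 1 and the number of features")
        # Turn None, an int, or a RandomState into a RandomState instance.
        rng = check_random_state(self.random_state)
        # Attributes learned during fit end with a trailing underscore.
        self.n_features_in_ = X.shape[1]
        self.columns_ = np.sort(
            rng.choice(X.shape[1], size=self.n_columns, replace=False)
        )
        return self

    def transform(self, X):
        # Raise NotFittedError if fit has not been called yet.
        check_is_fitted(self, "columns_")
        X = check_array(X)
        if X.shape[1] != self.n_features_in_:
            raise ValueError("X has a different number of features than seen in fit")
        return X[:, self.columns_]


X = np.arange(12, dtype=float).reshape(4, 3)
selector = RandomColumnSubset(n_columns=2, random_state=0)
X_subset = selector.fit_transform(X)
print(X_subset.shape)  # (4, 2)
```

Inheriting from `TransformerMixin` provides `fit_transform`, and `BaseEstimator` provides `get_params` and `set_params`, so neither needs to be written by hand.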
Feel free to look at example implementations before writing your own from scratch.
Unit Tests¶
We write unit tests on these objects to make sure that they will work in a scikit-learn Pipeline.
This must be guaranteed. To facilitate this, we have some standard tests that will check things like "do we change the shape of the input?".
If your transformer belongs here, feel free to add it.
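To make the idea concrete, here is a rough sketch of what such checks could look like. The project's own standard tests are not reproduced here; the helper names `check_rows_preserved` and `check_works_in_pipeline` are hypothetical, and `StandardScaler` merely stands in for the transformer under test.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def check_rows_preserved(transformer, X, y=None):
    """Hypothetical check: a transformer must not change the number of rows."""
    X_out = transformer.fit(X, y).transform(X)
    assert X_out.shape[0] == X.shape[0], "transformer changed the number of rows"


def check_works_in_pipeline(transformer, X, y):
    """Hypothetical check: the transformer can sit in front of an estimator."""
    pipe = Pipeline([("transform", transformer), ("model", LinearRegression())])
    predictions = pipe.fit(X, y).predict(X)
    assert predictions.shape == (X.shape[0],)


# Small synthetic regression problem used only for the checks.
rng = np.random.RandomState(42)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5])

# StandardScaler stands in for the transformer under test.
check_rows_preserved(StandardScaler(), X)
check_works_in_pipeline(StandardScaler(), X, y)
```

Writing the checks as plain functions that take a transformer instance makes it easy to run the same checks against many transformers.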
Documentation¶
The documentation is generated using Material for MkDocs, its extensions and a few plugins. In particular, mkdocstrings-python is used for API rendering.
When a new feature is introduced, it should be documented, and typically there are a few files to add or edit:
- A page in the docs/api/ folder.
- A user guide in the docs/user-guide/ folder.
- A Python script in the docs/_scripts/ folder to generate plots and code snippets (see next section).
- Relevant static files, such as images, plots, tables and HTML files, should be saved in the docs/_static/ folder.
- Edit the mkdocs.yaml file to include the new pages in the navigation.
Working with pymdown snippets extension¶
Most of the code and the code-generated plots in the documentation are produced by the scripts in the docs/_scripts/ folder and included via the pymdown snippets extension.
The reason for this separation is that:
- Markdown files are significantly easier to maintain and review than notebooks.
- Embedding code directly into markdown is simple and convenient, but it does not generate outputs.
- Instead of having duplicated (and possibly out of sync) code in markdown for rendering and in notebooks/scripts for generating outputs, the pymdown snippets extension lets us bind the two together.
- To generate the plots and/or results of a given section, it is enough to run the corresponding script from the docs/_scripts/ folder.
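As an illustration of how a script and a markdown page can share the same code, the sketch below marks a named section using the snippets extension's section markers. The file name `docs/_scripts/example.py` and the section name `compute` are hypothetical, not taken from the repository.

```python
# Hypothetical file: docs/_scripts/example.py

# --8<-- [start:compute]
values = [1, 2, 3, 4]
mean = sum(values) / len(values)
# --8<-- [end:compute]

# Everything outside the markers runs with the script but is not
# part of the "compute" section that a markdown page would include.
print(mean)  # 2.5
```

A markdown page could then pull in only the marked section by placing a line such as `--8<-- "docs/_scripts/example.py:compute"` inside a fenced code block. How that path is resolved depends on how the extension is configured in the project, so treat the exact path as an assumption.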
Info
To generate all the outputs and static files from scratch it is enough to run the following command from the root of the repository:
which will run all the scripts and save results in the docs/_static folder.
Render locally¶
The first step to render the documentation locally is to install the required dependencies:
Then from the root of the project, there are two options:
Info
Using mkdocs directly lets you pass extra parameters to the command if needed.
Then the documentation page will be available at localhost.