And only 24.1% of them actually ran.
I stumbled upon a very interesting paper the other day.
The paper is titled “A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks” by Joao Felipe Pimentel, Leonardo Murta, Vanessa Braganholo and Juliana Freire.
The paper describes an effort to run 1.4 million Jupyter notebooks found on GitHub in order to study their reproducibility. The main result: only 24.1% of the notebooks were able to run, and only 4% of all notebook runs yielded the same results. To quote the paper:
The most common causes of failures were related to missing dependencies, the presence of hidden states and out-of-order executions, and data accessibility.
The paper also lists some general recommendations based on the findings, all of which ring true to me.
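To make one of the failure causes above a bit more concrete, here is a minimal sketch of how you might spot out-of-order execution in a notebook. It is my own illustration, not the method used in the paper. It assumes the standard `.ipynb` JSON layout, where each code cell stores an `execution_count`; the function name `executed_in_order` and the example notebook are made up for this post.

```python
import json


def executed_in_order(notebook: dict) -> bool:
    """Return True if the executed code cells have strictly increasing execution counts."""
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    # Cells that were never run have a null execution_count in the file; skip them.
    counts = [c for c in counts if c is not None]
    return all(a < b for a, b in zip(counts, counts[1:]))


# Hypothetical notebook: the second code cell was run before the first one.
example = {
    "cells": [
        {"cell_type": "code", "execution_count": 2, "source": "print(x)"},
        {"cell_type": "markdown", "source": "some notes"},
        {"cell_type": "code", "execution_count": 1, "source": "x = 1"},
    ]
}

print(executed_in_order(example))  # False

# For a real file you would load the JSON first, for example:
# with open("analysis.ipynb", encoding="utf-8") as f:  # placeholder file name
#     print(executed_in_order(json.load(f)))
```

A check like this only catches one symptom. A notebook can have perfectly ordered counts and still depend on a cell that was later deleted, so running it top to bottom in a fresh kernel is the more reliable test.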
If you want to follow the original authors, you can find them on Twitter: @joaofelipenp, @leomurta, @vanbraganholo, @jfreirenet.