# Getting Started

## Base Scenario¶

Let's say you're running a simulation, or maybe a machine learning experiment. Then you might have code that looks like this;

``````import numpy as np

def birthday_experiment(class_size, n_sim=10_000):
"""Simulates the birthday paradox. Vectorized = Fast!"""
sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
sort_sims = np.sort(sims, axis=1)
n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis = 1) + 1
return np.mean(n_uniq != class_size)

results = [birthday_experiment(s) for s in range(2, 40)]
``````

This example sort of works, but how would we now go about plotting our results? If you want to plot the effect of `class_size` and the simulated probability then it'd be do-able. But things get tricky if you're also interested in seeing the effect of `n_sim` as well. The input of the simulation isn't nicely captured together with the output of the simulation.

## Decorators¶

The idea behind this library is that you can rewrite this function, only slightly, to make all of this data collection a whole log simpler.

``````import numpy as np
from memo import memlist

data = []

@memlist(data=data)
def birthday_experiment(class_size, n_sim):
"""Simulates the birthday paradox. Vectorized = Fast!"""
sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
sort_sims = np.sort(sims, axis=1)
n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis = 1) + 1
return {"est_proba": np.mean(n_uniq != class_size)}

for size in range(2, 40):
for n_sim in [1000, 10000, 100000]:
birthday_experiment(class_size=size, n_sim=n_sim)
``````

The `data` object now represents a list of dictionaries that have `"n_sim"`, `"class_size"` and `"est_proba"` as keys. You can easily turn these into a pandas DataFrame if you'd like via `pd.DataFrame(data)`.

## Logging More¶

The `memlist` decorate takes care of all data collection. It captures all keyword arguments of the function as well as the dictionary output of the function. This then is appended this to a list `data`. Especially when you're iteration on your experiments this might turn out to be a lovely pattern.

For example, suppose we also want to log how long the simulation takes;

``````import time
import numpy as np
from memo import memlist

data = []

@memlist(data=data)
def birthday_experiment(class_size, n_sim):
"""Simulates the birthday paradox. Vectorized = Fast!"""
t1 = time.time()
sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
sort_sims = np.sort(sims, axis=1)
n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis = 1) + 1
proba = np.mean(n_uniq != class_size)
t2 = time.time()
return {"est_proba": proba, "time": t2 - t1}

for size in range(2, 40):
for n_sim in [1000, 10000, 100000]:
birthday_experiment(class_size=size, n_sim=n_sim)
``````

If you now inspect `data` you'll notice it also contains the `"time"` information. Note though that there's an easier method to log the time, you can use the `@time_taken` decorator that this library supplies.

## Power¶

The real power of the library is that you can choose not only to log to a list. You can just as easily write to a file too!

``````import time
import numpy as np
from memo import memlist, memfile

data = []

@memfile(filepath="results.json")
@memlist(data=data)
def birthday_experiment(class_size, n_sim):
"""Simulates the birthday paradox. Vectorized = Fast!"""
t1 = time.time()
sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
sort_sims = np.sort(sims, axis=1)
n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis = 1) + 1
proba = np.mean(n_uniq != class_size)
t2 = time.time()
return {"est_proba": proba, "time": t2 - t1}

for size in range(2, 40):
for n_sim in [1000, 10000, 100000]:
birthday_experiment(class_size=size, n_sim=n_sim)
``````

## Utilities¶

The library also offers utilities to make the creation of these grids even easier. In particular;

• We supply a grid generation mechanism to prevent a lot of for-loops.
• We supply a `@time_taken` so that you don't need to write that logic yourself.
``````import numpy as np
from memo import memlist, memfile, grid, time_taken

data = []

@memfile(filepath="results.json")
@memlist(data=data)
@time_taken()
def birthday_experiment(class_size, n_sim):
"""Simulates the birthday paradox. Vectorized = Fast!"""
sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
sort_sims = np.sort(sims, axis=1)
n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis = 1) + 1
proba = np.mean(n_uniq != class_size)
return {"est_proba": proba}

for settings in grid(class_size=range(2, 40), n_sim=[1000, 10000, 100000]):
birthday_experiment(**settings)
``````

## Parallel¶

If you have a lot of simulations you'd like to run, it might be helpful to run them in parallel. That's why this library also hosts a `Runner` class that can run your functions on multiple CPU cores.

``````import numpy as np

from memo import memlist, memfile, grid, time_taken, Runner

data = []

@memfile(filepath="results.jsonl")
@memlist(data=data)
@time_taken()
def birthday_experiment(class_size, n_sim):
"""Simulates the birthday paradox. Vectorized = Fast!"""
sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
sort_sims = np.sort(sims, axis=1)
n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis=1) + 1
proba = np.mean(n_uniq != class_size)
return {"est_proba": proba}

settings = grid(class_size=range(20, 30), n_sim=[100, 10_000, 1_000_000])

# To Run in parallel
runner = Runner(backend="threading", n_jobs=1)
runner.run(func=birthday_experiment, settings=settings)
``````

## More features¶

These decorators aren't performing magic, but my experience has been that these decorators make it more fun to actually log the results of experiments. It's nice to be able to just add a decorator to a function and not have to worry about logging the statistics.

The library also offers extra features to make things a whole log simpler.

• `memweb` sends the json blobs to a server via http-post requests
• `memfunc` sends the data to a callable that you supply, like `print`
• `random_grid` generates a randomized grid for your experiments