# Getting Started

## Base Scenario

Let's say you're running a simulation, or maybe a machine learning experiment. Then you might have code that looks like this:

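``````python
import numpy as np

def birthday_experiment(class_size, n_sim=10_000):
    """Simulates the birthday paradox. Vectorized = Fast!"""
    # Each row is one simulated class, each entry a birthday from 1 to 365.
    sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
    # After sorting, duplicates sit next to each other.
    sort_sims = np.sort(sims, axis=1)
    # Count the unique birthdays per class by counting changes between neighbours.
    n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis=1) + 1
    # A class with fewer unique birthdays than students has a shared birthday.
    return np.mean(n_uniq != class_size)

results = [birthday_experiment(s) for s in range(2, 40)]
``````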

This example works, but how would we now go about plotting our results? Plotting the simulated probability against `class_size` is doable. Things get tricky, though, if you're also interested in the effect of `n_sim`, because the input of the simulation isn't captured together with its output.

## Decorators

The idea behind this library is that you can rewrite this function, only slightly, to make all of this data collection a whole lot simpler.

``````python
import numpy as np
from memo import memlist

data = []

@memlist(data=data)
def birthday_experiment(class_size, n_sim):
    """Simulates the birthday paradox. Vectorized = Fast!"""
    sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
    sort_sims = np.sort(sims, axis=1)
    n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis=1) + 1
    return {"est_proba": np.mean(n_uniq != class_size)}

for size in range(2, 40):
    for n_sim in [1000, 10000, 100000]:
        birthday_experiment(class_size=size, n_sim=n_sim)
``````

The `data` object now represents a list of dictionaries that have `"n_sim"`, `"class_size"` and `"est_proba"` as keys. You can easily turn these into a pandas DataFrame if you'd like via `pd.DataFrame(data)`.

## Logging More

The `memlist` decorator takes care of all data collection. It captures all keyword arguments of the function as well as the dictionary output of the function, and appends the combined record to the list `data`. This can be a lovely pattern, especially when you're iterating on your experiments.
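To make the mechanics concrete, here's a minimal sketch of how a decorator of this kind could be written. It is not the library's implementation: `collect_into` is a hypothetical name, and the simulation is a slow pure-Python stand-in so the example runs without extra dependencies.

```python
import functools
import random

def collect_into(records):
    """Hypothetical stand-in for a memlist-style decorator (not memo's code)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            output = func(**kwargs)
            # Merge the keyword arguments with the returned dictionary,
            # so inputs and outputs end up in a single record.
            records.append({**kwargs, **output})
            return output
        return wrapper
    return decorator

records = []

@collect_into(records)
def birthday_experiment(class_size, n_sim):
    """Pure-Python birthday simulation, kept small so it runs quickly."""
    hits = 0
    for _ in range(n_sim):
        days = [random.randint(1, 365) for _ in range(class_size)]
        hits += len(set(days)) != class_size
    return {"est_proba": hits / n_sim}

for size in [5, 23]:
    birthday_experiment(class_size=size, n_sim=200)
```

After the loop, `records` holds one dictionary per call, each with the keys `"class_size"`, `"n_sim"` and `"est_proba"`. Because the wrapper only accepts keyword arguments, every input is stored under its name.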

Suppose we also want to log how long the simulation takes:

``````python
import time
import numpy as np
from memo import memlist

data = []

@memlist(data=data)
def birthday_experiment(class_size, n_sim):
    """Simulates the birthday paradox. Vectorized = Fast!"""
    t1 = time.time()
    sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
    sort_sims = np.sort(sims, axis=1)
    n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis=1) + 1
    proba = np.mean(n_uniq != class_size)
    t2 = time.time()
    return {"est_proba": proba, "time": t2 - t1}

for size in range(2, 40):
    for n_sim in [1000, 10000, 100000]:
        birthday_experiment(class_size=size, n_sim=n_sim)
``````

If you now inspect `data` you'll notice it also contains the `"time"` information. Note, though, that there's an easier way to log the time: you can use the `@time_taken` decorator that this library supplies.
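To give a feel for what a timing decorator does, here's a rough sketch. The names `add_elapsed` and `"elapsed_seconds"` are hypothetical and chosen for illustration; the library's own `@time_taken` may use a different key and a different clock.

```python
import functools
import time

def add_elapsed(func):
    """Hypothetical timing decorator; memo's own `time_taken` may differ."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        output = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # Build a new dict so the wrapped function's result is not mutated.
        return {**output, "elapsed_seconds": elapsed}
    return wrapper

@add_elapsed
def slow_square(x):
    time.sleep(0.01)  # stand-in for real work
    return {"square": x * x}

result = slow_square(x=3)
```

Here `result` contains both the original `"square"` value and the measured duration, so the timing logic stays out of the function body.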

## Power

The real power of the library is that you aren't limited to logging to a list. You can just as easily write to a file too!

``````python
import time
import numpy as np
from memo import memlist, memfile

data = []

@memfile(filepath="results.json")
@memlist(data=data)
def birthday_experiment(class_size, n_sim):
    """Simulates the birthday paradox. Vectorized = Fast!"""
    t1 = time.time()
    sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
    sort_sims = np.sort(sims, axis=1)
    n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis=1) + 1
    proba = np.mean(n_uniq != class_size)
    t2 = time.time()
    return {"est_proba": proba, "time": t2 - t1}

for size in range(2, 40):
    for n_sim in [1000, 10000, 100000]:
        birthday_experiment(class_size=size, n_sim=n_sim)
``````
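Once results are on disk you'll probably want to read them back. The sketch below assumes the file holds one JSON object per line (the JSON Lines layout). That layout is an assumption here, so check the file that actually gets written. `read_json_lines` is a hypothetical helper, and the sample records use placeholder values rather than real simulation output.

```python
import json
import tempfile
from pathlib import Path

def read_json_lines(path):
    """Read a file holding one JSON object per line into a list of dicts."""
    records = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records

# Write a small stand-in file so the example runs on its own.
# These values are placeholders, not real simulation output.
sample = [
    {"class_size": 2, "n_sim": 1000, "est_proba": 0.0, "time": 0.0},
    {"class_size": 3, "n_sim": 1000, "est_proba": 0.0, "time": 0.0},
]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "results.jsonl"
    path.write_text("\n".join(json.dumps(r) for r in sample) + "\n")
    loaded = read_json_lines(path)
```

The resulting list has the same shape as `data`, so anything that works on the in-memory list works on the loaded records too.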

## Utilities

The library also offers utilities to make setting up these experiments even easier. In particular:

• We supply a grid generation mechanism so you can avoid a lot of nested for-loops.
• We supply a `@time_taken` decorator so that you don't need to write the timing logic yourself.

``````python
import numpy as np
from memo import memlist, memfile, grid, time_taken

data = []

@memfile(filepath="results.json")
@memlist(data=data)
@time_taken()
def birthday_experiment(class_size, n_sim):
    """Simulates the birthday paradox. Vectorized = Fast!"""
    sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
    sort_sims = np.sort(sims, axis=1)
    n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis=1) + 1
    proba = np.mean(n_uniq != class_size)
    return {"est_proba": proba}

for settings in grid(class_size=range(2, 40), n_sim=[1000, 10000, 100000]):
    birthday_experiment(**settings)
``````
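If you're curious how a grid like this can replace nested loops, here's a small sketch built on `itertools.product`. `make_grid` is a hypothetical helper written for illustration; the library's `grid` may differ in ordering and in the options it accepts.

```python
import itertools

def make_grid(**options):
    """Hypothetical cross-product generator; memo's `grid` may behave differently."""
    names = list(options)
    # itertools.product yields every combination, varying the last option fastest.
    for combo in itertools.product(*options.values()):
        yield dict(zip(names, combo))

settings = list(make_grid(class_size=range(2, 40), n_sim=[1000, 10000, 100000]))
```

Each element of `settings` is a dictionary such as `{"class_size": 2, "n_sim": 1000}`, which is why it can be unpacked straight into the function with `**settings`.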

## Parallel

If you have a lot of simulations you'd like to run, it might be helpful to run them in parallel. That's why this library also hosts a `Runner` class that can run your functions on multiple CPU cores.
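Independently of `Runner`, the general pattern of fanning settings out over workers can be sketched with the standard library. This is not how the library does it; `run_one` is a hypothetical helper, and the simulation is a pure-Python stand-in so the example has no extra dependencies.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def birthday_experiment(class_size, n_sim):
    """Pure-Python birthday simulation, kept small so the sketch runs quickly."""
    rng = random.Random()  # one generator per call, so threads share no state
    hits = 0
    for _ in range(n_sim):
        days = [rng.randint(1, 365) for _ in range(class_size)]
        hits += len(set(days)) != class_size
    return {"est_proba": hits / n_sim}

def run_one(setting):
    # Keep the inputs next to the output, like the decorators do.
    return {**setting, **birthday_experiment(**setting)}

settings = [{"class_size": size, "n_sim": 200} for size in range(20, 24)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map returns results in the same order as the settings.
    results = list(pool.map(run_one, settings))
```

Threads keep the sketch simple, but pure-Python work like this won't speed up much on CPython because of the global interpreter lock. To really use multiple CPU cores you'd need process-based workers.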

``````python
import numpy as np

from memo import memlist, memfile, grid, time_taken, Runner

data = []

@memfile(filepath="results.jsonl")
@memlist(data=data)
@time_taken()
def birthday_experiment(class_size, n_sim):
    """Simulates the birthday paradox. Vectorized = Fast!"""
    sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
    sort_sims = np.sort(sims, axis=1)
    n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis=1) + 1
    proba = np.mean(n_uniq != class_size)
    return {"est_proba": proba}

settings = grid(class_size=range(20, 30), n_sim=[100, 10_000, 1_000_000])

# To Run in parallel
``````

The library also ships a few more features:

• `memweb` sends the JSON blobs to a server via HTTP POST requests
• `memfunc` sends the data to a callable that you supply, like `print`
• `random_grid` generates a randomized grid for your experiments
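As an illustration of the randomized-grid idea, the sketch below draws a random subset of the full cross-product. This is only one possible interpretation: `sample_grid` is a hypothetical helper, and the library's `random_grid` may sample in a different way.

```python
import itertools
import random

def sample_grid(n, seed=None, **options):
    """Hypothetical random subset of a full grid; not memo's `random_grid`."""
    names = list(options)
    full = [dict(zip(names, combo))
            for combo in itertools.product(*options.values())]
    rng = random.Random(seed)
    # Sample without replacement, so no setting is run twice.
    return rng.sample(full, k=min(n, len(full)))

picked = sample_grid(5, seed=42,
                     class_size=range(20, 30),
                     n_sim=[100, 10_000, 1_000_000])
```

Sampling a handful of settings like this is handy when the full grid is too large to run in one go.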