# About Groups

If a group is present on a `Clumper`

then the behavior
of some of the verbs will change. This guide will explain
what changes you can expect and why they are useful.

## What is a Group?¶

You can add a group to a `Clumper`

by calling `.group_by()`

.

```
from clumper import Clumper
list_dicts = [
{'a': 6, 'grp': 'a'},
{'a': 2, 'grp': 'b'},
{'a': 7, 'grp': 'a'},
{'a': 4, 'grp': 'b'},
{'a': 5, 'grp': 'a'}
]
(Clumper(list_dicts)
.group_by('grp'))
```

The current group is now looking at all the items that have `grp`

as a key.

This means that the collection is now aware that you're interested
in calculating things per group. In this case you'd get two groups. One for
`{'grp': 'a'}`

and one for `{'grp': 'b'}`

.

There are some verbs that will behave differently because of this.

### Agg¶

#### Without Groups¶

When you don't have a group active then we'll make a single summary for the entire collection of items.

```
from clumper import Clumper
list_dicts = [
{'a': 6, 'grp': 'a'},
{'a': 2, 'grp': 'b'},
{'a': 7, 'grp': 'a'},
{'a': 4, 'grp': 'b'},
{'a': 5, 'grp': 'a'}
]
(Clumper(list_dicts)
.agg(s=('a', 'sum'),
m=('a', 'mean'))
.collect())
```

#### With Groups¶

When there is a group active then we'll make a summary per group. We'll also ensure that the keys of the relevant groups are made available in the new collection.

Note that the `group`

is still active!

```
from clumper import Clumper
list_dicts = [
{'a': 6, 'grp': 'a'},
{'a': 2, 'grp': 'b'},
{'a': 7, 'grp': 'a'},
{'a': 4, 'grp': 'b'},
{'a': 5, 'grp': 'a'}
]
(Clumper(list_dicts)
.group_by('grp')
.agg(s=('a', 'sum'),
m=('a', 'mean'))
.collect())
```

#### Aggergators¶

You can use your own functions if you want to do aggregation but we offer a few standard ones. Here's the standard mapping.

```
{
"mean": mean,
"count": lambda d: len(d),
"unique": lambda d: list(set(d)),
"n_unique": lambda d: len(set(d)),
"sum": sum,
"min": min,
"max": max,
"median": median,
"var": variance,
"std": stdev,
"values": lambda d: d,
"first": lambda d: d[0],
"last": lambda d: d[-1],
}
```

### Transform¶

The `.transform()`

verb is similar to the `.agg()`

verb. The main difference is
that it does not reduce any rows/keys during aggregation. Instead they are merged
back in with the original collection. The examples below should help explain what
the usecase is.

#### Without Groups¶

With no groups active we just attach the same summary to every item.

```
from clumper import Clumper
data = [
{"a": 6, "grp": "a"},
{"a": 2, "grp": "b"},
{"a": 7, "grp": "a"},
{"a": 9, "grp": "b"},
{"a": 5, "grp": "a"}
]
tfm_data = (Clumper(data)
.group_by("grp")
.transform(s=("a", "sum"),
u=("a", "unique"))
.collect())
```

#### With Groups¶

With groups active we calculate a summary per group and only attach the relevant summary to each item.

```
from clumper import Clumper
data = [
{"a": 6, "grp": "a"},
{"a": 2, "grp": "b"},
{"a": 7, "grp": "a"},
{"a": 9, "grp": "b"},
{"a": 5, "grp": "a"}
]
tfm_data = (Clumper(data)
.group_by("grp")
.transform(s=("a", "sum"),
u=("a", "unique"))
.collect())
```

### Mutate¶

This library offers stateful functions like `row_number`

. If you use
these functions while there is a group active you'll notice different
behavior.

#### Without Groups¶

When there is no group we just start counting at one and we continue counting until we're at the end of the collection.

```
from clumper import Clumper
from clumper.sequence import row_number
list_dicts = [
{'a': 6, 'grp': 'a'},
{'a': 2, 'grp': 'b'},
{'a': 7, 'grp': 'a'},
{'a': 4, 'grp': 'b'},
{'a': 5, 'grp': 'a'}
]
(Clumper(list_dicts)
.mutate(r=row_number())
.collect())
```

#### With Groups¶

Because there are groups you'll notice that the order
is different but also that the `row_number`

resets when
seeing the new group.

```
from clumper import Clumper
from clumper.sequence import row_number
list_dicts = [
{'a': 6, 'grp': 'a'},
{'a': 2, 'grp': 'b'},
{'a': 7, 'grp': 'a'},
{'a': 4, 'grp': 'b'},
{'a': 5, 'grp': 'a'}
]
(Clumper(list_dicts)
.group_by('grp')
.mutate(r=row_number())
.collect())
```

### Sort¶

#### Without Groups¶

With no groups active, we just sort the entire collection
based on the `key`

that is provided.

```
from clumper import Clumper
list_dicts = [
{'a': 6, 'grp': 'a'},
{'a': 2, 'grp': 'b'},
{'a': 7, 'grp': 'a'},
{'a': 9, 'grp': 'b'},
{'a': 5, 'grp': 'a'}
]
(Clumper(list_dicts)
.sort(key=lambda d: d['a'])
.collect())
```

#### With Groups¶

With groups active, still perform the sorting but only within each group.

```
from clumper import Clumper
list_dicts = [
{'a': 6, 'grp': 'a'},
{'a': 2, 'grp': 'b'},
{'a': 7, 'grp': 'a'},
{'a': 9, 'grp': 'b'},
{'a': 5, 'grp': 'a'}
]
(Clumper(list_dicts)
.group_by('grp')
.sort(key=lambda d: d['a'])
.collect())
```

## Ungroup¶

If you're done with a group and you'd like to move on you can drop all
groups by calling `.ungroup()`

.