Notes and learnings from the Python bootcamp

2022-04-06

Introduction

Pandera

  • Pandera models are usually used within the ingestion phase, where performance is less of a concern as validation runs once off
    • we use them during the processing phase, which may add processing overhead

GIL

  • the Global Interpreter Lock (GIL) is a key Python performance consideration: only one thread executes Python bytecode at a time, so threads do not speed up CPU-bound code
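A minimal sketch of the effect (the workload and thread count are illustrative; actual timings are machine-dependent):

```python
import threading
import time

def count_down(n):
    # pure-Python CPU-bound loop; the GIL is held while it runs
    while n > 0:
        n -= 1

N = 5_000_000

start = time.perf_counter()
count_down(N)
count_down(N)
serial = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# under the GIL the two threads interleave rather than run in parallel,
# so the threaded version is typically no faster than the serial one
print(f"serial: {serial:.2f}s, threaded: {threaded:.2f}s")
```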

Compilation

  • Python is not compiled to machine code; it is compiled to bytecode, an intermediate form that is then interpreted
  • an int is not a raw 8-byte machine word; CPython represents it as a full heap object (roughly 24-28 bytes)
  • flexible containers hold arbitrary object types (non-homogeneous), which leads to inefficient memory access
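This can be checked directly (exact byte counts vary by CPython version and platform):

```python
import sys
import numpy as np

# a CPython int is a full heap object (type pointer, refcount, digits),
# not a raw 8-byte machine word
print(sys.getsizeof(0))

# compare with a homogeneous int64 array: exactly 8 bytes per element,
# stored contiguously without per-element object headers
packed = np.arange(1_000_000, dtype=np.int64)
print(packed.nbytes)  # 8_000_000 bytes
```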

Data Locality

  • data locality is an important consideration for performance
  • bringing data closer to the CPU (contiguous, cache-friendly layouts) helps a lot
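One concrete NumPy example of "bringing data closer": a transposed array is a strided view, and materialising a contiguous copy before repeated passes keeps memory access sequential (a sketch, not a benchmark):

```python
import numpy as np

a = np.random.random((1_000, 1_000))  # C-order: rows are contiguous
view = a.T                            # transpose is a strided view
print(view.flags['C_CONTIGUOUS'])     # False

# materialise a contiguous copy so repeated passes walk memory sequentially
local = np.ascontiguousarray(view)
print(local.flags['C_CONTIGUOUS'])    # True
```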

Profiling Tools

Line Profiler

  • line_profiler provides line-by-line profiling
    • run kernprof -lv <python module/script> from the command line and decorate the functions you want to profile with @profile
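A hypothetical script set up for kernprof (the bare @profile name is injected by kernprof at runtime; the try/except fallback is an added convenience so the script also runs standalone):

```python
# example_script.py -- hypothetical module to profile with line_profiler
try:
    profile                      # kernprof injects @profile at runtime
except NameError:
    def profile(fn):             # no-op fallback for standalone runs
        return fn

@profile
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    slow_sum(100_000)
```

Running kernprof -lv example_script.py then prints per-line hit counts and timings for slow_sum.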

Py-Spy

  • attach profiler to already-running process
py-spy record --pid <PID>
py-spy top --pid <PID>
py-spy dump --pid <PID>

Numba

  • numeric processing only, e.g. with numpy, scikit-learn
  • not for string values
  • just-in-time compiler: the first run is slow (compilation), subsequent runs speed up

2022-04-07

Numpy Example

Using Numpy vs Plain Python

import numpy as np
 
 
# 1,000,000 random (dx, dy) pairs; each fn computes the Euclidean norm per row
deltas = np.random.normal(size=(1_000_000, 2))
 
def fn1(deltas):
    return np.sqrt(np.power(deltas, 2).sum(axis=1))
 
def fn2(deltas):
    return np.sqrt((deltas**2).sum(axis=1))
 
def fn3(deltas):
    return ((deltas**2).sum(axis=1))**0.5
 
 
%timeit fn1(deltas)
%timeit fn2(deltas)
%timeit fn3(deltas)
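The heading mentions plain Python; a loop-based baseline for comparison (an assumed sketch, fn_plain is not from the original notes) could be:

```python
import math

# plain-Python baseline: one row at a time, no vectorisation --
# typically orders of magnitude slower than the NumPy versions above
def fn_plain(deltas):
    return [math.sqrt(dx * dx + dy * dy) for dx, dy in deltas]
```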

Using numba

from numba import njit
 
@njit()
def fn1(deltas):
    return np.sqrt(np.power(deltas, 2).sum(axis=1))
 
@njit()
def fn2(deltas):
    return np.sqrt((deltas**2).sum(axis=1))
 
@njit()
def fn2_sum(deltas):
    return np.sqrt((deltas[:,0]**2) + (deltas[:,1]**2))
 
@njit(parallel=True)
def fn2_sum_parallel(deltas):
    return np.sqrt((deltas[:,0]**2) + (deltas[:,1]**2))
 
@njit()
def fn3(deltas):
    return ((deltas**2).sum(axis=1))**0.5

JobLib

  • used for embarrassingly parallel, CPU-bound tasks
from joblib import Parallel, delayed

results = Parallel(n_jobs=4, verbose=2)(delayed(simulate)(cfg) for n in range(nbr))
  • a generator expression is passed to the Parallel instance, an efficient way to run the function repeatedly while staying memory efficient and not materialising every task up front
  • runs synchronously: you just want to get the jobs done without worrying about async or state
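A self-contained version of the pattern (here simulate, cfg, and nbr are hypothetical stand-ins for the bootcamp's actual simulation code):

```python
from joblib import Parallel, delayed

def simulate(cfg, seed):
    # hypothetical CPU-bound work unit
    return sum((seed + i) ** 2 for i in range(cfg["steps"]))

cfg = {"steps": 1_000}
nbr = 8

# delayed() wraps the call lazily; the generator feeds tasks to the workers
results = Parallel(n_jobs=4, verbose=2)(delayed(simulate)(cfg, n) for n in range(nbr))
print(len(results))
```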

Memory Profiling

  • memory profiling has some complications
  • measuring a list reports the memory used by its references to the elements, not the memory of the objects those references point to
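sys.getsizeof shows this shallow/deep distinction directly:

```python
import sys

data = ["x" * 1_000 for _ in range(100)]

# getsizeof on the list counts only its array of references...
shallow = sys.getsizeof(data)
# ...the referenced strings must be counted separately
deep = shallow + sum(sys.getsizeof(s) for s in data)
print(shallow, deep)  # deep is far larger than shallow
```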

Usage

  • step into debugger at point where memory reaches certain threshold
python -m memory_profiler --pdb-mmem=100 my_script.py
  • decorators
from memory_profiler import profile

@profile
def func():
    a = ...
    b = ...
    del b
    return a