Notes and learnings from the Python bootcamp
2022-04-06
Introduction
Pandera
- Pandera models are usually used within the ingestion phase, where performance matters less because validation runs only once
- we use them during the processing phase, which may add processing overhead (minimal example below)
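- a minimal sketch of a Pandera schema check; the column names and checks here are hypothetical, not from the bootcamp:
import pandas as pd
import pandera as pa

schema = pa.DataFrameSchema({
    "price": pa.Column(float, pa.Check.ge(0)),   # column must be float and non-negative
    "ticker": pa.Column(str),
})

df = pd.DataFrame({"price": [1.5, 2.0], "ticker": ["ABC", "XYZ"]})
validated = schema.validate(df)   # raises a SchemaError if any check fails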
GIL
- the Global Interpreter Lock (GIL) is a key Python performance consideration: only one thread executes Python bytecode at a time, so threads do not speed up CPU-bound work (see the sketch below)
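- a small sketch illustrating the effect (timings are indicative only): two threads doing CPU-bound work are usually no faster than running the same work sequentially
import threading
import time

def count(n):
    # CPU-bound loop: the GIL lets only one thread execute Python bytecode at a time
    while n > 0:
        n -= 1

N = 10_000_000

start = time.perf_counter()
count(N)
count(N)
print("sequential :", time.perf_counter() - start)

start = time.perf_counter()
threads = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("two threads:", time.perf_counter() - start)   # typically about the same as sequential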
Compilation
- Python is not compiled to machine code; it is compiled to Python bytecode, an intermediate form that is then interpreted
- an int is not a raw 8-byte value; it is represented as a ~24-byte object (see the sketch below)
- lists are flexible containers that hold references to arbitrary object types (non-homogeneous), which makes memory access inefficient
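- a quick way to see both points (exact sizes vary slightly by CPython version):
import dis
import sys

print(sys.getsizeof(0))        # ~24 bytes on 64-bit CPython: object header, not a raw machine int
print(sys.getsizeof(10**30))   # larger ints take even more space

def add(a, b):
    return a + b

dis.dis(add)   # shows the bytecode the interpreter runs instead of machine code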
Data Locality
- data locality is an important consideration for performance
- keeping data contiguous and close together (e.g. in NumPy arrays) helps a lot, as the comparison below shows
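- rough comparison of scattered Python objects vs a contiguous NumPy array (numbers will vary by machine):
import timeit

import numpy as np

values = list(range(1_000_000))   # one million separate int objects scattered on the heap
arr = np.arange(1_000_000)        # one contiguous block of 8-byte integers

print(timeit.timeit(lambda: sum(values), number=10))
print(timeit.timeit(lambda: arr.sum(), number=10))   # contiguous data + vectorised loop is much faster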
Profiling Tools
Line Profiler
- line_profiler provides line-by-line profiling
- use kernprof -lv <python module/script> from the command line and add the @profile decorator to the functions you want to profile
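- minimal sketch (the function and script name are hypothetical); save as profile_demo.py and run kernprof -lv profile_demo.py:
@profile                     # injected by kernprof at runtime, no import needed
def slow_sum(values):
    total = 0
    for v in values:         # line-by-line timings are reported for this loop
        total += v * v
    return total

if __name__ == "__main__":
    slow_sum(range(1_000_000))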
Py-Spy
- attaches a sampling profiler to an already-running process, no code changes needed
py-spy record --pid <PID>   # record samples (e.g. into a flame graph)
py-spy top --pid <PID>      # live, top-like view of the hottest functions
py-spy dump --pid <PID>     # dump the current stack traces of all threads
Numba
- numeric processing only, e.g. with NumPy arrays or numeric code around scikit-learn
- not suitable for string values
- just-in-time compiler: the first run is slow (compilation), subsequent runs are fast
2022-04-07
Numpy Example
Using Numpy vs Plain Python
import numpy as np

deltas = np.random.normal(size=(1_000_000, 2))

# three equivalent ways to compute the per-row Euclidean norm
def fn1(deltas):
    return np.sqrt(np.power(deltas, 2).sum(axis=1))

def fn2(deltas):
    return np.sqrt((deltas**2).sum(axis=1))

def fn3(deltas):
    return ((deltas**2).sum(axis=1))**0.5
%timeit fn1(deltas)
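For the plain-Python side of the comparison, a pure-Python baseline (a sketch, not from the bootcamp) looks roughly like this and is typically far slower than the vectorised versions:
import math

def fn_plain(deltas):
    # row-by-row Python loop: no vectorisation, one Python object per intermediate value
    return [math.sqrt(x**2 + y**2) for x, y in deltas]

%timeit fn_plain(deltas)
%timeit fn2(deltas)
%timeit fn3(deltas)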
Using numba
from numba import njit

@njit()
def fn1(deltas):
    return np.sqrt(np.power(deltas, 2).sum(axis=1))

@njit()
def fn2(deltas):
    return np.sqrt((deltas**2).sum(axis=1))

@njit()
def fn2_sum(deltas):
    # write the reduction explicitly instead of sum(axis=1)
    return np.sqrt((deltas[:, 0]**2) + (deltas[:, 1]**2))

@njit(parallel=True)
def fn2_sum_parallel(deltas):
    # parallel=True lets numba parallelise the array expressions across cores
    return np.sqrt((deltas[:, 0]**2) + (deltas[:, 1]**2))

@njit()
def fn3(deltas):
    return ((deltas**2).sum(axis=1))**0.5
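Usage sketch: call each function once first so the one-off JIT compilation cost is not included in the timing:
fn2_sum(deltas)                  # first call compiles and is slow
fn2_sum_parallel(deltas)

%timeit fn2_sum(deltas)          # later calls run the compiled machine code
%timeit fn2_sum_parallel(deltas)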
Joblib
- used for embarrassingly parallel, CPU-bound tasks
from joblib import Parallel, delayed

results = Parallel(n_jobs=4, verbose=2)(delayed(simulate)(cfg) for n in range(nbr))
- a generator expression is passed to the Parallel instance; this is memory efficient because the delayed calls are created lazily rather than all materialised up front
- runs synchronously: useful when you just want the jobs done without worrying about async code or shared state (see the sketch below)
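- self-contained sketch of the same pattern (simulate, cfg and nbr are hypothetical placeholders):
from joblib import Parallel, delayed

def simulate(cfg):
    # stand-in for a CPU-bound simulation
    return sum(i * i for i in range(cfg))

cfg = 100_000
nbr = 8
results = Parallel(n_jobs=4, verbose=2)(delayed(simulate)(cfg) for _ in range(nbr))
print(results)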
Memory Profiling
- profiling memory has complications
- a list reports the memory used by the references it holds, not the memory of the objects those references point to (see the example below)
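- for example, sys.getsizeof reports only the list container, not the strings it references:
import sys

big_strings = ["x" * 1_000_000 for _ in range(10)]
print(sys.getsizeof(big_strings))                    # a few hundred bytes: the list and its references
print(sum(sys.getsizeof(s) for s in big_strings))    # ~10 MB: the objects the references point to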
Usage
- step into the debugger at the point where memory usage exceeds a given threshold (in MB):
python -m memory_profiler --pdb-mmem=100 my_script.py
- decorate functions with @profile to get a line-by-line memory report:
from memory_profiler import profile

@profile
def func():
    a = ...   # allocate something
    b = ...   # allocate something else
    del b     # freeing b can show up as a drop in the report
    return a