Notes and learnings from the Python bootcamp

2022-04-06

Introduction

Pandera

  • Pandera models are usually used within the ingestion phase, where performance is less of a concern as validation runs once off
    • we use them during the processing phase, which may add processing overhead

GIL

  • the Global Interpreter Lock (GIL) is a key Python performance consideration: only one thread executes Python bytecode at a time, so threads do not speed up CPU-bound code
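A minimal sketch of the effect (the workload and thread count are illustrative; actual timings are machine-dependent):

```python
import threading
import time

def count_down(n):
    # pure-Python CPU-bound loop; the GIL is held while it runs
    while n > 0:
        n -= 1

N = 5_000_000

start = time.perf_counter()
count_down(N)
count_down(N)
serial = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# under the GIL the two threads interleave rather than run in parallel,
# so the threaded version is typically no faster than the serial one
print(f"serial: {serial:.2f}s, threaded: {threaded:.2f}s")
```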

Compilation

  • Python is not compiled to machine code; it is compiled to bytecode, an intermediate form that is then interpreted
  • an int is not a raw 8-byte machine word; CPython represents it as a full heap object (roughly 24-28 bytes)
  • flexible containers hold arbitrary object types (non-homogeneous), which leads to inefficient memory access
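This can be checked directly (exact byte counts vary by CPython version and platform):

```python
import sys
import numpy as np

# a CPython int is a full heap object (type pointer, refcount, digits),
# not a raw 8-byte machine word
print(sys.getsizeof(0))

# compare with a homogeneous int64 array: exactly 8 bytes per element,
# stored contiguously without per-element object headers
packed = np.arange(1_000_000, dtype=np.int64)
print(packed.nbytes)  # 8_000_000 bytes
```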

Data Locality

  • data locality is an important consideration for performance
  • bringing data closer to the CPU (contiguous, cache-friendly layouts) helps a lot
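One concrete NumPy example of "bringing data closer": a transposed array is a strided view, and materialising a contiguous copy before repeated passes keeps memory access sequential (a sketch, not a benchmark):

```python
import numpy as np

a = np.random.random((1_000, 1_000))  # C-order: rows are contiguous
view = a.T                            # transpose is a strided view
print(view.flags['C_CONTIGUOUS'])     # False

# materialise a contiguous copy so repeated passes walk memory sequentially
local = np.ascontiguousarray(view)
print(local.flags['C_CONTIGUOUS'])    # True
```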

Profiling Tools

Line Profiler

  • line_profiler provides line-by-line profiling
    • run kernprof -lv <python module/script> from the command line and decorate the functions you want to profile with @profile
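A hypothetical script set up for kernprof (the bare @profile name is injected by kernprof at runtime; the try/except fallback is an added convenience so the script also runs standalone):

```python
# example_script.py -- hypothetical module to profile with line_profiler
try:
    profile                      # kernprof injects @profile at runtime
except NameError:
    def profile(fn):             # no-op fallback for standalone runs
        return fn

@profile
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    slow_sum(100_000)
```

Running kernprof -lv example_script.py then prints per-line hit counts and timings for slow_sum.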

Py-Spy

  • attach profiler to already-running process
py-spy record --pid <PID>
py-spy top --pid <PID>
py-spy dump --pid <PID>

Numba

  • numeric processing only, e.g. with numpy, scikit-learn
  • not for string values
  • just-in-time compiler: the first run is slow (compilation), subsequent runs speed up

2022-04-07

Numpy Example

Using Numpy vs Plain Python

import numpy as np
 
 
# 1,000,000 random (dx, dy) pairs; each fn computes the Euclidean norm per row
deltas = np.random.normal(size=(1_000_000, 2))
 
def fn1(deltas):
    return np.sqrt(np.power(deltas, 2).sum(axis=1))
 
def fn2(deltas):
    return np.sqrt((deltas**2).sum(axis=1))
 
def fn3(deltas):
    return ((deltas**2).sum(axis=1))**0.5
 
 
%timeit fn1(deltas)
%timeit fn2(deltas)
%timeit fn3(deltas)
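The heading mentions plain Python; a loop-based baseline for comparison (an assumed sketch, fn_plain is not from the original notes) could be:

```python
import math

# plain-Python baseline: one row at a time, no vectorisation --
# typically orders of magnitude slower than the NumPy versions above
def fn_plain(deltas):
    return [math.sqrt(dx * dx + dy * dy) for dx, dy in deltas]
```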

Using numba

from numba import njit
 
@njit()
def fn1(deltas):
    return np.sqrt(np.power(deltas, 2).sum(axis=1))
 
@njit()
def fn2(deltas):
    return np.sqrt((deltas**2).sum(axis=1))
 
@njit()
def fn2_sum(deltas):
    return np.sqrt((deltas[:,0]**2) + (deltas[:,1]**2))
 
@njit(parallel=True)
def fn2_sum_parallel(deltas):
    return np.sqrt((deltas[:,0]**2) + (deltas[:,1]**2))
 
@njit()
def fn3(deltas):
    return ((deltas**2).sum(axis=1))**0.5

JobLib

  • used for embarrassingly parallel, CPU-bound tasks
from joblib import Parallel, delayed

results = Parallel(n_jobs=4, verbose=2)(delayed(simulate)(cfg) for n in range(nbr))
  • a generator expression is passed to the Parallel instance, an efficient way to run the function repeatedly while staying memory efficient and not materialising every task up front
  • runs synchronously: you just want to get the jobs done without worrying about async or state
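A self-contained version of the pattern (here simulate, cfg, and nbr are hypothetical stand-ins for the bootcamp's actual simulation code):

```python
from joblib import Parallel, delayed

def simulate(cfg, seed):
    # hypothetical CPU-bound work unit
    return sum((seed + i) ** 2 for i in range(cfg["steps"]))

cfg = {"steps": 1_000}
nbr = 8

# delayed() wraps the call lazily; the generator feeds tasks to the workers
results = Parallel(n_jobs=4, verbose=2)(delayed(simulate)(cfg, n) for n in range(nbr))
print(len(results))
```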

Memory Profiling

  • memory profiling has some complications
  • measuring a list reports the memory used by its references to the elements, not the memory of the objects those references point to
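sys.getsizeof shows this shallow/deep distinction directly:

```python
import sys

data = ["x" * 1_000 for _ in range(100)]

# getsizeof on the list counts only its array of references...
shallow = sys.getsizeof(data)
# ...the referenced strings must be counted separately
deep = shallow + sum(sys.getsizeof(s) for s in data)
print(shallow, deep)  # deep is far larger than shallow
```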

Usage

  • step into debugger at point where memory reaches certain threshold
python -m memory_profiler --pdb-mmem=100 my_script.py
  • decorators
from memory_profiler import profile

@profile
def func():
    a = ...
    b = ...
    del b
    return a