Python References & Coding Strategy

Iterative Development, Package Structure, and Best Practices

Brian Yandell (byandell.github.io)

2026-06-18

Python Overview & Useful Libraries

Python Overview & Core Guides

Style & Best Practices

R Integration

For multi-language workflows, refer to the guide on Integrating Python Code with R. See related environmental topics in Environmental Systems.

Earth Data Analytics (EDA)

Workbooks & Environments

Raster Processing

Use Raster Subtraction techniques to subtract one raster from another and export a new GeoTIFF in open source Python.

Useful Python Libraries

Standard & Utility Libraries

  • math, time, os, re, warnings
  • glob: Directory pathname matching.
  • csv / pathlib: CSV file reading and writing; file path manipulation.
  • zipfile / datetime: Compressed files and time representation.
  • scipy / sklearn (scikit-learn): Scientific computing and machine learning.
    • sklearn submodules: .model_selection, .metrics, .cluster, .tree.

Data Science & API Libraries

  • numpy: Multi-dimensional array processing.
  • pandas: Structured dataframes.
  • seaborn / statsmodels: Statistical analysis & plotting.
  • pystac / pystac_client: Access SpatioTemporal Asset Catalogs.

Plotting & Spatial Libraries

Plotting Systems

  • matplotlib: Foundation plotting.
    • matplotlib.pyplot
    • matplotlib.colors
  • plotnine: Grammar of graphics (ggplot2 clone).
  • HoloViews / GeoViews: Interactive data exploration.
  • panel / hvplot: Dashboarding and quick plotting.

Spatial Processing

  • geopandas: GeoDataFrames.
  • shapely: Geometric operations.
  • rasterio / rasterstats: Raster manipulation and statistics.
  • xarray / rioxarray: Multi-dimensional geospatial arrays.
  • earthaccess / earthpy: NASA Earthdata access and utilities.
  • folium / cartopy: Mapping and projections.

Interactive Plots & Animations

Animated GIFs

Interactive Widgets

  • HoloViews interactive plots with gridded datasets and top-level sliders.
  • Interactive plots inside Jupyter Notebooks using %matplotlib widget and ipympl.

IPython Methods & Data Caching

IPython Interactive Methods

IPython provides powerful tools that extend Python’s standard interpreter:

  • Tab Completion: Suggests attributes and methods as you type.
  • Introspection: Use ? or ?? to inspect object properties and source code.
  • Magic Commands: Prefixed with % (e.g., %timeit to benchmark, %run to execute scripts).
  • Input/Output Caching: Retrieve outputs via _, __, ___ and inputs via _i, _ii, or In[n].
  • Rich Display: Custom renderings for HTML, images, and math via _repr_*_ methods.
  • Command History: Browsing and reusing commands across sessions.
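
As a minimal sketch of the Rich Display item above: IPython and Jupyter look for methods such as _repr_html_ on an object and use them to render it. The Temperature class is hypothetical and exists only for illustration; outside a notebook the method is an ordinary method that returns a string.

```python
class Temperature:
    """Hypothetical class used only to illustrate rich display hooks."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __repr__(self):
        # Plain-text representation used by the standard interpreter
        return f"Temperature({self.celsius})"

    def _repr_html_(self):
        # IPython and Jupyter call this method, when present, to render HTML
        return f"<b>{self.celsius:.1f}</b> &deg;C"


reading = Temperature(21.5)
print(repr(reading))            # plain-text form
print(reading._repr_html_())    # HTML form a notebook would render
```

In a notebook, ending a cell with reading displays the HTML form instead of the plain-text one.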

Data Read, Write & Store Magic

Structured Data Directory

By course guidelines, save data consistently in a dedicated data directory:

def create_data_dir(new_dir):
    import os, pathlib
    data_dir = os.path.join(
        pathlib.Path.home(),
        'earth-analytics', 'data', new_dir
    )
    os.makedirs(data_dir, exist_ok=True)
    return data_dir

Store Magic (%store)

The %store magic persists variables to disk on demand and retrieves them in later sessions:

# Save variable
%store buffalo_gdf

# Retrieve variable
%store -r buffalo_gdf

This is useful for persisting intermediate states in analysis notebooks.

Store Magic Retrieval Example

Retrieve stored data, or initialize and store it if it does not exist:

%store -r buffalo_gdf
try:
    buffalo_gdf
except NameError:
    import geopandas as gpd
    # Assume data_dir is defined and grassland geojson is saved
    grassland_url = f"{data_dir}/National_Grassland_Units_(Feature_Layer).geojson"
    grassland_gdf = gpd.read_file(grassland_url)
    
    # Subset and store
    buffalo_gdf = grassland_gdf.loc[grassland_gdf['GRASSLANDNAME'].isin(
        ["Buffalo Gap National Grassland", "Oglala National Grassland"])]
    %store buffalo_gdf
    print("buffalo_gdf created and stored")
else:
    print("buffalo_gdf retrieved from StoreMagic")

Cached Data via Decorators

An alternative to Store Magic is custom caching via Python decorators (e.g., in landmapy/cached.py):

How @cached works:

  • Automatically pickles function results to ~/earth-analytics/data/jars/.
  • Future function calls check the jar cache first, bypassing slow computations.
  • Override cache by passing override=True to force execution.
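
The behavior listed above can be sketched roughly as follows. This is not the code in landmapy/cached.py; it is a simplified illustration under stated assumptions. The jar_dir parameter is hypothetical and points to a temporary directory rather than ~/earth-analytics/data/jars/, and the cache is keyed only by func_key, so different arguments share one cached result.

```python
import functools
import os
import pickle
import tempfile


def cached(func_key, override=False, jar_dir=None):
    """Cache a function's result as a pickle file (illustrative sketch).

    jar_dir is a hypothetical parameter; by default a temporary
    directory is used instead of ~/earth-analytics/data/jars/.
    """
    if jar_dir is None:
        jar_dir = os.path.join(tempfile.gettempdir(), "jars")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            os.makedirs(jar_dir, exist_ok=True)
            jar_path = os.path.join(jar_dir, f"{func_key}.pickle")
            # Reuse the pickled result unless the caller forces a rerun
            if os.path.exists(jar_path) and not override:
                with open(jar_path, "rb") as jar:
                    return pickle.load(jar)
            result = func(*args, **kwargs)
            with open(jar_path, "wb") as jar:
                pickle.dump(result, jar)
            return result
        return wrapper
    return decorator


# Usage: the function body runs once; the second call reads the jar
calls = []
demo_dir = tempfile.mkdtemp()


@cached("slow_square", jar_dir=demo_dir)
def slow_square(x):
    calls.append(x)  # record each real execution
    return x * x


first = slow_square(4)
second = slow_square(4)
print(first, second, calls)
```

Because this sketch keys the jar on func_key alone, a real implementation would likely also fold a cache key or the arguments into the file name.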

Caching vs Store Magic

Decorators automatically handle serialization for specific functions without manual %store commands.

Decorator Mechanics

Behind the Scenes

A basic decorator wraps a function:

@decorator
def foo():
    pass
# translates to:
foo = decorator(foo)

With arguments:

@decorator_with_args(arg)
def foo():
    pass
# translates to:
foo = decorator_with_args(arg)(foo)
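
A small runnable illustration of both forms may help. The names shout, repeat, greet, and ping are hypothetical and chosen only for this example.

```python
import functools


def shout(func):
    """Hypothetical decorator: upper-case the wrapped function's result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


def repeat(times):
    """Hypothetical decorator factory: repeat(times) returns a decorator."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator


@shout
def greet(name):
    return f"hello {name}"


def greet_plain(name):
    return f"hello {name}"


# Applying the decorator by hand is equivalent to the @shout syntax
greet_manual = shout(greet_plain)


@repeat(2)  # same as: ping = repeat(2)(ping)
def ping():
    return "ping"


print(greet("ada"), greet_manual("ada"), ping())
```

functools.wraps copies the wrapped function's name and docstring onto the wrapper, which keeps help() and introspection useful.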

Advanced Module Caching

To set the decorator's arguments (such as func_key and override) at call time, define the cached function inside a wrapper function:

def read_wbd_file(wbd_filename, huc_level,
                  cache_key, func_key='wbd_08',
                  override=False):
    @cached(func_key, override)
    def read_wbd_cached(wbd_filename, huc_level,
                        cache_key):
        ...
    return read_wbd_cached(wbd_filename, 
                           huc_level,
                           cache_key=cache_key)
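
The pattern can be tried in isolation with a simplified stand-in. Here cached is an in-memory substitute assumed only for illustration (it is not the landmapy decorator), and load_values is a hypothetical function.

```python
_memory_jar = {}  # in-memory stand-in for a cache on disk


def cached(func_key, override=False):
    """Simplified stand-in for a caching decorator (assumption)."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            if func_key in _memory_jar and not override:
                return _memory_jar[func_key]
            _memory_jar[func_key] = func(*args, **kwargs)
            return _memory_jar[func_key]
        return wrapper
    return decorator


def load_values(n, func_key="values", override=False):
    # The decorator is applied on every call, so func_key and override
    # can differ from one call to the next
    @cached(func_key, override)
    def load_values_cached(n):
        return list(range(n))
    return load_values_cached(n)


a = load_values(3)                     # computed and cached under "values"
b = load_values(5)                     # cache hit: returns the first result
c = load_values(5, override=True)      # forced recomputation
d = load_values(2, func_key="small")   # separate cache entry
print(a, b, c, d)
```

Decorating at module level would fix func_key and override once at import time; decorating inside the wrapper defers that choice to each call.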

Python Coding Strategy

Philosophy: Make Coding Fun

Iterative Development

  • Start with a simple, working solution.
  • Progress organically and iteratively.
  • Evolve functions as understanding of patterns in data grows.
  • Revisit code often rather than designing an over-engineered grand plan initially.

“Data evolve over time, and so does code, as my understanding of a project, and of the patterns in data, grows.”

Organic Start

Begin by organizing files locally before creating folders or modules.

Evolving to Functions & Docstrings

If you use code at least twice, wrap it in a function. Add a docstring to define arguments and return values:

def create_data_dir(new_dir='habitat'):
    """
    Create Data Directory if it does not exist.

    Args:
        new_dir (str, optional): Name of new directory
    Returns:
        data_dir (str): path to new directory
    """
    import os, pathlib
    data_dir = os.path.join(
        pathlib.Path.home(), 'earth-analytics', 'data', new_dir
    )
    os.makedirs(data_dir, exist_ok=True)
    return data_dir

Creating Modules & Local Imports

Step 1: Put in a file

Put the function in create_data_dir.py. Load in notebook:

%run create_data_dir.py
data_dir = create_data_dir('habitat')

Or use a standard import:

from create_data_dir import create_data_dir

Step 2: Organize in a Folder

Keep functions reusable across projects by grouping them in a folder separate from any single project:

~/Documents/GitHub/landmapy/
    landmapy/
        initial.py

Structuring a Python Package

Create a directory layout containing packaging metadata (pyproject.toml or setup.py), documentation, and modules:

landmapy/
    LICENSE         # Open-source code license
    README.md       # Package overview and function details
    pyproject.toml  # Package metadata and requirements
    landmapy/       # Main package folder
        __init__.py # Marks the directory as a package
        initial.py  # Modules containing functions
        plot.py

Installation

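Once structured, install the package locally in standard mode or editable mode:

# Standard install
pip install ~/Documents/GitHub/landmapy

# Editable install: source changes take effect without reinstalling
pip install -e ~/Documents/GitHub/landmapy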

Refactoring & Backward Compatibility

Evolving Functions

  • Generalize parameter sets (e.g., expanding plot_da() into plot_das() to handle multiple DataArrays).
  • Avoid mission creep. Don’t build “all-powerful” functions; focus on completing the project.

Handling Legacy Code

Provide legacy wrapper functions to avoid breaking older notebooks or scripts:

def plot_da(index_da, place, index='NDVI'):
    """Legacy function: use plot_das()."""
    return plot_das(index_da, 
                    titles=f'{place} {index}')
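
One optional refinement, sketched here as an assumption rather than as the package's actual behavior, is to have the legacy wrapper emit a DeprecationWarning so users learn about the replacement. The plot_das() below is a stand-in that only returns its inputs.

```python
import warnings


def plot_das(index_das, titles=None):
    """Stand-in for the newer function; returns its inputs for illustration."""
    return {"data": index_das, "titles": titles}


def plot_da(index_da, place, index='NDVI'):
    """Legacy function: use plot_das()."""
    warnings.warn(
        "plot_da() is deprecated; use plot_das() instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return plot_das(index_da, titles=f'{place} {index}')


# Capture the warning to show that old calls still work
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = plot_da([1, 2, 3], "Buffalo Gap")

print(result["titles"], caught[0].category.__name__)
```

Old notebooks keep running, and the warning points each caller toward plot_das().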

Versioning & GitHub Integration

Use GitHub to organize, version, and share modules.

Git Versioning

  • Commit changes incrementally.
  • Leverage versions in pyproject.toml to demarcate major changes or refactors.
  • Access earlier states using Git history if a refactor goes awry.

Remote Installation

Allow users to install your package directly from GitHub:

pip install git+https://github.com/byandell-envsys/landmapy.git

Package Documentation Strategy

Ensure a balance between thoroughness and simplicity:

  1. Docstring in Functions: Every function inside a module should have its arguments and returns documented.
  2. Module-level Docstrings: Describe the purpose of the *.py module at the top of the file.
  3. Imports in __init__.py: Import functions at the package level so users can access them directly.
  4. README topic blocks: Structure function summaries inside expandable (dropdown) details blocks in README.md.
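
Items 1 to 3 can be tried with a throwaway package, as in the following sketch. The package name demo_pkg and the function greeting are hypothetical, and the package is written to a temporary directory purely for illustration.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk; demo_pkg is a hypothetical name
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(pkg_dir)

initial_source = "\n".join([
    '"""Helpers for setting up a project (module-level docstring)."""',
    '',
    'def greeting(name):',
    '    """Return a greeting.',
    '',
    '    Args:',
    '        name (str): Who to greet',
    '    Returns:',
    '        str: The greeting text',
    '    """',
    '    return f"hello {name}"',
    '',
])

with open(os.path.join(pkg_dir, "initial.py"), "w") as f:
    f.write(initial_source)

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    # Package-level import so users can call demo_pkg.greeting()
    f.write("from .initial import greeting\n")

sys.path.insert(0, root)
demo_pkg = importlib.import_module("demo_pkg")

print(demo_pkg.greeting("world"))   # function reached at the package level
print(demo_pkg.initial.__doc__)     # module-level docstring
print(demo_pkg.greeting.__doc__)    # function docstring
```

With the import in __init__.py, users do not need to know which module inside the package holds a given function.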

Migrating Notebooks to Quarto

Why Quarto?

  • Flat files: *.qmd files are plain text Markdown, making them small and easy to version control in Git.
  • Flexible output: Compiles seamlessly to HTML, PDF, Word, or interactive slide decks.
  • Executable code: Integrates code chunks from Python, R, and Julia in a single document.

Format Conversion

Easily convert between Jupyter and Quarto formats using the Quarto CLI:

# Jupyter -> Quarto
quarto convert project.ipynb

# Quarto -> Jupyter
quarto convert project.qmd