Iterative Development, Package Structure, and Best Practices
2026-06-18
R Integration
For multi-language workflows, refer to the guide on Integrating Python Code with R. See related environmental topics in Environmental Systems.
Raster Processing
Use Raster Subtraction techniques to subtract one raster from another and export a new GeoTIFF in open source Python.
math, time, os, re, warningsglob: Directory pathname matching.csv / pathlib: File path manipulation.zipfile / datetime: Compressed files and time representation.scipy / sklearn (scikit-learn):
.model_selection, .metrics, .cluster, .tree.numpy: Multi-dimensional array processing.pandas: Structured dataframes.seaborn / statsmodels: Statistical analysis & plotting.pystac / pystac_client: Access SpatioTemporal Asset Catalogs.matplotlib: Foundation plotting.
matplotlib.pyplotmatplotlib.colorsplotnine: Grammar of graphics (ggplot2 clone).HoloViews / GeoViews: Interactive data exploration.panel / hvplot: Dashboarding and quick plotting.geopandas: GeoDataFrames.shapely: Geometric operations.rasterio / rasterstats: Raster manipulation and statistics.xarray / rioxarray: Multi-dimensional geospatial arrays.earthaccess / earthpy: NASA Earthdata access and utilities.folium / cartopy: Mapping and projections.Image module.%matplotlib widget and ipympl.IPython provides powerful tools that extend Python’s standard interpreter:
? or ?? to inspect object properties and source code.% (e.g., %timeit to benchmark, %run to execute scripts)._, __, ___ and inputs via _i, _ii, or In[n]._repr_*_ methods.By course guidelines, save data consistently in a dedicated data directory:
Retrieve stored data, or initialize and store it if it does not exist:
%store -r buffalo_gdf
try:
buffalo_gdf
except NameError:
import geopandas as gpd
# Assume data_dir is defined and grassland geojson is saved
grassland_url = f"{data_dir}/National_Grassland_Units_(Feature_Layer).geojson"
grassland_gdf = gpd.read_file(grassland_url)
# Subset and store
buffalo_gdf = grassland_gdf.loc[grassland_gdf['GRASSLANDNAME'].isin(
["Buffalo Gap National Grassland", "Oglala National Grassland"])]
%store buffalo_gdf
print("buffalo_gdf created and stored")
else:
print("buffalo_gdf retrieved from StoreMagic")An alternative to Store Magic is custom caching via python decorators (e.g., in landmapy/cached.py):
@cached works:~/earth-analytics/data/jars/.override=True to force execution.Caching vs Store Magic
Decorators automatically handle serialization for specific functions without manual %store commands.
A basic decorator wraps a function:
With arguments:
To dynamically change parameters inside a module:
“Data evolve over time, and so does code, as my understanding of a project, and of the patterns in data, grows.”
Organic Start
Begin by organizing files locally before creating folders or modules.
If you use code at least twice, wrap it in a function. Add a docstring to define arguments and return values:
def create_data_dir(new_dir='habitat'):
"""
Create Data Directory if it does not exist.
Args:
new_dir (char, optional): Name of new directory
Returns:
data_dir (char): path to new directory
"""
import os, pathlib
data_dir = os.path.join(
pathlib.Path.home(), 'earth-analytics', 'data', new_dir
)
os.makedirs(data_dir, exist_ok=True)
return data_dirPut the function in create_data_dir.py. Load in notebook:
Or use a standard import:
Create a directory layout containing packaging metadata (pyproject.toml or setup.py), documentation, and modules:
landmapy/
LICENSE # Open-source code license
README.md # Package overview and function details
pyproject.toml # Package metadata and requirements
landmapy/ # Main package folder
__init__.py # Marks the directory as a package
initial.py # Modules containing functions
plot.pyOnce structured, install the package locally in editable mode or standard mode:
plot_da() to handle multiple DataArrays plot_das()).Use GitHub to organize, version, and share modules.
pyproject.toml to demarcate major changes or refactors.Allow users to install your package directly from GitHub:
Ensure a balance between thoroughness and simplicity:
*.py module at the top of the file.__init__.py: Provide direct imports at the package level for clean access.README.md.*.qmd files are plain text Markdown, making them small and easy to version control in Git.Python References & Coding Strategy