Python References & Coding Strategy

This is a collection of Python references that I have found useful. These were originally in the landmapy repository. See the related material in Environmental Systems. Suggestions for improvement are welcome.

byandell.github.io/Documentation

Python Overview

Earth Data Analytics (EDA) Workbook

Useful Python Libraries

Lists of Python Libraries

Plot Libraries and Systems

See landmapy Plot Functions.

Spatial Libraries

Interactive Plots

IPython Methods

AI overview: IPython methods enhance interactive computing in Python, offering features beyond the standard interpreter. Some key methods include:

  • Tab Completion: Simplifies code writing by suggesting attributes and methods of objects or modules as you type.
  • Introspection: Provides detailed information about objects, functions, or modules using ? or ??.
  • Magic Commands: Special commands prefixed with % for tasks like timing code execution (%timeit) or running external scripts (%run); a separate ! prefix passes a command to the system shell.
  • Input Caching: Stores previous commands and outputs, accessible via _, __, ___ for outputs and _i, _ii, _iii or In[n] for inputs.
  • Rich Display: Enables richer object representations using _ipython_display_() or _repr_*_() methods for custom display formats like HTML or images.
  • History: Allows browsing and reusing previous commands across sessions.

These methods streamline development, debugging, and exploration in interactive Python environments.
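
As a small illustration of the rich display item in the list above, the sketch below defines a class with both a plain __repr__() and a _repr_html_() method. The class name ColorSwatch and its values are hypothetical, chosen only for this example. In a notebook front end that renders HTML, the second method would be used when the object is the last expression in a cell; here both are simply called by hand.

```python
class ColorSwatch:
    """Hypothetical example class with a plain and a rich representation."""

    def __init__(self, name, hex_code):
        self.name = name
        self.hex_code = hex_code

    def __repr__(self):
        # Used by the standard interpreter and as a text fallback.
        return f"ColorSwatch({self.name!r}, {self.hex_code!r})"

    def _repr_html_(self):
        # Front ends that can render HTML look for this method.
        return (
            f'<span style="background:{self.hex_code}; padding:0 1em"></span> '
            f"{self.name}"
        )


swatch = ColorSwatch("prairie", "#8a9a5b")
print(repr(swatch))
print(swatch._repr_html_())
```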

Data

Data can be explicitly stored in a file using read and write methods, or implicitly using the pickle module. Store Magic and caching are two ways to store data using pickle.

Read and Write

Many projects read and write files. Following course guidelines, we use the data_dir variable to store data in a consistent location, which for EDA is ~/earth-analytics/data. The function create_data_dir() in landmapy/initial.py creates a subdirectory there if it does not already exist.

def create_data_dir(new_dir):
    import os
    import pathlib

    data_dir = os.path.join(
        pathlib.Path.home(),
        'earth-analytics',
        'data',
        new_dir
    )
    os.makedirs(data_dir, exist_ok=True)

    return data_dir
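
To show an explicit write followed by a read, here is a minimal sketch using pathlib and json. The function make_data_dir() and its base_dir parameter are assumptions made for this example (they are not part of landmapy); base_dir lets the sketch run in a temporary directory rather than in the home directory. The file name sites.json is also hypothetical.

```python
import json
import pathlib
import tempfile


def make_data_dir(new_dir, base_dir=None):
    """Create (if needed) and return base_dir/new_dir as a Path.

    Hypothetical variant of create_data_dir(); `base_dir` is an assumption
    and defaults to ~/earth-analytics/data.
    """
    if base_dir is None:
        base_dir = pathlib.Path.home() / "earth-analytics" / "data"
    data_dir = pathlib.Path(base_dir) / new_dir
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


# Use a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = make_data_dir("habitat", base_dir=tmp)
    site_file = data_dir / "sites.json"

    # Explicit write, then explicit read of the same file.
    sites = {"names": ["Buffalo Gap National Grassland",
                       "Oglala National Grassland"]}
    site_file.write_text(json.dumps(sites))
    sites_read = json.loads(site_file.read_text())
    print(sites_read == sites)
```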

Store Magic Data

Store Magic stores user data on demand: the magic command %store blah saves blah in a named file in ~/.ipython/profile_default/db/autorestore/, and %store -r blah retrieves it. This is useful for keeping data between sessions or projects.

The following code tries to retrieve the object buffalo_gdf if it was previously stored. The try statement checks whether buffalo_gdf exists; if it does not, Python raises a NameError exception, which leads to code that creates the object and stores it with %store.

%store -r buffalo_gdf
try:
    buffalo_gdf
except NameError:
    import geopandas as gpd
    # Assume `data_dir` is defined and `geojson` file is saved there.
    # Read all grasslands GeoJSON into `grassland_gdf`.
    grassland_url = f"{data_dir}/National_Grassland_Units_(Feature_Layer).geojson"
    grassland_gdf = gpd.read_file(grassland_url)
    # Subset to desired locations.
    buffalo_gdf = grassland_gdf.loc[grassland_gdf['GRASSLANDNAME'].isin(
        ["Buffalo Gap National Grassland", "Oglala National Grassland"])]
    %store buffalo_gdf
    print("buffalo_gdf created and stored")
else:
    print("buffalo_gdf retrieved from StoreMagic")

Cached Data via Decorator

The landmapy/cached.py decorator caches data in the jars directory ~/earth-analytics/data/jars/. The decorator @cached is used to cache the results of a function. See examples in clustering.qmd using functions in the landmapy/reflect.py module. Some explanation of decorators is in the next section. There is no need to use Store Magic with this decorator, as it already caches the data in the jars directory.

Decorators

Code for a caching decorator is in landmapy/cached.py, which you can use in your code. This decorator will pickle the results of running a do_something() function, and only run the code if the results do not already exist. To override the caching, for example temporarily after making changes to your code, set override=True. Note that to use the caching decorator, you must write your own function to perform each task. See examples in landmapy/delta.py and landmapy/reflectance.py.
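
The behavior described above (pickle the result, reuse it if it already exists, recompute when override=True) might be sketched as follows. This is not the code in landmapy/cached.py, only a minimal version under stated assumptions: the jar_dir parameter is hypothetical and added so the example can run in a temporary directory, and the optional cache_key keyword is assumed to be appended to func_key to form the pickle file name.

```python
import functools
import pathlib
import pickle
import tempfile


def cached(func_key, override=False, jar_dir=None):
    """Sketch of a pickle-backed caching decorator (not the landmapy code).

    `jar_dir` is a hypothetical parameter for this example.
    """
    if jar_dir is None:
        jar_dir = pathlib.Path.home() / "earth-analytics" / "data" / "jars"
    jar_dir = pathlib.Path(jar_dir)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Build the file name from func_key and the optional cache_key.
            cache_key = kwargs.get("cache_key")
            key = func_key if cache_key is None else f"{func_key}_{cache_key}"
            jar_path = jar_dir / f"{key}.pickle"
            if jar_path.exists() and not override:
                # Cached result found: load it instead of recomputing.
                with open(jar_path, "rb") as jar:
                    return pickle.load(jar)
            result = func(*args, **kwargs)
            jar_dir.mkdir(parents=True, exist_ok=True)
            with open(jar_path, "wb") as jar:
                pickle.dump(result, jar)
            return result
        return wrapper
    return decorator


calls = []  # records each time the function body actually runs

with tempfile.TemporaryDirectory() as tmp:
    @cached("square", jar_dir=tmp)
    def do_something(x, cache_key=None):
        calls.append(x)
        return x * x

    first = do_something(4, cache_key="four")   # computed and pickled
    second = do_something(4, cache_key="four")  # loaded from the pickle
    print(first, second, len(calls))
```

In this sketch the second call returns the pickled value, so the function body runs only once.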

One way to think about decorators with arguments is to start from a plain decorator, where

@decorator
def foo(*args, **kwargs):
    pass

translates to

foo = decorator(foo)

So if the decorator had arguments,

@decorator_with_args(arg)
def foo(*args, **kwargs):
    pass

translates to

foo = decorator_with_args(arg)(foo)

That is, decorator_with_args() is a function which accepts a custom argument and which returns the actual decorator (that will be applied to the decorated function).
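
A small runnable sketch of this equivalence follows. The decorator factory tag() and the functions describe() and describe_plain() are hypothetical names used only for illustration.

```python
import functools


def tag(label):
    """Hypothetical decorator factory: returns a decorator that uses `label`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return f"[{label}] {func(*args, **kwargs)}"
        return wrapper
    return decorator


@tag("wbd")
def describe(huc_level):
    return f"HUC level {huc_level}"


def describe_plain(huc_level):
    return f"HUC level {huc_level}"


# Applying the factory by hand is equivalent to the @ syntax above.
describe_by_hand = tag("wbd")(describe_plain)

print(describe(12))
print(describe_by_hand(12))
```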

A decorator with arguments can be used directly in a notebook or document. However, embedding the arguments within a module takes a bit more care. For instance, landmapy/reflect.py uses the @cached decorator from landmapy/cached.py to cache the results of the function. The original static use of the decorator was

from landmapy.cached import cached

@cached('wbd_08')
def read_wbd_file(wbd_filename, huc_level, cache_key):
    ...

def read_delta_gdf(huc_level=12, huc_region='08', watershed='080902030506'):
    wbd_gdf = read_wbd_file(
        f"WBD_{huc_region}_HU2_Shape", huc_level, cache_key=f'hu{huc_level}')
    ...

Note that read_delta_gdf() passes the keyword argument cache_key when calling the decorated function read_wbd_file(), so data are cached in the jars directory as f'wbd_08_hu{huc_level}.pickle'. The HUC level (default 12) can change, but the HUC region cannot, because 'wbd_08' is fixed in the decorator. To make this more flexible, the code was changed as follows:

from landmapy.cached import cached

def read_wbd_file(wbd_filename, huc_level, cache_key,
                  func_key='wbd_08', override=False):
    # Decorate at call time so func_key and override can vary per call.
    @cached(func_key, override)
    def read_wbd_cached(wbd_filename, huc_level, cache_key):
        ...
    wbd_gdf = read_wbd_cached(wbd_filename, huc_level, cache_key=cache_key)
    return wbd_gdf

def read_delta_gdf(huc_level=12, huc_region='08', watershed='080902030506',
                   func_key='wbd_08', override=False):
    wbd_gdf = read_wbd_file(
        f"WBD_{huc_region}_HU2_Shape", huc_level,
        cache_key=f'hu{huc_level}',
        func_key=func_key, override=override)
    ...

The revised function read_delta_gdf() has two added arguments, func_key and override, which it passes on to the @cached decorator. In addition, read_wbd_file() is now an undecorated function that calls the internal decorated function read_wbd_cached(); the @cached decorator is applied inside read_wbd_file(), using the arguments func_key and override. The keyword argument cache_key is still set in read_delta_gdf() and, importantly, is passed in the call to the decorated function read_wbd_cached() from within read_wbd_file(). Data are now cached in the jars directory as f'{func_key}_hu{huc_level}.pickle', so the file name can change with both the HUC level and, through func_key, the HUC region.

Classes

A class can be thought of as a function whose output is an object with its own methods, which are in turn functions defined in the class. In addition, the @property decorator defines computed attributes for an object. The main uses of classes are to:

  • add functionality to an existing class
  • streamline different functions with the same parameters to keep track of metadata

AI overview: In Python, a class serves as a blueprint for creating objects, which are instances that encapsulate data (attributes) and behavior (methods). Classes facilitate object-oriented programming (OOP) principles, enabling code reusability, modularity, and organization. A class is defined using the class keyword, followed by the class name and a colon. Inside the class block, attributes and methods are defined. The __init__ method is a special method, known as the constructor, which is automatically called when an object of the class is created to initialize the object’s attributes.
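
A minimal sketch of these pieces (the __init__ method, an ordinary method, and a @property attribute) is shown below. The class Site and its elevation values are hypothetical and serve only to illustrate keeping metadata together with the functions that use it.

```python
class Site:
    """Hypothetical class that keeps metadata alongside its data."""

    def __init__(self, name, elevations_m):
        # __init__ runs when Site(...) is called and sets the attributes.
        self.name = name
        self.elevations_m = list(elevations_m)

    @property
    def mean_elevation_m(self):
        # Accessed like an attribute (no parentheses), computed on demand.
        return sum(self.elevations_m) / len(self.elevations_m)

    def describe(self):
        # A method is a function defined in the class; `self` is the object.
        return f"{self.name}: mean elevation {self.mean_elevation_m:.1f} m"


site = Site("Buffalo Gap", [800, 850, 900])
print(site.describe())
```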

