Locating Canopy or Tree Gaps for Foresters

:computer: Source Code at github.com/earthlab-education/treebeard

This is an update to my last post. Our team — Chris, Peter, and I — has produced some meaningful results based on the requirements from the Watershed Center.

Project Background

Why This Matters

A quick reminder of the project’s significance comes from a recent quote by Eric Frederick:

Forest structural diversity is a key component of forest ecosystem health, as forests that contain structural heterogeneity are critical for providing wildlife habitat and can be more resilient to natural disturbances. Being able to quantify forest structural heterogeneity is important to be able to assess the need for potential management actions, and it allows us to ensure that our forest management projects are creating structural heterogeneity rather than homogeneity. This tool will allow us to better determine the need for canopy gaps in project areas, and it will allow us to more accurately pinpoint beneficial locations for creating canopy gaps of various sizes when implementing projects.

Team Focus Areas

Project Workflow

Before diving into the details, here’s our project workflow:

Treebeard Project Workflow

Image Processing and Segmentation

Methods and Techniques

My focus has been on processing aerial images, and in particular I'd like to walk through our image segmentation approach. Since the last post, we have decided to use K-means clustering for its simplicity. Our image segmentation method involves several steps (a sketch in code follows the list):

  1. Calculate NDVI: Compute the NDVI pixel by pixel.
  2. Quickshift: Use the quickshift method from scikit-image to group similar pixels into larger segments.
  3. Mean NDVI Calculation: Calculate the mean NDVI for each segment.
  4. K-means Clustering: Apply K-means clustering (k=2) on the NDVI means.
  5. Classification: Infer that the higher NDVI cluster represents canopy/trees, and the lower NDVI cluster represents canopy gaps/tree gaps.
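
Here is a minimal sketch of the five steps in Python (not our exact implementation), assuming a 4-band (R, G, B, NIR) float array scaled to [0, 1]; the quickshift parameters are illustrative, not our tuned values:

```python
import numpy as np
from skimage.segmentation import quickshift
from sklearn.cluster import KMeans

def classify_canopy(img):
    """img: (rows, cols, 4) float array in [0, 1], bands R, G, B, NIR."""
    red, nir = img[..., 0], img[..., 3]
    # 1. NDVI per pixel (guard against division by zero)
    ndvi = (nir - red) / np.clip(nir + red, 1e-6, None)
    # 2. Quickshift on the RGB bands groups similar pixels into segments
    segments = quickshift(img[..., :3], kernel_size=5, max_dist=10, ratio=0.5)
    # 3. Mean NDVI per segment (scipy.ndimage.mean would be faster at scale)
    seg_ids = np.unique(segments)
    means = np.array([ndvi[segments == s].mean() for s in seg_ids])
    # 4. K-means with k=2 on the segment-level NDVI means
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(means.reshape(-1, 1))
    # 5. The cluster with the higher centroid is canopy; the other is gaps
    canopy_label = int(np.argmax(km.cluster_centers_))
    is_canopy = km.labels_ == canopy_label
    return is_canopy[np.searchsorted(seg_ids, segments)]  # per-pixel mask
```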

Visual Examples

Here are illustrations of the process:

Image Segmentation 01 Image Segmentation 02 Image Segmentation 03

Performance and Challenges

This process works well for areas of 300 acres or less, completing in 1-10 minutes. However, for larger study areas (e.g., 1,500 acres), the workflow took over 30 hours on a MacBook Pro (M1). We haven't benchmarked this method extensively, but clustering on segment means should be much faster than clustering on per-pixel NDVI values alone. If I get a chance after wrapping up my certificate program with CU Boulder, I'd love to refactor and optimize the image segmentation method, as there are some unnecessary steps in various places.

Buffer, Calculate Area & Bin

After segmenting the image into two clusters, a default 5-ft buffer is applied to smooth out the canopy cluster. This creates more accurate and reasonable canopy gaps for calculating areas and binning, as illustrated below.
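
A minimal sketch of this step, assuming `canopy` is a GeoDataFrame of canopy polygons in EPSG:6428 (US survey feet); the outward-then-inward buffer is my own way of expressing the smoothing, not necessarily the project's exact operation:

```python
import geopandas as gpd

def smooth_canopy(canopy: gpd.GeoDataFrame, buffer_ft: float = 5.0) -> gpd.GeoSeries:
    # Buffer outward to bridge small pixel-level slivers between crowns,
    # then buffer back inward so the overall canopy extent is preserved
    closed = canopy.buffer(buffer_ft).buffer(-buffer_ft)
    # Dissolve touching crowns into single canopy polygons
    merged = gpd.GeoSeries([closed.unary_union], crs=canopy.crs)
    return merged.explode(index_parts=False)
```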

Buffer Area Bin

Results Comparison

The results from image segmentation and LIDAR processing yield similar outcomes except in the highlighted areas:

Compare Results Result Analysis

Data Source

Summary

We are fortunate to have both aerial and LIDAR data sources available in the Lefthand Creek Watershed region, where the Watershed Center is focused. Workflows have been developed to process either data source, producing comparable results that will be highly useful for the Watershed Center's foresters. This project has been an incredibly valuable learning experience with practical benefits for all parties involved. The workflow can be reused and applied to other regions as well. As a reminder, the link to the source code is provided below the title above. I hope these resources will be valuable to anyone who comes across them.

Forest Horizontal Heterogeneity

This is a quick update on developing a tool for the Watershed Center. The initial goal is to create a GIS plug-in for identifying openings in forests, which would aid our target users, forest managers like Eric Frederick, in assessing the distribution of openings in the forest. The tool will support the Watershed Center's vision of “[a] healthy and resilient watershed that can sustain wildfire and other natural disturbances to protect communities, keep water supplies reliable, and support diverse flora and fauna for current and future generations.” Interactive HTML Source Code

This is still very much a work in progress with my team (Peter Kobylarz and Chris Griego). We are exploring the horizontal heterogeneity of forests in the Lefthand Creek Watershed near Boulder, Colorado. Lefthand Creek Watershed

This is the immediate project area.

Study Area

We are using the following data sources:

I have tried three different methods to identify openings in the forest:

  1. NDVI Thresholding: Using the NDVI values (open_spaces = ndvi >= 0.1) to identify the openings in the forest.
  2. K-means Clustering: Using the k-means clustering algorithm (k = 2) to identify the openings in the forest.
  3. Segment Anything Model: Using the Segment Anything model to identify the openings in the forest.

Here’s the result of these methods in order:

NDVI Thresholding K-means Clustering Segment Anything Model

After creating the mask, I converted the raster data to vector data to calculate the area of the openings in the forest. The projection is EPSG:6428, with the US survey foot as the unit. The area of each opening is calculated via gdf.area / 43560 to convert from square feet to acres. Additionally, the openings are classified into bins: 1/8 acre, 1/4 acre, 1/2 acre, and 1+ acres. A sketch of this step appears below.
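
A hedged sketch of the raster-to-vector step, assuming `mask` is a boolean opening raster aligned with a rasterio affine `transform`; the bin edges are my reading of the classes above:

```python
import geopandas as gpd
import numpy as np
import pandas as pd
from rasterio import features
from shapely.geometry import shape

def openings_to_polygons(mask, transform):
    """mask: boolean opening raster; transform: its affine transform."""
    # Vectorize contiguous True regions of the mask into polygons
    polys = [
        shape(geom)
        for geom, value in features.shapes(
            mask.astype("uint8"), mask=mask, transform=transform
        )
    ]
    gdf = gpd.GeoDataFrame(geometry=polys, crs="EPSG:6428")
    gdf["acres"] = gdf.area / 43560  # square US survey feet -> acres
    # Bin openings into the size classes described in the post
    bins = [0, 0.125, 0.25, 0.5, 1.0, np.inf]
    labels = ["<1/8 ac", "1/8 ac", "1/4 ac", "1/2 ac", "1+ ac"]
    gdf["size_class"] = pd.cut(gdf["acres"], bins=bins, labels=labels)
    return gdf
```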

Here’s the result of the area calculation:

NDVI Openings K-means Openings Segment Anything Openings

Immediately, I have learned that I need to find better ways to create these polygons (potentially filling in the small holes), and I also need a better way to group nearest neighbors to identify the openings in the forest. I am excited to learn more about spatial data processing and how to apply it to the real world.

Separately, Peter has explored how to process the LiDAR data to identify the openings in the forest.

Here’s the result of that:

LiDAR Openings

Overall, this has been a great learning experience for me. I am looking forward to discussing with my team and my class how to further hone my techniques and develop a tool that has practical applications for the Watershed Center.

Urban Greenspace

I have learned how to calculate NDVI statistics on the 1-m resolution aerial images from the National Agriculture Imagery Program (NAIP). As mentioned below, I came across the Denver Regional Aerial Photography Project, which offers even higher-resolution aerial images (3-inch, 6-inch, 12-inch). I wanted to explore NDVI statistics on these high-resolution images. I chose the city of Lafayette, Colorado, as my study site. Interactive HTML Source Code
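
For reference, a minimal NDVI sketch with rioxarray, assuming the standard 4-band NAIP order (R, G, B, NIR); the file path is a placeholder:

```python
import rioxarray as rxr

# NAIP tiles carry four bands: R, G, B, NIR (band indexes start at 1)
naip = rxr.open_rasterio("naip_lafayette.tif", masked=True)
red, nir = naip.sel(band=1), naip.sel(band=4)
ndvi = (nir - red) / (nir + red)
print(f"mean={float(ndvi.mean()):.3f}, std={float(ndvi.std()):.3f}")
```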

Here’s the result of that: Urban Greenspace (Lafayette, CO)

Aerial Image Exploration via K-means Clustering

I learned how to process the HLSL30 data product, which is provided in Cloud Optimized GeoTIFF (COG) format, using the k-means clustering algorithm. I then came across a dataset of Denver regional aerial images and wanted to practice the k-means clustering algorithm on these high-quality images. HTML Source Code

The results were interesting. I learned that pixel interleaving and band interleaving store band values in different orders on disk.

Here’s the gdalinfo output to confirm the image structure:

% gdalinfo N2W134b.tif 
Driver: GTiff/GeoTIFF
Files: N2W134b.tif
Size is 10560, 10560
Coordinate System is:
PROJCRS["NAD83(2011) / Colorado Central (ftUS)",
    BASEGEOGCRS["NAD83(2011)",
        DATUM["NAD83 (National Spatial Reference System 2011)",
            ELLIPSOID["GRS 1980",6378137,298.257222101004,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",6318]],
    CONVERSION["Lambert Conic Conformal (2SP)",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",37.8333333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-105.5,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",39.75,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",38.45,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",914401.828803658,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",304800.609601219,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["easting",east,
            ORDER[1],
            LENGTHUNIT["US survey foot",0.304800609601219]],
        AXIS["northing",north,
            ORDER[2],
            LENGTHUNIT["US survey foot",0.304800609601219]],
    ID["EPSG",6428]]
Data axis to CRS axis mapping: 1,2
Origin = (3112351.999799999874085,1784791.999800000106916)
Pixel Size = (0.250000000000000,-0.250000000000000)
Metadata:
  AREA_OR_POINT=Area
  TIFFTAG_DATETIME=2020:09:30 17:45:42
  TIFFTAG_RESOLUTIONUNIT=2 (pixels/inch)
  TIFFTAG_SOFTWARE=Adobe Photoshop CS5 Windows
  TIFFTAG_XRESOLUTION=72
  TIFFTAG_YRESOLUTION=72
Image Structure Metadata:
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  ( 3112352.000, 1784792.000) (105d 5'56.66"W, 39d59'14.59"N)
Lower Left  ( 3112352.000, 1782152.000) (105d 5'56.81"W, 39d58'48.50"N)
Upper Right ( 3114992.000, 1784792.000) (105d 5'22.74"W, 39d59'14.47"N)
Lower Right ( 3114992.000, 1782152.000) (105d 5'22.89"W, 39d58'48.38"N)
Center      ( 3113672.000, 1783472.000) (105d 5'39.77"W, 39d59' 1.49"N)
Band 1 Block=10560x1 Type=Byte, ColorInterp=Red
Band 2 Block=10560x1 Type=Byte, ColorInterp=Green
Band 3 Block=10560x1 Type=Byte, ColorInterp=Blue
Band 4 Block=10560x1 Type=Byte, ColorInterp=Undefined

Band interleaving stores the image one band at a time, writing every pixel of a band before moving to the next band, as illustrated by this image. Band Interleaving

Pixel interleaving stores all band values for each pixel together, pixel by pixel, as illustrated by this image. Pixel Interleaving

(Credit: ArcGIS)
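
A small numpy illustration of the two layouts (my own example, not from the credited figures):

```python
import numpy as np

rgb = np.arange(12).reshape(2, 2, 3)  # a 2x2 image with 3 bands

# Pixel interleaving (BIP): all band values of a pixel sit together
bip = rgb.reshape(-1)                     # R,G,B, R,G,B, ...

# Band interleaving (BSQ): each band is stored contiguously
bsq = rgb.transpose(2, 0, 1).reshape(-1)  # all R, then all G, then all B
```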

I picked a study site that’s about 1 mile by 1 mile in size, near my home.

Here’s the RGB image before clustering: Lafayette, CO

I have processed four tiles, which took quite a bit of time (~20-30 minutes) on my MacBook Pro (M1); a sketch of the per-pixel clustering follows. Lafayette, CO
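
A hedged sketch of the per-pixel clustering, assuming `rgb` is a (rows, cols, 3) uint8 tile already read into memory; k is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

pixels = rgb.reshape(-1, 3).astype("float32") / 255.0  # one row per pixel
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(pixels)
classes = km.labels_.reshape(rgb.shape[:2])  # per-pixel cluster map
```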

This seems like an excellent way to explore the data from the land cover perspective. I am excited to learn more about land cover classification and how to apply it in the real world.

Multiple Data Layers

Multiple data layers related to soil, topography, and climate were handled in this project while building a habitat suitability model for Sorghastrum nutans. Interactive HTML Source Code

Snippets from this project…

Fuzzy Logic Model (Suitability)

These parameters haven’t gone through any rigorous selection process.

To demonstrate the ability to manipulate and harmonize DataArray objects, the following layers were defined (a sketch of combining them appears after the figures):

Oglala National Grassland Suitability

Elevation

Oglala National Grassland Elevation

Elevation Aspect

Oglala National Grassland Elevation Aspect

Soil pH

Oglala National Grassland Soil pH

Precipitation

Oglala National Grassland Precipitation

Specific Humidity

Oglala National Grassland Specific Humidity
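
A hedged sketch of how such layers might be combined, assuming each input is an xarray DataArray already harmonized to a common grid; the Gaussian membership function and all optima/tolerances below are placeholders, not the project's parameters:

```python
import numpy as np
import xarray as xr

def fuzzy_membership(da: xr.DataArray, optimum: float, tolerance: float) -> xr.DataArray:
    # Gaussian membership: 1.0 at the optimum, falling off with distance
    return np.exp(-((da - optimum) ** 2) / (2 * tolerance**2))

def suitability(elevation, soil_ph, precip):
    layers = [
        fuzzy_membership(elevation, optimum=1200, tolerance=300),  # meters
        fuzzy_membership(soil_ph, optimum=6.5, tolerance=1.0),
        fuzzy_membership(precip, optimum=500, tolerance=150),      # mm/yr
    ]
    # Multiply memberships so a poor score in any layer lowers the total
    out = layers[0]
    for layer in layers[1:]:
        out = out * layer
    return out
```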

Next Steps

Look into using Vapor Pressure Deficit instead of Specific Humidity, since it is more directly related to evaporative stress.

Data Citation

2022 NDVI for Popular Neighborhoods in San Francisco

According to the USDA:

In general, NDVI values range from -1.0 to 1.0, with negative values indicating clouds and water, positive values near zero indicating bare soil, and higher positive values of NDVI ranging from sparse vegetation (0.1 - 0.5) to dense green vegetation (0.6 and above).

In this exercise, 2022 NDVI statistics were calculated for three popular neighborhoods (Marina, Mission, North Beach) in San Francisco to assess the greenness and overall suitability for living in each neighborhood. Presidio was added as a benchmark. Nothing conclusive, but it’s fun to just look from this perspective! Interactive HTML Source Code
San Francisco Choropleth Table
San Francisco Choropleth

Data Source:

San Francisco Open Data
San Francisco 2022 NAIP

Fire Density in California Watersheds

Both the number and size of wildfires in California watersheds are increasing.
In the case of the Sacramento watershed, fire size has been increasing in recent times. Interactive HTML Source Code
CA Watersheds - Fire Density 1992-2020 CA Watersheds - Fire Density Choropleth Plot Sacramento - Fire Occurrence and Size

Vegetation Recovery after the Caldor Fire

Plotting dNDVI before and after the Caldor Fire suggests vegetation recovery two years after the fire.
Interactive HTML Source Code
Caldor Fire Boundary Caldor Fire - Change in NDVI between 2020 and 2023 Caldor Fire - Difference in NDVI inside and outside Caldor Fire boundary

Python Libraries:

earthpy, folium, geopandas, matplotlib, pandas, rioxarray, xarray

Data Source:
Citation:

Flood Frequency Analysis of Penn State

Plotting water time series data (1985-2022) suggests that 2004 was the worst flooding year (a 38-year flood event) in the Penn State region in recent times. Interactive HTML Source Code PSU Daily Streamflow Average PSU Streamflow Annual Maxima PSU Flood Return Periods
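
A hedged sketch of the return-period calculation, assuming `daily` is a pandas Series of daily streamflow indexed by date; the Weibull plotting position is one common choice, not necessarily the one used here:

```python
import pandas as pd

def return_periods(daily: pd.Series) -> pd.DataFrame:
    # Annual maxima from the daily record
    annual_max = daily.resample("YS").max()
    df = annual_max.to_frame("peak_cfs")
    # Weibull plotting position: T = (n + 1) / rank, largest peak = rank 1
    n = len(df)
    df["rank"] = df["peak_cfs"].rank(ascending=False)
    df["return_period_yr"] = (n + 1) / df["rank"]
    return df.sort_values("return_period_yr", ascending=False)
```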

Python Libraries:

folium, pandas, hvplot, requests

Data Source:

USGS National Water Dashboard > State College, PA > USGS 01546400 (Surface Water, Stream) > Data (Daily Data)

Citation:

U.S. Geological Survey, 2023, National Water Information System data available on the World Wide Web (USGS Water Data for the Nation), accessed [September 19, 2023], at URL http://waterdata.usgs.gov/nwis/

Fun Read:

San Francisco Average Temperatures

An upward trend over the observed years (1946-2022). Interactive HTML Source Code San Francisco Average Temperatures

Python Libraries:

pandas, hvplot, holoviews, numpy

Data Source:

Climate at a Glance > City > Time Series > San Francisco (Average Temperature, 12-Month, December)

Citation:

NOAA National Centers for Environmental information, Climate at a Glance: City Time Series, published August 2023, retrieved on September 7, 2023 from https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/city/time-series

Fun Fact:

About Me:

Ed's Bio Picture

We came all this way to explore the moon, and the most important thing is that we discovered the Earth.

William Anders, Apollo 8 Astronaut

Hey there! It’s Ed. I am a technical consultant by trade. Having witnessed the Orange Skies Day in 2020 when I was living in Emeryville (San Francisco East Bay), I want to find ways to equip ourselves to respond and adapt to this new world we live in. While having a general interest in the space industry, I didn’t always have a strong interest in earth data science until my recent experiences at GNSS+R 2023 and the ESIP 2023 Summer Meeting. Space up, space down, so to speak. Right now, I am learning geoprocessing with Python at CU Boulder. I am excited to find out where I fit and how I can contribute with my skills. I am happiest with my little cup of espresso. Open to connect!

Likes
Espresso :coffee:
Board games :game_die:
Books :books:

Interests
Climbing
Python
California’s Geology

Education
MSc Space Studies, International Space University (2021)
BS Information Systems Design and Development, Penn State (2015)

Find Me Online
Blog: eggvoice.com
GitHub: eggvoice
Twitter: myeggvoice
LinkedIn: Edward Chan