
Analyzing USGS Gauge Data to Identify Medium Flow Events in the Pigeon River

Started 03-09-2026 by
Modified a week ago by

The purpose of this post is to learn how to access and analyze data about river flow discharge from the United States Geological Survey (USGS). We start by downloading a few years of flow data from the USGS and then we analyze the downloaded data to identify periods of medium flow suitable for whitewater kayaking. This analysis will be performed for a single section of a single river in western North Carolina, but the USGS provides flow data for most waterways in the United States (and similar flow databases exist for other countries), so a similar analysis could easily be repeated for other rivers. A key goal in this post is to illustrate how to obtain and interpret this flow data, making it easier for others to download and analyze data about rivers and waterways.

 

We start by downloading the flow data from the USGS, available at waterdata.usgs.gov. From there we select the monitoring location that corresponds to the river gauge we are interested in. Monitoring locations can be browsed by map on a state-by-state basis, but it is usually easiest to pick a river of interest and then look for monitoring locations along it on the map. We select the monitoring location for the Pigeon River near Hepco, NC, and from there we can download data for the following attributes:

 

  • Gage height, feet
    • This is the direct reading of the height of the water surface at the monitoring location. The mechanics of the gauges themselves vary by location, but in the United States they report gage height in feet.
    • This number is only comparable to earlier values at the same site, because gage height is measured relative to an arbitrary local baseline. It cannot be compared across rivers (which have different depths and widths), or even across different monitoring locations on the same river, since the definition of 0 feet may differ between sites.
  • Discharge, cubic feet per second
    • This is a measure of how much water flows through the river. The USGS derives it from the gage height using a site-specific rating curve (the relationship between gage height and discharge), which is developed from periodic direct measurements of flow and depends on the depth and width of the channel. Each site has its own rating, and the calibration between gage height and discharge may change after major flooding events that alter the riverbed.
    • This number is the measure of analytical interest when thinking about river flow, since it reveals how much water is moving through the river and can be compared across rivers and monitoring sites. If we imagine a cross section of the river perpendicular to the flow, we can think of the discharge in cubic feet per second (cfs) as the number of 1-foot cubes of water passing through that cross section each second. This can be an unintuitive number, since we don’t usually think about flow volumes in our everyday lives. To build some intuition about discharge levels, let’s look at some descriptive examples (the descriptions apply to any river; the example locations are rivers in the United States, with preference for rivers near SAS Campus Headquarters in Cary, NC):
      • 0-100 cfs: This is a small creek in dry conditions, between 15 and 30 feet wide and probably no deeper than 2-5 feet at any point.
        • Crabtree Creek in Raleigh, NC is generally between 10 and 30 cfs and rises to around 100 cfs after normal rains. In 2025 it briefly peaked at about 2,000 cfs in August.
      • 100-500 cfs: This is that same small creek after a rainstorm; the water has risen (by different amounts in different places in the river) and the flow is strong and powerful. It could also be a slightly larger river in dry conditions.
        • The Haw River near Chapel Hill, NC, which feeds into Jordan Lake, is usually around 300 cfs and rises to somewhere between 1,000 and 10,000 cfs after rainstorms in the area. In 2025 it peaked at 98,100 cfs in July after Tropical Storm Chantal hit the area.
      • 500-1,000 cfs: This is a medium-sized dry river between 30-80 feet wide and probably 5-10 feet deep (it could be deeper in places).
        • The Cape Fear River near Lillington, NC is usually just below 500 cfs in the fall and winter and varies between 1,000 and 5,000 cfs in the spring and early summer. Dam operations upstream impact this flow.
      • 1,000-10,000 cfs: This is that same medium-sized river after a rainstorm; the water has risen and the current is strong and powerful. As the flow approaches 10,000 cfs, the river generally approaches flood stage. This could also be a large river during dry conditions.
        • The Colorado River through the Grand Canyon is controlled by the Glen Canyon Dam and varies between 5,000 and 15,000 cfs, depending on how much water is available and on downstream water demand (a multi-state compact guarantees water rights to downstream states).
      • 10,000-100,000 cfs: This is either a normally flowing large river (100 feet wide at points) or a flooded medium river.
        • The Potomac River just upstream of Washington, DC is generally around 10,000 cfs during the spring and summer, although it drops to around 2,000 cfs during the fall and winter and varies with rainfall.
      • 100,000 cfs or above: This is either a massive river near the sea or a medium/large river during a flood.
        • The Columbia River near Portland, OR varies between 100,000 and 200,000 cfs.
      • American Whitewater provides an inventory of rivers at the American Whitewater River Index. This site is focused on whitewater recreation but provides information about the rivers in the United States at different flow levels and sometimes includes photos and descriptions of the river.
  • Stream water level elevation above NAVD 1988, in feet
    • This is a measure of the surface elevation of the river above the North American Vertical Datum of 1988 (NAVD 88), which is essentially a standardized version of sea level used for precise geographic measurements.
    • This number provides useful information about the elevation and climate of the river environment but does not reveal the height of the water surface above the streambed. Because every site shares the NAVD 88 baseline, it can be compared across rivers, but it mostly allows us to compare the overall elevation of the rivers rather than differences in flow.
  • Some monitoring locations also provide water temperature and several measures of water quality. These are more common in cities or at especially important monitoring locations.
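If you want to attach these rough size bands to a discharge series, a user-defined format is one way to do it. The sketch below is our own illustration, not a USGS classification: the format name CFSBAND and the short labels are hypothetical, and the band edges are simply the ranges listed above.

```sas
/* Hypothetical format CFSBAND: maps discharge (cfs) to the rough size bands listed above.
   Labels are kept under 40 characters so the default format width shows them in full. */
proc format;
    value cfsband
        low    -< 100    = '0-100 cfs: small creek'
        100    -< 500    = '100-500 cfs: creek after rain'
        500    -< 1000   = '500-1,000 cfs: medium river'
        1000   -< 10000  = '1,000-10,000 cfs: river after rain'
        10000  -< 100000 = '10,000-100,000 cfs: large river'
        100000 -  high   = '100,000+ cfs: huge river or flood';
run;

/* Usage: classify a few example discharge values and write them to the log */
data _null_;
    do q = 25, 300, 950, 98100, 150000;
        band = put(q, cfsband.);
        put q= band=;
    end;
run;
```

The same format can be applied to the discharge column later in the post (for example with a FORMAT statement in PROC FREQ) to see how much time the river spends in each band.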

 

For our analysis we want to focus on the discharge measurements, so we download discharge data from the Pigeon River near Hepco, NC monitoring site. We first change the data time span to 3 years and then use the ‘Download data’ option to get a tab-separated values file with timing and discharge information. We save this tab-delimited data as a .txt file and then import it into SAS (note that we manually deleted the commented header lines at the top of the file to clean the data). The USGS also offers API access to this data for bulk downloads, but when working with only a few rivers and a single measure, the download tool provided by the USGS is easy and convenient.
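For readers who want to script the download instead, the sketch below uses PROC HTTP to request the same kind of tab-delimited (RDB) file. It assumes the legacy USGS Water Services instantaneous-values endpoint and its query parameters (sites, parameterCd, startDT, endDT, format); the USGS has been moving to newer Water Data APIs, so check the current API documentation before relying on this URL. SITE_NO is a placeholder for the site number shown on the monitoring location page, and the output path is hypothetical.

```sas
/* ASSUMPTIONS: legacy NWIS Water Services IV endpoint and parameters; SITE_NO is a
   placeholder to replace with the real site number; the output path is hypothetical.
   Parameter code 00060 is discharge in cubic feet per second. */
filename usgsraw '/path/to/hepco_raw.rdb';

/* Single quotes keep the macro processor from treating &sites etc. as macro references */
proc http
    url='https://waterservices.usgs.gov/nwis/iv/?format=rdb&sites=SITE_NO&parameterCd=00060&startDT=2025-01-01&endDT=2025-12-31'
    method='GET'
    out=usgsraw;
run;

/* PROC HTTP sets this macro variable (SAS 9.4M5 and later); 200 means success */
%put NOTE: HTTP status code = &SYS_PROCHTTP_STATUS_CODE;
```

The response keeps the '#' comment header described above, so it still needs to be cleaned (by hand or in code) before PROC IMPORT can read it.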

 

/*name literals such as '90240_00060'n require VALIDVARNAME=ANY*/
options validvarname=any;

/*import the tab-delimited file*/
proc import datafile='/path/to/hepco_2025_tabdlm.txt'
            out=work.hepco_2025
            dbms=dlm
            replace;
    delimiter='09'x;
run;

/*give the discharge column a readable name and drop the columns we don't need*/
data work.hepco_2025;
    set work.hepco_2025;
    rename '90240_00060'n = discharge;
    drop agency_cd site_no tz_cd '90240_00060_cd'n;
run;

title 'Pigeon River Flow at Hepco, NC';
proc sgplot data=work.hepco_2025;
    series x=datetime y=discharge;
    xaxis label='Time';
    yaxis label='Discharge (cfs)';
run;

 

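We use PROC IMPORT to load the file, specifying the tab character ('09'x) as the delimiter. The data comes with some extra columns (agency code, site number, time zone, and a data-qualification code), and the discharge column is named with USGS identifiers instead of a descriptive name: 00060 is the USGS parameter code for discharge, and the number before it identifies the time series at this site. Because that name starts with a digit, it has to be referenced as a name literal ('90240_00060'n), which requires the VALIDVARNAME=ANY system option. We rename the discharge column and drop the extra columns before plotting the time series we have just downloaded.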
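Deleting the header by hand works for a single file, but it has to be repeated for every download. As an alternative, a DATA step can read the raw USGS file directly and skip the '#' comment lines, the column-name line, and the column-format line. This is a sketch that assumes the column order agency_cd, site_no, datetime, tz_cd, discharge, qualification code (as in the imported file) and timestamps of the form 2025-01-01 00:00; the data set name work.hepco_raw and the helper variables are our own.

```sas
/* Sketch: read the raw USGS RDB file without manual cleanup.
   ASSUMPTION: six tab-separated columns in the order listed in the lead-in. */
data work.hepco_raw;
    infile '/path/to/hepco_2025_tabdlm.txt' dlm='09'x dsd truncover;
    length agency_cd $5 site_no $15 dt_text $20 tz_cd $6 flow_text $14 flow_cd $10;
    input @;                                  /* hold the record so we can inspect it */
    if _infile_ =: '#' then delete;           /* USGS comment lines */
    if _infile_ =: 'agency_cd' then delete;   /* column-name line */
    if _infile_ =: '5s' then delete;          /* column-format line (5s 15s 20d ...) */
    input agency_cd site_no dt_text tz_cd flow_text flow_cd;
    /* dt_text looks like 2025-01-01 00:00: combine the date and the time of day */
    datetime = dhms(input(scan(dt_text, 1, ' '), yymmdd10.), 0, 0,
                    input(scan(dt_text, 2, ' '), time5.));
    /* ?? suppresses notes for non-numeric codes (for example Ice); they become missing */
    discharge = input(flow_text, ?? best32.);
    format datetime datetime19.;
    keep datetime discharge;
run;
```

The result has the same two columns (datetime and discharge) that the rest of the post uses, so it could stand in for work.hepco_2025 after the import and rename steps.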

 

[Figure 01_arziti_hepcoSeries.png: Pigeon River discharge (cfs) at Hepco, NC over time]

 

Now that we have the data available in SAS, we can filter it to identify times when the flow is suitable for whitewater kayaking. This is ultimately a subjective determination that depends on the kayaker’s risk tolerance (higher flows are almost always more dangerous) and on their tolerance for scraping against rocks and the riverbed (low flows can leave the river too shallow to navigate, even in a kayak). The suitable flow range for whitewater kayaking is different for each river, generally depending on the width of the river and the gradient of the section (the gradient is the elevation change from the top to the bottom of the section divided by its length). For the section of the Pigeon River near Hepco, NC, this range is 900 to 2,500 cfs. The plot is clearly dominated by major storm events, during which the flow rises and falls rapidly. It’s easier to highlight the flows of interest if we plot the discharge on a logarithmic axis.

 

/*things are easier to see on a log plot*/
proc sgplot data=work.hepco_2025;
    band x=datetime lower=900 upper=2500;
    series x=datetime y=discharge;
    xaxis label='Time';
    yaxis label='Discharge (cfs)' type=LOG;
    refline 900 / axis=y lineattrs=(color=red thickness=2);
    refline 2500 / axis=y lineattrs=(color=red thickness=2);
run;

 

[Figure 02_arziti_hepcoLog.png: Pigeon River discharge at Hepco, NC on a log scale, with the 900-2,500 cfs band marked]

 

Looking at the plot, we can see that there were only around ten periods when the river flow was suitable for kayaking. At this scale it’s hard to tell how many days each of these periods lasted, or whether the 900-2,500 cfs discharge even occurred during daylight. This section of river is in rural North Carolina, so there are no lights on the river at night and night kayaking would be unsafe. We therefore filter the data to keep only readings between 900 and 2,500 cfs during the 8AM-5PM window.

 

/*the river is only runnable above 900 cubic feet per second (cfs), and is probably too high above 2500 cfs*/
data work.hepco_2025_runnable;
    set work.hepco_2025;
    where (discharge >= 900 and discharge <= 2500); 
run; 

/*we want to go whitewater kayaking during the day, so let's filter out any observations outside of the 8AM - 5PM window*/ 
data work.hepco_2025_runnable; 
    set work.hepco_2025_runnable; 
    where (timepart(datetime) >= '08:00:00'T and timepart(datetime) <= '17:00:00'T);
run;
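Rather than counting runnable periods by eye from the log plot, we can number them in the full 15-minute series: a new period starts whenever a reading is in the 900-2,500 cfs band and the previous reading was not. This sketch assumes work.hepco_2025 is in chronological order (as the USGS file is); the data set work.hepco_periods and the variables in_band, prev_in_band, and period are our own names.

```sas
/* Sketch: number the separate runnable periods in the full series (day and night) */
data work.hepco_periods;
    set work.hepco_2025;
    in_band = (900 <= discharge <= 2500);   /* 1 if runnable; missing discharge gives 0 */
    prev_in_band = lag(in_band);            /* LAG runs on every row so it stays aligned */
    if _n_ = 1 then prev_in_band = 0;
    if in_band and not prev_in_band then period + 1;   /* sum statement: retained counter */
    if in_band;                                        /* keep only runnable readings */
run;

/* One row per period, with its start, end, and number of 15-minute readings */
title 'Runnable periods (900-2,500 cfs)';
proc sql;
    select period,
           min(datetime) as period_start format=datetime19.,
           max(datetime) as period_end   format=datetime19.,
           count(*)      as n_readings
    from work.hepco_periods
    group by period;
quit;
```

Note that a single missing or out-of-band reading splits a period in two, so a few very short periods in the output may belong to the same storm event.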

 

The original time series data is recorded at 15-minute intervals, but it’s impossible to kayak this section of the river in 15 minutes. We don’t want to start down the river only to find halfway down that the water ‘runs out’ (the discharge drops below the desired level) and that we must drag the kayak across rocks in a mostly dry riverbed. This section of river is about 7.2 miles from the put-in (where we enter the river) to the take-out (where we leave the river), and if we assume kayakers can travel at 4 miles per hour, we need around 2 hours to run the river. It’s safer to have a buffer of time in case we want to stop or need to perform a rescue, so we look for continuous 3-hour periods of good flow (discharge between 900 and 2,500 cfs) in the data.
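The trip-length reasoning above reduces to a little arithmetic, written out here as a DATA _NULL_ step with the values from the text (the 4 mph paddling speed is the same assumption made above). Because readings come every 15 minutes, a 3-hour window is 12 intervals, and it takes 13 readings to bound 12 intervals.

```sas
/* Worked arithmetic for the trip length and the reading count (values from the text) */
data _null_;
    section_miles   = 7.2;   /* put-in to take-out */
    speed_mph       = 4;     /* assumed paddling speed */
    window_hours    = 3;     /* trip time plus a safety buffer */
    reading_minutes = 15;    /* USGS reporting interval */

    trip_hours = section_miles / speed_mph;             /* 1.8 hours, about 2 */
    intervals  = window_hours * 60 / reading_minutes;   /* 12 intervals in 3 hours */
    readings   = intervals + 1;                         /* 13 readings bound 12 intervals */
    put trip_hours= intervals= readings=;
run;
```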

 

data work.hepco_2025_runnable; 
    set work.hepco_2025_runnable; 
    date = datepart(datetime); 
run; 

proc freq data=work.hepco_2025_runnable nlevels noprint;
    tables date / out=work.hepco_2025_days;
run;

 

The data comes in 15-minute increments, but we are interested in days when the river is runnable (when there are at least 3 hours of good flow), so we use the FREQ procedure to count the runnable daytime readings on each day. With readings every 15 minutes, a 3-hour window contains 12 intervals bounded by 13 readings, so we keep days with more than 12 runnable readings.

 

proc print data=hepco_2025_days;
    where COUNT>12;
    var date count;
    format date date9.;
run;

 

[Figure 03_arziti_hepcoPrint.png: PROC PRINT listing of days with enough runnable daytime flow]
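The printed list of days can also be tallied by month, which makes the seasonal pattern easier to check. This sketch reuses the rule from the PROC PRINT step (more than 12 runnable daytime readings) and groups dates with the MONYY7. format, relying on the fact that PROC FREQ groups observations by their formatted values.

```sas
/* Sketch: count runnable days per calendar month (one row per month and year) */
title 'Runnable days by month';
proc freq data=work.hepco_2025_days;
    where count > 12;           /* same rule as the PROC PRINT step */
    tables date / nocum;
    format date monyy7.;        /* e.g. JAN2025: dates in the same month share a group */
run;
```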

 

We kept only those days with more than 12 runnable readings, that is, enough to cover a 3-hour window. We didn’t check that these readings were consecutive, but the discharge generally rises after moderate to heavy rainfall in the area and then declines, so we don’t expect the flow to swing in and out of the runnable range repeatedly within a single day. We identify 35 days during the year when the Hepco section of the Pigeon River has enough water flow to kayak. These days occur in all seasons, although there are many more in late winter and early spring than in other seasons. In cold weather we would want to wear warm layers and some kind of protective outer layer (like a drysuit) to kayak this river safely.
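Counting runnable readings per day does not confirm that they are consecutive. A stricter check computes, for each day, the longest streak of runnable daytime readings that are exactly 15 minutes apart, and keeps days whose longest streak has more than 12 readings (a full 3-hour window). This sketch assumes work.hepco_2025_runnable is the filtered data set built above, with the date variable added, and in chronological order; the names work.hepco_streaks, work.hepco_streak_days, streak, and longest_streak are our own.

```sas
/* Sketch: longest run of consecutive runnable daytime readings on each day */
data work.hepco_streaks;
    set work.hepco_2025_runnable;
    by date;
    gap_min = (datetime - lag(datetime)) / 60;       /* minutes since previous kept reading */
    if first.date or gap_min ne 15 then streak = 1;  /* a new day or a gap restarts the streak */
    else streak + 1;                                 /* sum statement: extends and retains */
run;

/* Keep days whose longest unbroken streak spans at least 3 hours */
proc sql;
    create table work.hepco_streak_days as
    select date format=date9.,
           max(streak) as longest_streak
    from work.hepco_streaks
    group by date
    having max(streak) > 12;
quit;
```

Comparing the number of rows in work.hepco_streak_days with the 35 days found above shows whether any of those days had their runnable readings broken into shorter pieces.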

 

This was a simple analysis that didn’t use any advanced analytics techniques, but the same approach can be applied to all kinds of rivers to identify good times to kayak. Some rivers are runnable year-round across a wide range of flows, whereas others can only be run a few times a year after heavy rainfall. This section is somewhat hard to catch with enough water (only 35 days in 2025), but it isn’t as rare as some sections with smaller drainage basins or in drier areas.

 

Although this analysis of river flow data was focused on whitewater kayaking, the USGS provides flow information at many monitoring sites all over the United States (and more detailed water quality information at a smaller subset of sites). This information can be useful in analyzing historical flood patterns or just exploring the behavior of local rivers. The USGS provides a map of monitoring sites (linked in the references), although it might be easier to search local rivers by name to find nearby monitoring sites.

 

References:

 
