The purpose of this post is to learn how to access and analyze data about river flow discharge from the United States Geological Survey (USGS). We start by downloading a few years of flow data from the USGS and then we analyze the downloaded data to identify periods of medium flow suitable for whitewater kayaking. This analysis will be performed for a single section of a single river in western North Carolina, but the USGS provides flow data for most waterways in the United States (and similar flow databases exist for other countries), so a similar analysis could easily be repeated for other rivers. A key goal in this post is to illustrate how to obtain and interpret this flow data, making it easier for others to download and analyze data about rivers and waterways.
The flow data is available from the USGS at waterdata.usgs.gov. From there we select the monitoring location for the river gauge we want to collect data from. Monitoring locations can be browsed by map on a state-by-state basis, but it is usually quicker to pick a river of interest and look for monitoring locations along that river on the map. We select the monitoring location for the Pigeon River Near Hepco, NC, and from here we can download data corresponding to the following attributes:
For our analysis we want to focus on the discharge measurements, so we will download a year of discharge data from the Pigeon River near Hepco, NC monitoring site. We first change the data time span to 3 years and then use the ‘download data’ option to get a tab-separated value file with timing and discharge information. We save this tab-delimited data as a .txt file and then import it into SAS (note that we manually deleted the header to clean the data). The USGS also offers API access to this data for bulk download, but when working with only a few rivers and a single measure, the download tool provided by the USGS is easy and convenient.
/* import the tab-delimited file */
proc import datafile='/path/to/hepco_2025_tabdlm.txt'
    out=work.hepco_2025
    dbms=dlm
    replace;
    delimiter='09'x;   /* '09'x is the tab character */
run;

/* give the discharge column a descriptive name and drop the columns we do not need */
data work.hepco_2025;
    set work.hepco_2025;
    rename '90240_00060'n = discharge;
    drop agency_cd site_no tz_cd '90240_00060_cd'n;
run;

/* plot the full discharge time series */
title 'Pigeon River Flow at Hepco, NC';
proc sgplot data=work.hepco_2025;
    series x=datetime y=discharge;
    xaxis label='Time';
    yaxis label='Discharge (cfs)';
run;
In this case we use PROC IMPORT to load the tab-delimited file, noting that the delimiter is the tab character. The data comes with some extra columns, and the discharge column has the site ID number instead of a descriptive name. We rename the discharge column and drop the extra data before plotting the time series we have just downloaded.
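For readers who would rather use the API route mentioned earlier, here is a minimal sketch. It is not the method used in this post. It assumes the legacy USGS Water Services instantaneous-values endpoint (waterservices.usgs.gov/nwis/iv/) returning the tab-delimited RDB format, and it assumes the Hepco gauge is site number 03459500. Both should be confirmed on the monitoring location page, especially since the USGS is migrating to newer APIs. The fileref rdb, the dataset name work.hepco_api, the column lengths, and the date range are illustrative choices.

```sas
/* Sketch only: the endpoint, site number, and date range are assumptions to verify */
filename rdb temp;

proc http
    url='https://waterservices.usgs.gov/nwis/iv/?format=rdb&sites=03459500&parameterCd=00060&startDT=2025-01-01&endDT=2025-12-31'
    method="GET"
    out=rdb;
run;

/* Parse the RDB text. It is assumed to be tab-delimited, with '#' comment
   lines, one row of column names, and one row of column formats ahead of
   the data rows. */
data work.hepco_api;
    infile rdb dlm='09'x dsd truncover lrecl=32767;
    length agency_cd $5 site_no $15 tz_cd $6 discharge_cd $10;
    input @;                                /* load the line and hold it */
    if _infile_ = ' '                       /* blank line */
       or _infile_ =: '#'                   /* comment line */
       or _infile_ =: 'agency_cd'           /* column-name row */
       or _infile_ =: '5s' then delete;     /* column-format row (assumed to start with 5s) */
    input agency_cd site_no datetime :anydtdtm20. tz_cd discharge discharge_cd;
    format datetime datetime19.;
    drop agency_cd site_no tz_cd discharge_cd;
run;
```

Reading the file with a DATA step also skips the header lines in code, so no manual cleanup is needed. Any non-numeric discharge entries would be read as missing values, with a note in the log.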
Now that we have the data available in SAS, we can filter it to identify times where the flow is suitable for whitewater kayaking. This is ultimately a subjective determination that depends on the kayaker’s risk tolerance (higher flows are almost always more dangerous), and the kayaker’s tolerance for scraping against rocks and the riverbed (low flows can leave the river too dry to navigate even with a kayak). The suitable flow range for whitewater kayaking will be different for each river, generally depending on the width of the river and the gradient of the section of river (the gradient is the elevation change from top to bottom divided by the length of the section). For the section of the Pigeon River near Hepco, NC this range is between 900 and 2,500 cfs. This data is clearly dominated by major storm events where the flow increases and decreases rapidly. It’s easier to highlight the flows of interest if we plot the log of the discharge instead of the discharge itself.
/* things are easier to see on a log plot */
proc sgplot data=work.hepco_2025;
    band x=datetime lower=900 upper=2500;
    series x=datetime y=discharge;
    xaxis label='Time';
    yaxis label='Discharge (cfs)' type=LOG;
    refline 900 / axis=y lineattrs=(color=red thickness=2);
    refline 2500 / axis=y lineattrs=(color=red thickness=2);
run;
Looking at the plot, we can see that there were only around ten time periods where the river flow was suitable for kayaking. It’s hard to tell at this scale how many days each of these periods lasts, or whether the 900-2,500 cfs discharge even occurred during the day. This section of river is in rural North Carolina, so there are no lights on the river at night and night kayaking would be unsafe. We can filter the data to only include observations where the discharge is between 900 and 2,500 cfs during the 8AM-5PM time window.
/* the river is only runnable above 900 cubic feet per second (cfs), and is probably too high above 2500 cfs */
data work.hepco_2025_runnable;
    set work.hepco_2025;
    where (discharge >= 900 and discharge <= 2500);
run;

/* we want to go whitewater kayaking during the day, so let's filter out any observations outside of the 8AM - 5PM window */
data work.hepco_2025_runnable;
    set work.hepco_2025_runnable;
    where (timepart(datetime) >= '08:00:00'T and timepart(datetime) <= '17:00:00'T);
run;
The original time series data is in 15-minute intervals, but it’s impossible to kayak this section of the river in 15 minutes. We don’t want to kayak the river only to find that halfway down the water ‘runs out’ (the discharge drops below the desired level) and discover we must drag the kayak across rocks in the mostly dry riverbed. This section of river is about 7.2 miles from the put-in (where we enter the river) to the take-out (where we leave the river), and if we assume kayakers can travel at 4 miles per hour, we’d need around 2 hours to run the river. It’s safer to have a buffer of time in case we want to stop or need to perform a rescue, so we look for continuous 3-hour periods of good flows (discharge between 900 and 2,500 cfs) in the data.
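The arithmetic behind the 3-hour window can be written out as a short DATA step. This is only a sketch of the reasoning above; the variable names are illustrative, and the speed and buffer are the assumptions already stated in the text.

```sas
/* Sketch: work out the paddling time and the number of 15-minute readings */
data _null_;
    section_miles   = 7.2;   /* put-in to take-out */
    speed_mph       = 4;     /* assumed paddling speed */
    window_hours    = 3;     /* paddling time plus a safety buffer */
    reading_minutes = 15;    /* spacing of the USGS readings */

    paddle_hours = section_miles / speed_mph;              /* 7.2 / 4 */
    intervals    = window_hours * 60 / reading_minutes;    /* 15-minute intervals in the window */
    readings     = intervals + 1;                          /* readings needed to span the window end to end */

    put paddle_hours= 4.1 intervals= readings=;
run;
```

The paddling time works out to 1.8 hours, which rounds up to the 2 hours quoted above, and a 3-hour window covers 12 intervals bounded by 13 readings.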
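/* extract the calendar date from each observation's datetime */
data work.hepco_2025_runnable;
    set work.hepco_2025_runnable;
    date = datepart(datetime);
run;

/* count the qualifying observations on each date */
proc freq data=hepco_2025_runnable nlevels noprint;
    tables date / out=work.hepco_2025_days;
run;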
The data comes in 15-minute increments, but we are interested in days where the river is runnable (where there are at least 3 hours of good flow), so we use the FREQ procedure to count the qualifying observations on each day. A 3-hour window covers 12 of the 15-minute intervals, and it takes 13 readings to span those intervals from start to finish, so we keep only the days with more than 12 qualifying observations.
/* list the days with more than 12 qualifying observations */
proc print data=hepco_2025_days;
    where COUNT>12;
    var date count;
    format date date9.;
run;
We kept only those days with more than 12 observations of suitable flow. We didn’t do anything to ensure that those observations were in sequence, but the discharge generally increases after moderate to heavy rainfall in the area and then declines afterward, so we don’t have to worry about weird sinusoidal flow patterns during the day. We identify 35 days during the year when the Hepco section of the Pigeon River has enough water flow to kayak. These days occur in all seasons, although there are a lot more days in late winter and early spring than in other seasons. We would want to wear warm layers and some kind of protective layer (like a drysuit) to safely kayak this river in cold weather.
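If we did want to confirm that the qualifying readings are consecutive, one option is to measure the longest unbroken run of 15-minute readings on each day. The following is a minimal sketch rather than part of the analysis above. It assumes work.hepco_2025_runnable already holds the date variable created earlier, and the names work.runnable_sorted, work.hepco_2025_longest_run, run_len, longest_run, and prev_dt are hypothetical.

```sas
/* Sketch: find the longest consecutive run of qualifying readings per day */
proc sort data=work.hepco_2025_runnable out=work.runnable_sorted;
    by date datetime;
run;

data work.hepco_2025_longest_run (keep=date longest_run);
    set work.runnable_sorted;
    by date;
    retain run_len longest_run prev_dt;

    if first.date then do;
        run_len = 1;                   /* start a new run each day */
        longest_run = 1;
    end;
    else if datetime - prev_dt = 900 then
        run_len = run_len + 1;         /* exactly 15 minutes (900 seconds) after the previous reading */
    else
        run_len = 1;                   /* a gap breaks the run */

    longest_run = max(longest_run, run_len);
    prev_dt = datetime;

    if last.date then output;          /* one row per day */
    format date date9.;
run;

/* same threshold as before, but now applied to consecutive readings */
proc print data=work.hepco_2025_longest_run;
    where longest_run > 12;
run;
```

Days that pass the simple count but not this check would be ones where the good flow was split into shorter pieces, for example by a brief spike above 2,500 cfs or by missing readings.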
This was a simple analysis, and we didn’t use any advanced analytics techniques, but it is something that can be done on all kinds of different rivers to identify good times to kayak. Some rivers are runnable year-round at all kinds of flows, whereas other rivers can only be run a few times a year after heavy rainfall. This section is a bit challenging to catch with enough water (only 35 days in 2025) but isn’t quite as rare as some sections with smaller drainage basins or in drier areas.
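To repeat the analysis on another river, the steps above can be wrapped in a macro with the flow range as parameters. This is a sketch under the same assumptions as the rest of the post: the input dataset has datetime and discharge columns. The macro name runnable_days, its parameters, and the intermediate dataset names are all hypothetical.

```sas
/* Hypothetical macro: names and defaults are illustrative */
%macro runnable_days(in=, out=, lo=, hi=, start='08:00:00't, end='17:00:00't, min_obs=12);

    /* keep daytime readings inside the target flow range */
    data work._runnable;
        set &in;
        where (discharge >= &lo and discharge <= &hi)
          and (timepart(datetime) >= &start and timepart(datetime) <= &end);
        date = datepart(datetime);
        format date date9.;
    run;

    /* count qualifying readings on each date */
    proc freq data=work._runnable noprint;
        tables date / out=work._days;
    run;

    /* keep days with more than min_obs qualifying readings */
    data &out;
        set work._days;
        where count > &min_obs;
        keep date count;
    run;

%mend runnable_days;

/* the Hepco analysis from this post, expressed as a macro call */
%runnable_days(in=work.hepco_2025, out=work.hepco_runnable_days, lo=900, hi=2500);
```

For a different river, only the input dataset and the lo and hi values need to change, since the suitable flow range depends on the river.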
Although this analysis of river flow data was focused on whitewater kayaking, the USGS provides flow information at many monitoring sites all over the United States (and more detailed water quality information at a smaller subset of sites). This information can be useful in analyzing historical flood patterns or just exploring the behavior of local rivers. The USGS provides a map of monitoring sites (linked in the references), although it might be easier to search local rivers by name to find nearby monitoring sites.
References: