Hi,
I have a dataset as below ( I have county counts per month per year), how to create a new variable as the total count for each year per county?
also I have variable Z per month per county each year, howw to get the Z median as a new variable for each county for each year?
I tried this code but it is not working:
DATA CORRECT_TOTAL;
INPUT COUNTY YEAR PRECIPITATION TOTCOUNTS ;
BY YEAR COUNTY;
PRECIPITATION= MEDIAN(JAN_Z,FEB_Z,MAR_Z,APR_Z,MAY_Z,JUN_Z,JUL_Z,
AUG_Z,SEP_Z, OCT_Z,NOV_Z,DEC_Z);
COUNTS= SUM(JAN_COUNT, FEB_COUNT, ...);
RUN;
CUNTY | YEAR | JAN-Z | JAN-COUNT | FEB-Z | FEB-COUNT | MAR-Z | MAR-COUNT | APR-Z | APR-COUNT | MAY-Z | MAY-COUNT | JUN-Z | JUN-COUNT | JUL-Z | JUL-COUNT | AUG-Z | AUG-COUNT | SEPT-Z | SEPT-COUNT | OCT-Z | OCT-COUNT | NOV-Z | NOV-COUNT | DEC-Z | DEC-COUNT |
X | 2014 | -0.4 | 2 | -0.4 | 1 | -0.2 | 1 | -0.3 | 1 | -0.2 | 2 | -0.2 | 1 | -0.6 | 1 | -0.8 | 0 | -0.9 | 1 | -0.8 | 1 | -1.6 | 1 | -0.9 | 0 |
X | 2015 | -0.2 | 1 | 0 | 0 | -0.1 | 4 | 0 | 3 | 0.3 | 0 | 0.6 | 1 | 0.9 | 4 | 1 | 2 | 0.8 | 4 | 1.2 | 3 | 1.4 | 1 | 1.3 | 0 |
X | 2016 | 0.9 | 2 | 0.7 | 0 | 0.5 | 2 | 0.6 | 2 | 0.4 | 2 | 0.2 | 0 | -0.2 | 1 | -0.2 | 2 | -0.2 | 0 | -0.7 | 0 | -0.6 | 2 | -0.3 | 3 |
X | 2017 | 0 | 3 | 0.2 | 4 | 0.4 | 2 | 0.3 | 3 | 0.2 | 1 | 0.2 | 1 | 0.4 | 4 | -0.1 | 3 | -0.1 | 4 | -0.2 | 2 | -0.7 | 1 | -1.3 | 2 |
X | 2018 | -1.7 | 2 | -1.8 | 0 | -1.8 | 1 | -2 | 4 | -2.1 | 1 | -2 | 1 | -2.1 | 2 | -2 | 1 | -2.1 | 3 | -1.1 | 2 | -1.1 | 3 | -0.6 | 1 |
X | 2019 | -0.4 | 5 | -0.1 | 3 | 0.1 | 1 | 0.2 | 3 | 0.5 | 1 | 0.5 | 1 | 0 | 6 | -0.1 | 6 | 0 | 4 | -0.8 | 4 | -0.2 | 4 | -0.1 | 4 |
Y | 2014 | -0.6 | 6 | -0.7 | 7 | -0.5 | 4 | -0.5 | 5 | -0.5 | 8 | -0.5 | 6 | -0.8 | 1 | -0.3 | 6 | 0.4 | 1 |
I would read the data into a more normal structure first.
data have;
infile cards truncover;
input county $ year @ ;
do month=1 to 12 ;
input precip count @;
date=mdy(month,1,year);
if not missing(count) then output;
end;
format date yymmdd10.;
cards;
X 2014 -0.4 2 -0.4 1 -0.2 1 -0.3 1 -0.2 2 -0.2 1 -0.6 1 -0.8 0 -0.9 1 -0.8 1 -1.6 1 -0.9 0
X 2015 -0.2 1 0 0 -0.1 4 0 3 0.3 0 0.6 1 0.9 4 1 2 0.8 4 1.2 3 1.4 1 1.3 0
X 2016 0.9 2 0.7 0 0.5 2 0.6 2 0.4 2 0.2 0 -0.2 1 -0.2 2 -0.2 0 -0.7 0 -0.6 2 -0.3 3
X 2017 0 3 0.2 4 0.4 2 0.3 3 0.2 1 0.2 1 0.4 4 -0.1 3 -0.1 4 -0.2 2 -0.7 1 -1.3 2
X 2018 -1.7 2 -1.8 0 -1.8 1 -2 4 -2.1 1 -2 1 -2.1 2 -2 1 -2.1 3 -1.1 2 -1.1 3 -0.6 1
X 2019 -0.4 5 -0.1 3 0.1 1 0.2 3 0.5 1 0.5 1 0 6 -0.1 6 0 4 -0.8 4 -0.2 4 -0.1 4
Y 2014 -0.6 6 -0.7 7 -0.5 4 -0.5 5 -0.5 8 -0.5 6 -0.8 1 -0.3 6 0.4 1
;
Then it is simple to generate statistics
proc summary data=have nway;
by county;
class year ;
var precip count ;
output out=want sum(count)= median(precip)=precip_median ;
run;
Do not keep data (dates) in structure. Transpose your dataset to long, so that you get a date value (use the MDY function to combine month and year) and separate z and count variables. Then, a simple PROC MEANS will do it.
Your data step will not work. To use the INPUT statement, you must either have an INFILE statement pointing to an external file for reading, or DATALINES for inline data.
Agreeing with @Kurt_Bremser , but phrasing things a little differently: calendar information should not go in variable names! Use long data sets, not wide! This is a common mistake that some people make, and this causes you to use programming that is either more difficult, or more typing, or both. If you need to DISPLAY the results in a wide format, then use PROC REPORT. Remember: most SAS PROCs work better (less programming, less typing and probably less execution time) when you have long data sets.
To illustrate how simple the programming is if you have a transposed data set, the very short and simple PROC SUMMARY shown below is all you need.
data fake_data;
input county date :yymmn6. count z;
year=year(date);
format date yymms7.;
cards;
111 201401 2 -0.4
111 201402 1 -0.4
111 201403 1 -0.2
113 201401 5 -0.6
113 201402 3 0.4
113 201403 2 0.2
;
proc summary data=fake_data nway;
class county year;
var count z;
output out=summary sum(count)= median(z)=;
run;
Also, please do not type in ALL CAPITAL letters. Thanks
I would read the data into a more normal structure first.
data have;
infile cards truncover;
input county $ year @ ;
do month=1 to 12 ;
input precip count @;
date=mdy(month,1,year);
if not missing(count) then output;
end;
format date yymmdd10.;
cards;
X 2014 -0.4 2 -0.4 1 -0.2 1 -0.3 1 -0.2 2 -0.2 1 -0.6 1 -0.8 0 -0.9 1 -0.8 1 -1.6 1 -0.9 0
X 2015 -0.2 1 0 0 -0.1 4 0 3 0.3 0 0.6 1 0.9 4 1 2 0.8 4 1.2 3 1.4 1 1.3 0
X 2016 0.9 2 0.7 0 0.5 2 0.6 2 0.4 2 0.2 0 -0.2 1 -0.2 2 -0.2 0 -0.7 0 -0.6 2 -0.3 3
X 2017 0 3 0.2 4 0.4 2 0.3 3 0.2 1 0.2 1 0.4 4 -0.1 3 -0.1 4 -0.2 2 -0.7 1 -1.3 2
X 2018 -1.7 2 -1.8 0 -1.8 1 -2 4 -2.1 1 -2 1 -2.1 2 -2 1 -2.1 3 -1.1 2 -1.1 3 -0.6 1
X 2019 -0.4 5 -0.1 3 0.1 1 0.2 3 0.5 1 0.5 1 0 6 -0.1 6 0 4 -0.8 4 -0.2 4 -0.1 4
Y 2014 -0.6 6 -0.7 7 -0.5 4 -0.5 5 -0.5 8 -0.5 6 -0.8 1 -0.3 6 0.4 1
;
Then it is simple to generate statistics
proc summary data=have nway;
by county;
class year ;
var precip count ;
output out=want sum(count)= median(precip)=precip_median ;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.