I am trying to calculate a variable that consists of the average number of surgeries performed at a given hospital over the previous two years.
What I have looks like :
patient hospital fiscal_year number_of_surgeries_performed_by_FY
1 1 2010 12
2 1 2011 2
3 1 2012 4
What I want :
patient hospital fiscal_year number_of_surgeries_performed_by_FY avg_surgeries
1 1 2010 12 -
2 1 2011 2 -
3 1 2012 4 7
where 7 is the average of 12+2 (surgeries performed in the previous two years). My dataset is large and some hospitals did not perform any cases in the two previous years.
Thank you!
Keeping the PATIENT variable on a dataset that has AGREGATTED surgery counts at the HOSPITAL and YEAR level makes no sense.
Let's assume you did start with PATIENT level data, something like this where PATIENT is the medical record number, HOSPITAL is the facility ID, FY is the fiscal year, and SURGERIES is how many surgeries were performed on that patient in the facility in that fy.
data patients;
input patient hospital FY surgeries;
datalines;
1 1 2013 1
2 1 2013 0
3 1 2013 2
4 1 2017 1
5 2 2010 1
6 2 2011 1
7 2 2012 1
9 2 2015 1
10 3 2013 1
11 3 2015 2
12 3 2015 2
;
The first step is to summarize at the HOSPITAL*FY level. You can use PROC SUMMARY with the COMPLETETYPES option to fill in the missing combinations.
proc summary nway completetypes;
class hospital fy;
var surgeries;
output out=hospital_fy sum= ;
run;
If that does not fill in every FY (in this example data are no records for FY=2016) then you can either add some dummy records into the patient level data or fixup the summary data. The former is easier to code if you know the range of years. This step will make a copy of PATIENTS with extra zero count records on the end to make sure every FY appears. (you could code this as a data step view).
data for_summary;
set patients end=eof;
output;
if eof then do;
surgeries=0;
do fy=2010 to 2017; output; end;
end;
run;
Now you can process by hospital and fy and perform your calculation.
data want;
set hospital_fy ;
by hospital fy ;
surgeries=sum(0,surgeries);
n+1;
if first.hospital then n=1;
if n>2 then avg_surgeries = (lag1+lag2)/2 ;
output;
lag2 = lag1;
lag1 = surgeries;
retain lag1 lag2;
drop _type_ _freq_ ;
run;
Result
avg_ Obs hospital FY surgeries n surgeries lag2 lag1 1 1 2010 0 1 . . . 2 1 2011 0 2 . . 0 3 1 2012 0 3 0.0 0 0 4 1 2013 3 4 0.0 0 0 5 1 2014 0 5 1.5 0 3 6 1 2015 0 6 1.5 3 0 7 1 2016 0 7 0.0 0 0 8 1 2017 1 8 0.0 0 0 9 2 2010 1 1 . 0 1 10 2 2011 1 2 . 1 1 11 2 2012 1 3 1.0 1 1 12 2 2013 0 4 1.0 1 1 13 2 2014 0 5 0.5 1 0 14 2 2015 1 6 0.0 0 0 15 2 2016 0 7 0.5 0 1 16 2 2017 0 8 0.5 1 0 17 3 2010 0 1 . 0 0 18 3 2011 0 2 . 0 0 19 3 2012 0 3 0.0 0 0 20 3 2013 1 4 0.0 0 0 21 3 2014 0 5 0.5 0 1 22 3 2015 4 6 0.5 1 0 23 3 2016 0 7 2.0 0 4 24 3 2017 0 8 2.0 4 0
The presence of a variable named "patient" sort of confuses the question about Hospital and year.
Is your data sorted by Hospital and year?
And you only want exactly two years previous?
You will need to provide what you expect for a result when one or more "previous" years does not have any surgeries. Such as is the count zero? Or is the year just not present in the data?
You should provide data in the form of a data step pasted into a text box opened on the forum with the </> icon that appears above the main message box. That way we can see what you have. This is an example:
data have; input patient hospital fiscal_year surgeries; datalines; 1 1 2010 12 2 1 2011 2 3 1 2012 4 4 1 2013 8 1 2 2010 02 2 2 2011 2 3 2 2012 4 4 2 2013 8 ;
I am way too lazy to name a variable as long as you did for your surgeries.
IF any of the description I provided above does not match your data such as missing years and value other than 0 for surgeries when there are none you have to provide example data and expected results for the exceptions.
If the data is "clean" this is one approach to what you request:
data want; set have; by hospital; retain hospcount ; if first.hospital then hospcount=1; else hospcount+1; ls1=lag1(surgeries); ls2=lag2(surgeries); if hospcount ge 3 then avgsurgeries = mean(ls1,ls2); keep patient hospital surgeries avgsurgeries; run;
How this works:
The BY statement creates automatic variables you can reference with First.variable and Last.Variable that are 0/1 numeric values that indicate whether the current observation is the first or last of a group. 1 is true so if first.variable says do this when the observation is the first. In this case it is used to set a counter variable, Hospcount, to 1 when it is the first observation for that hospital. The RETAIN means that variable value is kept across data step boundaries so can be used to accumulate a count (the Else hospcount+1; increments for the second, third etc).
The LAG function returns values of a variable from previous iterations so we get the two previous values of the surgeries. (Note: this is a queued function and if used in a conditional like "if x then y= lag(somevar);" will return the last time the condition was true. So extreme care is needed when using this and its partner function DIF).
Then when the Hospcount is large enough we use the MEAN function to calculate the average of the two lag variables.
Key the wanted variables. You can comment out the keep statement to see what the values of those others would be.
Again, I'm way too lazy to use a variable name as long as you suggested for the average.
I restate that you may need to provide a better example of what your data looks like without surgeries and what you expect as a result. I would suggest an example with one "missing" year and a separate with two and what you expect the result for the average to be.
You might want to use PROC SUMMRY to aggregate all surgery totals by hospital and year, then run a DATA step:
data have;
input patient hospital fiscal_year surgeries;
datalines;
1 1 2010 12
2 1 2011 2
3 1 2012 4
4 1 2013 8
1 2 2010 02
2 2 2011 2
3 2 2012 4
4 2 2013 8
;
proc summary data=have nway;
class hospital fiscal_year;
var surgeries;
output out=need sum=n_surgeries;
run;
data want ;
set need (drop=_type_ _freq_);
by hospital;
prior_two_year_mean=ifn(lag2(hospital)=hospital,mean(lag2(n_surgeries),lag1(n_surgeries)),.);
run;
Erroneous code below was replaced by use of the IFN functionns above:
if lag2(hospital)=hospital then prior_two_year_mean=mean(lag2(n_surgeries),lag1(n_surgeries));
The PROC SUMMARY doesn't care about the original data order, but will sort the results by HOSPITAL/YEAR (due to the variable order in the CLASS statement).
Thank you so much for the suggestion. Unfortunately, I'm very new to SAS and didn't know how to set up my question as clearly as you wanted. A few clarifications:
1) "Patient" is just to identify an individual who was treated at a given hospital.
2) My data is sorted by Hospital and Year.
3) I want the average number of surgeries from the previous two years, such that, if a hospital has not performed any surgeries in the previous 2 years, the average will be 0 or missing "." Currently, when a hospital has not performed any surgeries in a given year, it is not present in my dataset. I did wonder whether I'd have to create dummy variables for those absent years; this is what I'm struggling with. I initially used:
proc freq data= have;
tables hospital*year / out = new;
run;
This gave me what I wanted (frequency rows), but the new dataset removed years for when a hospital performed zero surgeries. Because of this, I suspect the "lag" function won't work for my data as it looks more like:
data have; input patient hospital fiscal_year surgeries; datalines; 1 1 2013 1 2 1 2014 1 3 1 2017 2 4 2 2010 3 5 2 2010 3 6 2 2010 3 7 2 2011 4 8 3 2013 8 ;
I am easily able to come up with the 2-year averages in excel, based on the proc freq table, but my dataset is large and I don't know how I'd merge the files. I also figure there must be an easier way.
What I am hoping for output is:
data have; input patient hospital fiscal_year surgeries 2_yr_avg_surg; datalines; 1 1 2013 1 0 2 1 2014 1 0.5 3 1 2017 2 0 4 2 2010 3 0 5 2 2010 3 0 6 2 2010 3 0 7 2 2011 4 1.5 8 3 2013 8 0 ;
I hope this helps to clarify what I'm trying to do. My dataset is 565 observations over 10 years.
I really do appreciate the help!
Hi I tried replying to your message the other day, but it seems it didn't post.
Thank you for the reply!
I am new here, so sorry that my question and its layout were unclear.
In reply to your questions,
1) My data is sorted by hospital and year
2) Patients indicate an individual treated at a given hospital. This is how I was able to calculate how many surgeries took place at a hospital in a given year.
2) I want to know the average of the two previous years. Unfortunately, when a hospital hasn't performed any surgeries in a year, then it doesn't show up in my dataset. I did wonder whether I'd have to use a dummy dataset to fill in these dates. The following code gives me the information I need, but when I tried to save the output as a dataset, it does not save the 0 values (e.g. hospital 1 performed zero surgeries in year 2010, so no row of data exists for that combination):
proc freq data = have;
tables hospital*year;
run;
The data I current have looks like:
data have; input patient hospital fiscal_year surgeries; datalines; 1 1 2013 3 2 1 2013 3 3 1 2013 3 4 1 2017 1 5 2 2010 1 6 2 2011 1 7 2 2012 1 9 2 2015 1 10 3 2013 1 11 3 2015 2 12 3 2015 2 ;
What I hope to obtain is:
data have; input patient hospital fiscal_year surgeries 2_yr_avg; datalines; 1 1 2013 3 0 2 1 2013 3 0 3 1 2013 3 0 4 1 2017 1 0 5 2 2010 1 0 6 2 2011 1 0.5 7 2 2012 1 1 9 2 2015 1 0 10 3 2013 1 0 11 3 2015 2 12 3 2015 2
;
I hope this makes sense. I don't think I can use the lag function because I don't have the previous years in some cases.
Thanks again!
Keeping the PATIENT variable on a dataset that has AGREGATTED surgery counts at the HOSPITAL and YEAR level makes no sense.
Let's assume you did start with PATIENT level data, something like this where PATIENT is the medical record number, HOSPITAL is the facility ID, FY is the fiscal year, and SURGERIES is how many surgeries were performed on that patient in the facility in that fy.
data patients;
input patient hospital FY surgeries;
datalines;
1 1 2013 1
2 1 2013 0
3 1 2013 2
4 1 2017 1
5 2 2010 1
6 2 2011 1
7 2 2012 1
9 2 2015 1
10 3 2013 1
11 3 2015 2
12 3 2015 2
;
The first step is to summarize at the HOSPITAL*FY level. You can use PROC SUMMARY with the COMPLETETYPES option to fill in the missing combinations.
proc summary nway completetypes;
class hospital fy;
var surgeries;
output out=hospital_fy sum= ;
run;
If that does not fill in every FY (in this example data are no records for FY=2016) then you can either add some dummy records into the patient level data or fixup the summary data. The former is easier to code if you know the range of years. This step will make a copy of PATIENTS with extra zero count records on the end to make sure every FY appears. (you could code this as a data step view).
data for_summary;
set patients end=eof;
output;
if eof then do;
surgeries=0;
do fy=2010 to 2017; output; end;
end;
run;
Now you can process by hospital and fy and perform your calculation.
data want;
set hospital_fy ;
by hospital fy ;
surgeries=sum(0,surgeries);
n+1;
if first.hospital then n=1;
if n>2 then avg_surgeries = (lag1+lag2)/2 ;
output;
lag2 = lag1;
lag1 = surgeries;
retain lag1 lag2;
drop _type_ _freq_ ;
run;
Result
avg_ Obs hospital FY surgeries n surgeries lag2 lag1 1 1 2010 0 1 . . . 2 1 2011 0 2 . . 0 3 1 2012 0 3 0.0 0 0 4 1 2013 3 4 0.0 0 0 5 1 2014 0 5 1.5 0 3 6 1 2015 0 6 1.5 3 0 7 1 2016 0 7 0.0 0 0 8 1 2017 1 8 0.0 0 0 9 2 2010 1 1 . 0 1 10 2 2011 1 2 . 1 1 11 2 2012 1 3 1.0 1 1 12 2 2013 0 4 1.0 1 1 13 2 2014 0 5 0.5 1 0 14 2 2015 1 6 0.0 0 0 15 2 2016 0 7 0.5 0 1 16 2 2017 0 8 0.5 1 0 17 3 2010 0 1 . 0 0 18 3 2011 0 2 . 0 0 19 3 2012 0 3 0.0 0 0 20 3 2013 1 4 0.0 0 0 21 3 2014 0 5 0.5 0 1 22 3 2015 4 6 0.5 1 0 23 3 2016 0 7 2.0 0 4 24 3 2017 0 8 2.0 4 0
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.