Hello community,
I'm trying to utilize PROC SQL to count the frequency of a variable and then use the count from a different row to create a new variable.
An example of my data:
Worker ID | Facility | Year |
1 | A | 2010 |
2 | A | 2010 |
3 | A | 2011 |
4 | A | 2012 |
5 | A | 2012 |
6 | B | 2011 |
7 | B | 2011 |
8 | B | 2012 |
9 | B | 2012 |
10 | B | 2013 |
What I want to accomplish in PROC SQL:
Facility | Year | Count | Count from Previous Year |
A | 2010 | 2 | 0 |
A | 2011 | 1 | 2 |
A | 2012 | 2 | 1 |
A | 2013 | 0 | 2 |
B | 2010 | 0 | 0 |
B | 2011 | 2 | 0 |
B | 2012 | 2 | 2 |
B | 2013 | 1 | 2 |
I know how to get to the "Count" variable using Group by, however getting the "Count from Previous Year" is where I'm having difficulty.
And then ultimately I want to use the "Count from previous year" variable and add it on to the original table by facility and year, so:
Worker ID | Facility | Year | Count from Previous Year |
1 | A | 2010 | 0 |
2 | A | 2010 | 0 |
3 | A | 2011 | 2 |
4 | A | 2012 | 1 |
5 | A | 2012 | 1 |
6 | B | 2011 | 0 |
7 | B | 2011 | 0 |
8 | B | 2012 | 2 |
9 | B | 2012 | 2 |
10 | B | 2013 | 2 |
I'm open to any and all suggestions! Thank you all in advance.
One way to create the table with both year's counts, assuming you did the summary to get the count into a data set named CountSummary is to join that result to itself using the Year with an offset. The Coalesce function returns the first non-missing value as a result. So when you have a 'previous year' without a count the b.count would be missing and should get 0 as the result.
proc sql; create table want as select a.* ,coalesce(b.count,0) as CountPrevYear from countsummary as a left join countsummary as b on a.facility=b.facility and a.year= (b.year+1) ; quit;
But you don't need the intermediate table. You could join the CountSummary to the other data using the same year offset approach.
One way to create the table with both year's counts, assuming you did the summary to get the count into a data set named CountSummary is to join that result to itself using the Year with an offset. The Coalesce function returns the first non-missing value as a result. So when you have a 'previous year' without a count the b.count would be missing and should get 0 as the result.
proc sql; create table want as select a.* ,coalesce(b.count,0) as CountPrevYear from countsummary as a left join countsummary as b on a.facility=b.facility and a.year= (b.year+1) ; quit;
But you don't need the intermediate table. You could join the CountSummary to the other data using the same year offset approach.
data have;
infile cards expandtabs;
input WorkerID Facility $ Year;
cards;
1 A 2010
2 A 2010
3 A 2011
4 A 2012
5 A 2012
6 B 2011
7 B 2011
8 B 2012
9 B 2012
10 B 2013
;
proc sql;
create table want as
select *,
(select count(*) from have
where Facility=a.Facility and Year=a.Year-1) as CountfromPreviousYear
from have as a ;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.