Hello community,
I'm trying to utilize PROC SQL to count the frequency of a variable and then use the count from a different row to create a new variable.
An example of my data:
Worker ID | Facility | Year |
1 | A | 2010 |
2 | A | 2010 |
3 | A | 2011 |
4 | A | 2012 |
5 | A | 2012 |
6 | B | 2011 |
7 | B | 2011 |
8 | B | 2012 |
9 | B | 2012 |
10 | B | 2013 |
What I want to accomplish in PROC SQL:
Facility | Year | Count | Count from Previous Year |
A | 2010 | 2 | 0 |
A | 2011 | 1 | 2 |
A | 2012 | 2 | 1 |
A | 2013 | 0 | 2 |
B | 2010 | 0 | 0 |
B | 2011 | 2 | 0 |
B | 2012 | 2 | 2 |
B | 2013 | 1 | 2 |
I know how to get to the "Count" variable using Group by, however getting the "Count from Previous Year" is where I'm having difficulty.
And then ultimately I want to use the "Count from previous year" variable and add it on to the original table by facility and year, so:
Worker ID | Facility | Year | Count from Previous Year |
1 | A | 2010 | 0 |
2 | A | 2010 | 0 |
3 | A | 2011 | 2 |
4 | A | 2012 | 1 |
5 | A | 2012 | 1 |
6 | B | 2011 | 0 |
7 | B | 2011 | 0 |
8 | B | 2012 | 2 |
9 | B | 2012 | 2 |
10 | B | 2013 | 2 |
I'm open to any and all suggestions! Thank you all in advance.
One way to create the table with both year's counts, assuming you did the summary to get the count into a data set named CountSummary is to join that result to itself using the Year with an offset. The Coalesce function returns the first non-missing value as a result. So when you have a 'previous year' without a count the b.count would be missing and should get 0 as the result.
proc sql; create table want as select a.* ,coalesce(b.count,0) as CountPrevYear from countsummary as a left join countsummary as b on a.facility=b.facility and a.year= (b.year+1) ; quit;
But you don't need the intermediate table. You could join the CountSummary to the other data using the same year offset approach.
One way to create the table with both year's counts, assuming you did the summary to get the count into a data set named CountSummary is to join that result to itself using the Year with an offset. The Coalesce function returns the first non-missing value as a result. So when you have a 'previous year' without a count the b.count would be missing and should get 0 as the result.
proc sql; create table want as select a.* ,coalesce(b.count,0) as CountPrevYear from countsummary as a left join countsummary as b on a.facility=b.facility and a.year= (b.year+1) ; quit;
But you don't need the intermediate table. You could join the CountSummary to the other data using the same year offset approach.
data have;
infile cards expandtabs;
input WorkerID Facility $ Year;
cards;
1 A 2010
2 A 2010
3 A 2011
4 A 2012
5 A 2012
6 B 2011
7 B 2011
8 B 2012
9 B 2012
10 B 2013
;
proc sql;
create table want as
select *,
(select count(*) from have
where Facility=a.Facility and Year=a.Year-1) as CountfromPreviousYear
from have as a ;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.