- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello community,
I'm trying to utilize PROC SQL to count the frequency of a variable and then use the count from a different row to create a new variable.
An example of my data:
Worker ID | Facility | Year |
1 | A | 2010 |
2 | A | 2010 |
3 | A | 2011 |
4 | A | 2012 |
5 | A | 2012 |
6 | B | 2011 |
7 | B | 2011 |
8 | B | 2012 |
9 | B | 2012 |
10 | B | 2013 |
What I want to accomplish in PROC SQL:
Facility | Year | Count | Count from Previous Year |
A | 2010 | 2 | 0 |
A | 2011 | 1 | 2 |
A | 2012 | 2 | 1 |
A | 2013 | 0 | 2 |
B | 2010 | 0 | 0 |
B | 2011 | 2 | 0 |
B | 2012 | 2 | 2 |
B | 2013 | 1 | 2 |
I know how to get to the "Count" variable using Group by, however getting the "Count from Previous Year" is where I'm having difficulty.
And then ultimately I want to use the "Count from previous year" variable and add it on to the original table by facility and year, so:
Worker ID | Facility | Year | Count from Previous Year |
1 | A | 2010 | 0 |
2 | A | 2010 | 0 |
3 | A | 2011 | 2 |
4 | A | 2012 | 1 |
5 | A | 2012 | 1 |
6 | B | 2011 | 0 |
7 | B | 2011 | 0 |
8 | B | 2012 | 2 |
9 | B | 2012 | 2 |
10 | B | 2013 | 2 |
I'm open to any and all suggestions! Thank you all in advance.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
One way to create the table with both year's counts, assuming you did the summary to get the count into a data set named CountSummary is to join that result to itself using the Year with an offset. The Coalesce function returns the first non-missing value as a result. So when you have a 'previous year' without a count the b.count would be missing and should get 0 as the result.
proc sql; create table want as select a.* ,coalesce(b.count,0) as CountPrevYear from countsummary as a left join countsummary as b on a.facility=b.facility and a.year= (b.year+1) ; quit;
But you don't need the intermediate table. You could join the CountSummary to the other data using the same year offset approach.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
One way to create the table with both year's counts, assuming you did the summary to get the count into a data set named CountSummary is to join that result to itself using the Year with an offset. The Coalesce function returns the first non-missing value as a result. So when you have a 'previous year' without a count the b.count would be missing and should get 0 as the result.
proc sql; create table want as select a.* ,coalesce(b.count,0) as CountPrevYear from countsummary as a left join countsummary as b on a.facility=b.facility and a.year= (b.year+1) ; quit;
But you don't need the intermediate table. You could join the CountSummary to the other data using the same year offset approach.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
infile cards expandtabs;
input WorkerID Facility $ Year;
cards;
1 A 2010
2 A 2010
3 A 2011
4 A 2012
5 A 2012
6 B 2011
7 B 2011
8 B 2012
9 B 2012
10 B 2013
;
proc sql;
create table want as
select *,
(select count(*) from have
where Facility=a.Facility and Year=a.Year-1) as CountfromPreviousYear
from have as a ;
quit;