Hello all, I have the following data format. Each subject could be rated in either a Y or R category.
Subject | Rater | Categorization |
---|---|---|
1 | 1 | Y |
1 | 2 | Y |
1 | 3 | R |
2 | 1 | R |
2 | 2 | R |
3 | 1 | Y |
3 | 2 | Y |
4 | 1 | Y |
4 | 2 | |
4 | 3 | R |
I would like to aggregate the data to the following count format:
Subject | Cat_Y | Cat_R |
---|---|---|
1 | 2 | 1 |
2 | 0 | 2 |
3 | 2 | 0 |
4 | 1 | 1 |
I am not sure whether the best way to do this would be via a datastep or some proc (e.g. means, freq, etc.). In total I have ratings for about 60 subjects, so I would like to automate the process as much as possible. Any suggestions?
Thanks!
proc sql;
  create table want as
  /* a comparison evaluates to 1 when true and 0 when false, so each sum is a count */
  select subject,
         sum(Categorization='Y') as cat_y,
         sum(Categorization='R') as cat_r
  from have
  group by subject;
quit;
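If you want to try this against the sample from the question, a sketch like the one below should do it. The `datalines` values are typed in from the table above, the `dsd` and `truncover` options are my assumption for reading the blank rating for subject 4 as a missing value, and the `cat_missing` column and the `want_check` table name are hypothetical additions just to show where that blank rating ends up.

```sas
/* Sample data copied from the table in the question */
data have;
  infile datalines dsd dlm='|' truncover;
  input subject rater categorization :$1.;
  datalines;
1|1|Y
1|2|Y
1|3|R
2|1|R
2|2|R
3|1|Y
3|2|Y
4|1|Y
4|2|
4|3|R
;
run;

proc sql;
  create table want_check as
  select subject,
         sum(categorization='Y')       as cat_y,
         sum(categorization='R')       as cat_r,
         /* hypothetical extra column: ratings left blank */
         sum(missing(categorization))  as cat_missing
  from have
  group by subject;
quit;

proc print data=want_check noobs;
run;
```

A blank rating matches neither 'Y' nor 'R', so it adds 0 to both counts. That is why subject 4 comes out as 1 and 1 in the requested table even though three raters are listed.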
I, too, think that sql would be the easiest. However, if you prefer a datastep and your data are already sorted by subject, then you could use:
data want (keep=subject cat_:);
  set have;
  by subject;
  /* reset both counters at the first row of each subject */
  if first.subject then do;
    cat_y=0;
    cat_r=0;
  end;
  /* sum statements: the counters are retained from row to row */
  if categorization eq 'R' then cat_r+1;
  else if categorization eq 'Y' then cat_y+1;
  /* write one row per subject, after its last rating */
  if last.subject then output;
run;
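Since the question also asked about procs, here is a rough sketch of one way it might be done with proc freq followed by proc transpose. It assumes the same `have` dataset as above. The intermediate dataset `counts` and the output name `want_freq` are names I made up for illustration.

```sas
/* Count each subject/category pair without printing the crosstab */
proc freq data=have noprint;
  /* sparse keeps zero-count pairs in the output dataset; */
  /* blank ratings are left out because the missing option is not used */
  tables subject*categorization / out=counts(keep=subject categorization count) sparse;
run;

/* Turn the counts into one row per subject with columns Cat_R and Cat_Y */
proc transpose data=counts out=want_freq(drop=_name_ _label_) prefix=Cat_;
  by subject;
  id categorization;
  var count;
run;
```

This should give the same counts as the sql and datastep versions, and it picks up any additional category values automatically, at the cost of an extra step.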
Thank you Art and Hai.kuo.