I have a dataset where each participant received three tests, and the results of each test are in individual rows such that there are three rows for each participant. I want to collapse the rows so that each participant is represented by only one row. The data looks like this:
ID | Test1 | Test2 | Test3 |
V04001 |   |   | 0 |
V04001 |   | 1 |   |
V04001 | 0 |   |   |
V04002 |   |   | 0 |
V04002 |   | 0 |   |
V04002 | 0 |   |   |
V04003 |   |   | 0 |
V04003 |   | 1 |   |
V04003 | 1 |   |   |
And this is what I want:
ID | Test1 | Test2 | Test3 |
V04001 | 0 | 1 | 0 |
V04002 | 0 | 0 | 0 |
V04003 | 1 | 1 | 0 |
Most of the searching I've done for a solution ends up with PROC TRANSPOSE, but I can't seem to figure out how to make that work with what I need. Any help would be much appreciated!
data have;
input ID $ Test1 Test2 Test3;
cards;
V04001 . . 0
V04001 . 1 .
V04001 0 . .
V04002 . . 0
V04002 . 0 .
V04002 0 . .
V04003 . . 0
V04003 . 1 .
V04003 1 . .
;
data want;
update have(obs=0) have;
by id;
run;
Hi @lh50
Please try this
proc sql;
create table want as
select id, sum(test1) as test1,
sum(test2) as test2,
sum(test3) as test3
from have
group by id;
quit;
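One caveat on the SUM approach: it only returns the original value because each ID has a single non-missing value per test. If an ID could have two non-missing rows for the same test, SUM would add them together. A quick check for that, as a sketch that assumes the HAVE data set posted earlier in this thread (the table name multi_check is just made up for illustration):

```sas
/* Assumes the HAVE data set created earlier in this thread. */
proc sql;
  create table multi_check as
  select id,
         count(test1) as n_test1,  /* COUNT(column) ignores missing values */
         count(test2) as n_test2,
         count(test3) as n_test3
  from have
  group by id
  /* keep only IDs where some test has more than one non-missing value */
  having count(test1) > 1
      or count(test2) > 1
      or count(test3) > 1;
quit;
```

With the sample data this table comes back empty, so SUM (or MAX) is safe to use there.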
If your TEST variables are all numeric, then an approach with proc summary/means and the SUM statistic may be appropriate (MAX would give the same result here, since each ID has only one non-missing value per variable):
proc summary data=have nway;
class id;
var test1 test2 test3;
output out=want (drop=_type_ _freq_) sum=;
run;
If the values are not numeric this will not work as VAR variables in summary must be numeric.
Other considerations arise if there are other variables in your data as well. The code above drops any variable not listed on CLASS or VAR, so to keep them you would need to specify which value of each should survive the "collapsing" process, and the solution would change accordingly.
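As one illustration of the point about other variables, here is a sketch that assumes a hypothetical extra variable SITE that is constant within each ID (SITE, have_extra and want_extra are made up for this example and are not in the original data). The ID statement in proc summary carries such a variable into the output; when the values differ within a group it keeps the maximum by default, so it is only a good fit when the value really is constant per ID.

```sas
/* Hypothetical input: the same layout as HAVE plus a SITE variable
   that is assumed to be constant within each ID. */
data have_extra;
  input ID $ Site $ Test1 Test2 Test3;
  cards;
V04001 A . . 0
V04001 A . 1 .
V04001 A 0 . .
V04002 B . . 0
V04002 B . 0 .
V04002 B 0 . .
;

proc summary data=have_extra nway;
  class id;
  id site;                  /* carried into the output data set */
  var test1 test2 test3;
  output out=want_extra (drop=_type_ _freq_) sum=;
run;
```

If the extra variable can differ between rows of the same ID, you would have to decide which row wins before something like this makes sense.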
That is very easy, using the UPDATE statement:
data have;
input ID $ test1-test3;
cards;
V04001 . . 0
V04001 . 1 .
V04001 0 . .
V04002 . . 0
V04002 . 0 .
V04002 0 . .
V04003 . . 0
V04003 . 1 .
V04003 1 . .
;run;
data want;
update have(obs=0) have;
by id;
run;
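In case it is not obvious why this works: HAVE(OBS=0) supplies an empty master data set with the right variables, and every row of HAVE is then applied to it as a transaction. By default, missing values in a transaction do not overwrite values already held, and UPDATE writes one observation per BY group. A longhand sketch of roughly the same logic, assuming HAVE is sorted by ID (the names want_longhand and new1-new3 are made up for illustration):

```sas
/* Longhand equivalent of the UPDATE trick; assumes HAVE is sorted by ID. */
data want_longhand (drop=i new1-new3);
  set have (rename=(test1-test3=new1-new3));  /* incoming values */
  by id;
  array new{3} new1-new3;
  array test{3} test1-test3;                  /* accumulated values */
  retain test1-test3;

  /* start each participant with all tests missing */
  if first.id then call missing(of test{*});

  /* only non-missing incoming values replace what is held */
  do i = 1 to 3;
    if not missing(new{i}) then test{i} = new{i};
  end;

  /* write one row per participant */
  if last.id then output;
run;
```

The UPDATE version is shorter and does not need the variables listed, which is why it is the one worth remembering.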