My data comprises multiple IDs and grades for those IDs in multiple terms. I want to condense the dataset so that each unique ID has only one line while all of the columns retain the full grade information.
It currently looks like this:
ID | SP20 | SM20 | FA20 | SP21 | SM21 | FA21 | SP22 |
1001 | A+ | ||||||
1002 | D | ||||||
1002 | F | ||||||
1003 | B | ||||||
1004 | A+ | ||||||
1005 | B- | ||||||
1006 | A+ | ||||||
1007 | F | ||||||
1007 | B+ |
Please note that IDs 1002 and 1007 each have two lines because they took the course twice, i.e., in two different semesters.
I want it to become this:
ID | SP20 | SM20 | FA20 | SP21 | SM21 | FA21 | SP22 |
1001 | A+ | ||||||
1002 | D | F | |||||
1003 | B | ||||||
1004 | A+ | ||||||
1005 | B- | ||||||
1006 | A+ | ||||||
1007 | F | B+ |
Please note that IDs 1002 and 1007 are now on only one line while the grades from each semester are now included.
Attached is a csv of the same data above.
Something like this should work:
proc sql; create table UniqueId as select distinct Id from yourdatasetname ; quit; proc sort data=yourdatasetname; by id; run; data want; update uniqueid yourdatasetname; by id; run;
Note that if the "grades" were numeric values this could be done with proc summary in one step.
It is best to provide data in the form of a working data step. If I have to make a data set from another format I may may things that don't actually match your variable types and may not have the same variable names.
Something like this should work:
proc sql; create table UniqueId as select distinct Id from yourdatasetname ; quit; proc sort data=yourdatasetname; by id; run; data want; update uniqueid yourdatasetname; by id; run;
Note that if the "grades" were numeric values this could be done with proc summary in one step.
It is best to provide data in the form of a working data step. If I have to make a data set from another format I may may things that don't actually match your variable types and may not have the same variable names.
You don't need the separate dataset. Just use the one dataset for both of the inputs that the UPDATE statement needs. Use the OBS=0 dataset option to start with 0 observations in the original dataset and treat all of the observations as transactions to it.
proc sort data=yourdatasetname;
by id;
run;
data want;
update yourdatasetname(obs=0) yourdatasetname;
by id;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.