I know there is data steps/arrays that will combine multiple rows into one row, I want to do that and keep the original table layout, and for the total column keep max value. For example;
Original Table:
Key | ID | NAME | TOTAL |
A | 262P00050013 | Bob | 15012.32 |
A | 262P00050014 | Joe | 24484.12 |
A | 262P00050015 | Sally | 18988.43 |
New Table:
Key | ID | NAME | TOTAL | ID_COMBINED | NAME_COMBINED | Max |
A | 262P00050013 | Bob | 15012.32 | 262P00050013, 262P00050014, 262P00050015 | Bob, Joe, Sally | 24484.12 |
A | 262P00050014 | Joe | 24484.12 | 262P00050013, 262P00050014, 262P00050015 | Bob, Joe, Sally | 24484.12 |
A | 262P00050015 | Sally | 18988.43 | 262P00050013, 262P00050014, 262P00050015 | Bob, Joe, Sally | 24484.12 |
Any help would be appreciated.
Yes, you have to read all obs for each key twice, once to build the MAX and COMBINED variables, and again to re-read (and output) each obs with the new variables added:
data have;
input Key $1. ID :$12. NAME :$6. TOTAL ;
datalines;
A 262P00050013 Bob 15012.32
A 262P00050014 Joe 24484.12
A 262P00050015 Sally 18988.43
run;
data want;
if 0 then set have; /*Insures that original vars in HAVE will be the leftmost vars in WANT*/
/* Now read all obs for a key create the new variables for it */
length id_combined $80 name_combined $60 ;
do until (last.key);
set have;
by key;
id_combined=catx(',',id_combined,id);
name_combined=catx(',',name_combined,name);
max=max(total,max);
end;
/* With new vars in hand, re-read old vars for the key and output each obs */
do until (last.key);
set have;
by key;
output;
end;
run;
As @Reeza said, you have to read each key twice (the data are sorted by key, right?). But this approach does not create and write out a separate table/dataset with the combined variables. So if your dataset is large, you'll avoid writing out an intermediate dataset only to read it back in from disk for merging purposes. The complexity of the above does have its benefits.
You will need to know the lengths needed for id_combined and name_combined. For example, if the maximum number of observations possible for a key is (say) 10, then the length needed for ID_COMBINED would be 10*12 (each id is $12) plus 9*1 (9 commas) ==> $129 minimum length. Same concept for name_combined.
Create the ID combined in a separate table and merge it back with the main table.
A DoW loop is the only other way to do this but it’s also still technically two passes of the data so same amount of work as the merge and the merge is infinitely easier to understand.
Thanks Reeza.
Yes, you have to read all obs for each key twice, once to build the MAX and COMBINED variables, and again to re-read (and output) each obs with the new variables added:
data have;
input Key $1. ID :$12. NAME :$6. TOTAL ;
datalines;
A 262P00050013 Bob 15012.32
A 262P00050014 Joe 24484.12
A 262P00050015 Sally 18988.43
run;
data want;
if 0 then set have; /*Insures that original vars in HAVE will be the leftmost vars in WANT*/
/* Now read all obs for a key create the new variables for it */
length id_combined $80 name_combined $60 ;
do until (last.key);
set have;
by key;
id_combined=catx(',',id_combined,id);
name_combined=catx(',',name_combined,name);
max=max(total,max);
end;
/* With new vars in hand, re-read old vars for the key and output each obs */
do until (last.key);
set have;
by key;
output;
end;
run;
As @Reeza said, you have to read each key twice (the data are sorted by key, right?). But this approach does not create and write out a separate table/dataset with the combined variables. So if your dataset is large, you'll avoid writing out an intermediate dataset only to read it back in from disk for merging purposes. The complexity of the above does have its benefits.
You will need to know the lengths needed for id_combined and name_combined. For example, if the maximum number of observations possible for a key is (say) 10, then the length needed for ID_COMBINED would be 10*12 (each id is $12) plus 9*1 (9 commas) ==> $129 minimum length. Same concept for name_combined.
Thanks Mkeintz, this works great.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.