BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jimbobob
Quartz | Level 8

I know there is data steps/arrays that will combine multiple rows into one row, I want to do that and keep the original table layout, and for the total column keep max value. For example;

Original Table:

Key ID NAME TOTAL
A 262P00050013 Bob 15012.32
A 262P00050014 Joe 24484.12
A 262P00050015 Sally 18988.43

New Table:

Key ID NAME TOTAL ID_COMBINED NAME_COMBINED Max
A 262P00050013 Bob 15012.32 262P00050013, 262P00050014, 262P00050015 Bob, Joe, Sally 24484.12
A 262P00050014 Joe 24484.12 262P00050013, 262P00050014, 262P00050015 Bob, Joe, Sally 24484.12
A 262P00050015 Sally 18988.43 262P00050013, 262P00050014, 262P00050015 Bob, Joe, Sally 24484.12

 

Any help would be appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Yes, you have to read all obs for each key twice, once to build the MAX and COMBINED variables, and again to re-read (and output) each obs with the new variables added:

 

data have;
  input Key	$1. ID :$12.	NAME :$6.	TOTAL ;
datalines;
A	262P00050013	Bob	15012.32
A	262P00050014	Joe	24484.12
A	262P00050015	Sally	18988.43
run;

data want;
  if 0 then set have;  /*Insures that original vars in HAVE will be the leftmost vars in WANT*/

  /* Now read all obs for a key create the new variables for it */
  length id_combined $80 name_combined $60 ;
  do until (last.key);
    set have;
    by key;
    id_combined=catx(',',id_combined,id);
    name_combined=catx(',',name_combined,name);
    max=max(total,max);
  end;

  /* With new vars in hand, re-read old vars for the key and output each obs */
  do until (last.key);
    set have;
    by key;
    output;
  end;
run;

    

 

As @Reeza said, you have to read each key twice (the data are sorted by key, right?).   But this approach does not create and write out a separate table/dataset with the combined variables.  So if your dataset is large, you'll avoid writing out an intermediate dataset only to read it back in from disk for merging purposes.   The complexity of the above does have its benefits.

 

You will need to know the lengths needed for id_combined and name_combined.  For example, if the maximum number of observations possible for a key is (say) 10, then the length needed for ID_COMBINED would be 10*12 (each id is $12) plus 9*1  (9 commas) ==> $129 minimum length.  Same concept for name_combined.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
Reeza
Super User

Create the ID combined in a separate table  and merge it back with the main table. 

A DoW loop is the only other way to do this but it’s also still technically two passes of the data so same amount of work as the merge and the merge is infinitely easier to understand. 

jimbobob
Quartz | Level 8

Thanks Reeza.

mkeintz
PROC Star

Yes, you have to read all obs for each key twice, once to build the MAX and COMBINED variables, and again to re-read (and output) each obs with the new variables added:

 

data have;
  input Key	$1. ID :$12.	NAME :$6.	TOTAL ;
datalines;
A	262P00050013	Bob	15012.32
A	262P00050014	Joe	24484.12
A	262P00050015	Sally	18988.43
run;

data want;
  if 0 then set have;  /*Insures that original vars in HAVE will be the leftmost vars in WANT*/

  /* Now read all obs for a key create the new variables for it */
  length id_combined $80 name_combined $60 ;
  do until (last.key);
    set have;
    by key;
    id_combined=catx(',',id_combined,id);
    name_combined=catx(',',name_combined,name);
    max=max(total,max);
  end;

  /* With new vars in hand, re-read old vars for the key and output each obs */
  do until (last.key);
    set have;
    by key;
    output;
  end;
run;

    

 

As @Reeza said, you have to read each key twice (the data are sorted by key, right?).   But this approach does not create and write out a separate table/dataset with the combined variables.  So if your dataset is large, you'll avoid writing out an intermediate dataset only to read it back in from disk for merging purposes.   The complexity of the above does have its benefits.

 

You will need to know the lengths needed for id_combined and name_combined.  For example, if the maximum number of observations possible for a key is (say) 10, then the length needed for ID_COMBINED would be 10*12 (each id is $12) plus 9*1  (9 commas) ==> $129 minimum length.  Same concept for name_combined.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jimbobob
Quartz | Level 8

Thanks Mkeintz, this works great.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1626 views
  • 3 likes
  • 3 in conversation