DATA Step, Macro, Functions and more

Create a concatenated string based on multiple observations

Reply
Occasional Contributor
Posts: 12

Create a concatenated string based on multiple observations

Hi all,

I would like to create a string variable that concactenates a variable from a multi-observation database into a longer string (left to right) then outputs this to a seperate single observation database. Example:

 

ID            StringID

0001        ZB012

0001        XC045

0001        CV456

0001        

0002        AS345

0002         

0002        DE458

0003        RT784

0004        SD789

0004        PI066

0005

0006        TT545

0006        DS410

etc...

 

Output database:

ID            longStringID

0001        ZB012, XC045, CV456

0002        AS345, DE458

0003        RT784

0004        SD789, PI066

0005

0006        TT545, DS410

 

Note: The ID with missing String ID should have an output observation. Thanks community.

I'm using SAS 9.4

--Tim 

Super User
Posts: 13,321

Re: Create a concatenated string based on multiple observations

Do you know the maximum length possible for all combined strings for a given ID? The example below assumes a maximum of 100 characters. If you do not assign a length to longstringid the catx function will default to 200. If you need more than 200 definitely set a length.

 

data have;
infile datalines missover;
input ID  $          StringID $;
datalines;
0001        ZB012
0001        XC045
0001        CV456
0001        
0002        AS345
0002         
0002        DE458
0003        RT784
0004        SD789
0004        PI066
0005           
0006        TT545
0006        DS410
;
run;

data want;
   set have;
   by id;
   length longstringid $ 100.;
   retain longstringid ;
   if first.id then call missing(longstringid);
   if not missing(stringId) then longstringid =catx(', ',longstringid,stringid);
   if last.id;
run;
Super User
Posts: 10,689

Re: Create a concatenated string based on multiple observations

data have;
infile datalines missover;
input ID  $          StringID $;
datalines;
0001        ZB012
0001        XC045
0001        CV456
0001        
0002        AS345
0002         
0002        DE458
0003        RT784
0004        SD789
0004        PI066
0005           
0006        TT545
0006        DS410
;
run;
data want;
 length want $ 200;
 do until(last.id);
  set have;
  by id;
  want=catx(',',want,StringID);
 end;
run;
PROC Star
Posts: 253

Re: Create a concatenated string based on multiple observations

Ksharp,

In cases like this (repeatedly adding to the same string) you will get better performance by using CALL CATX than from using the CATX function:

data want;
 length want $ 200;
 do until(last.id);
  set have;
  by id;
  call catx(',',want,StringID);
 end;
run;
Ask a Question
Discussion stats
  • 3 replies
  • 127 views
  • 0 likes
  • 4 in conversation