DATA Step, Macro, Functions and more

Combining the same variable on different records into a single variable

Accepted Solution Solved
Reply
Occasional Contributor kdm
Occasional Contributor
Posts: 11
Accepted Solution

Combining the same variable on different records into a single variable


Hi,

I have a requirement where I have to combine the data present in the same variable on different datasets into a single variable. The format of the variables is character. The records are linked by a particular id. So for a particular id if we have 5 rows and each of the five rows contain a char variable called LINe , then we need to produce one record which contains the two variables id and LINE. Here LINE is a combination of the LINE variable from the five records which are linked by the id. Given below is an example of the data.

100  abcdefghijkollldsdksjd

100 djsjdklsjdksdlsldklskdsld;sk;dks;

100 djskdjsldjlsdjlsdjlsdljskdjskdjskdioeioeoeoe

100 jksldksldklsldklsdksldksldklskdls

100 djsldksldksldklsdk;d;;adskldkasldklsdksdkl;skdsk

200 djksldklskdlskdls\

200 uuuuuuuuuuuuuuuuuu

Output required

100 abcdefghijkollldsdksjddjsjdklsjdksdlsldklskdsld;sk;dks;djskdjsldjlsdjlsdjlsdljskdjskdjskdioeioeoeoejksldksldklsldklsdksldksldklskdlsdjsldksldksldklsdk;d;;adskldkasldklsdksdkl;skdsk

200 djksldklskdlskdls\uuuuuuuuuuuuuuuuuu

Thanks


Accepted Solutions
Solution
‎05-11-2012 11:38 AM
Super Contributor
Posts: 1,636

Re: Combining the same variable on different records into a single variable

Posted in reply to Astounding

Hi Astounding,

there are two problems in my first post.

one is the space between "new line";

another is

"if last.id then output;call missing(newline);run;"

it should be

"if last.id then do;output;call missing(newline);end;run;"

Thanks - Linlin

View solution in original post


All Replies
Super Contributor
Posts: 1,636

Re: Combining the same variable on different records into a single variable

/* this one doesn't work */

try:

data have;

input id line$ 50.;

cards;

100  abcdefghijkollldsdksjd

100 djsjdklsjdksdlsldklskdsld

100 djskdjsldjlsdjlsdjlsdljskdjskdjskdioeioeoeoe

100 jksldksldklsldklsdksldksldklskdls

100 djsldksldksldklsdkdadskldkasldklsdksdklskdsk

200 djksldklskdlskdls

200 uuuuuuuuuuuuuuuuuu

;

data want;

length newline $250;

retain new line;

set have;

by id notsorted;

newline=cats(newline,line);

if last.id then output;

call missing(newline);

run;

proc print;run;

Message was edited by: Linlin

Occasional Contributor kdm
Occasional Contributor
Posts: 11

Re: Combining the same variable on different records into a single variable

Hi,

I tried what LiniLin suggested, but newline gets a length of the first line and hence the rest of the items get truncated or are not concatenated.

Thanks,

Kurian

Super User
Posts: 5,518

Re: Combining the same variable on different records into a single variable

Hmmm... tried to update my solution but it's not being accepted.

I reread the problem, and it seems I have the wrong SET statement.  Since you have only one source of data, the SET statement should read:

set have;

Looking at the trouble you had with Linlin's solution, it may need a longer length for NEWLINE.  And the RETAIN statement has an extra blank between NEW and LINE.  But the idea is sound.

Solution
‎05-11-2012 11:38 AM
Super Contributor
Posts: 1,636

Re: Combining the same variable on different records into a single variable

Posted in reply to Astounding

Hi Astounding,

there are two problems in my first post.

one is the space between "new line";

another is

"if last.id then output;call missing(newline);run;"

it should be

"if last.id then do;output;call missing(newline);end;run;"

Thanks - Linlin

Super Contributor
Posts: 1,636

Re: Combining the same variable on different records into a single variable

Sorry.:smileysilly:

try this:

data have;

input id line$ 50.;

cards;

100  abcdefghijkollldsdksjd

100 djsjdklsjdksdlsldklskdsld

100 djskdjsldjlsdjlsdjlsdljskdjskdjskdioeioeoeoe

100 jksldksldklsldklsdksldksldklskdls

100 djsldksldksldklsdkdadskldkasldklsdksdklskdsk

200 djksldklskdlskdls

200 uuuuuuuuuuuuuuuuuu

;

data want(keep=newline);

length newline $250;

retain newline;

set have;

by id ;

newline=cats(newline,line);

if last.id then do;output ;

call missing(newline); end;

run;

proc print;run;

Obs                      newline

1  abcdefghijkollldsdksjddjsjdklsjdksdlsldklskdslddjskdjsldjlsdjlsdjlsdljskdjskdjs

2  djksldklskdlskdlsuuuuuuuuuuuuuuuuuu

Super User
Posts: 10,047

Re: Combining the same variable on different records into a single variable

I think you need firstly merge them all into one table ,then concatenate them all together.

data want;

merge a1(rename=(line=line_1)) a2(rename=(line=line_2)) a3(rename=(line=line_3));

by id;

length line $2000 ;

line=cats(of line_: );

drop line_:  ;

run;

That is a good start.

Ksharp

Super User
Posts: 5,518

Re: Combining the same variable on different records into a single variable

Another variation:

data want;

   length bigline $ 2000;

   retain bigline;

   set have1 have2 have3;

   by id;

   if first.id then bigline=line;

   else bigline = trim(bigline) || line;

   if last.id;

run;

Note that the CATS function could, in rare cases, produce a slightly different result.  If one of your LINE values contains leading blanks, they would be removed.  Perhaps that's even desirable, but that is a choice you should be aware of.

Good luck.

Valued Guide
Posts: 765

Re: Combining the same variable on different records into a single variable

hi ... here's another idea (no RETAIN statement, no CAT function)

you'll have to guess at the length of NEWVAR (without that LENGTH statement, NEWVAR's length will be 32K)

data have;

input id line $50.;

cards;

100 abcdefghijkollldsdksjd

100 djsjdklsjdksdlsldklskdsld

100 djskdjsldjlsdjlsdjlsdljskdjskdjskdioeioeoeoe

100 jksldksldklsldklsdksldksldklskdls

100 djsldksldksldklsdkdadskldkasldklsdksdklskdsk

200 djksldklskdlskdls

200 uuuuuuuuuuuuuuuuuu

;

filename x dummy;

data want (drop=line);

length newvar $200.;

do until (last.id);

  set have;

  by id;

  file x;

  put line +(-1) @;

end;

newvar = _file_;

put;

run;

Super User
Super User
Posts: 7,083

Re: Combining the same variable on different records into a single variable

Or just use the CATS() function instead of playing tricks with output file.

data want ;

  length newvar $200;

  do until (last.id);

    set have;

    by id;

    newvar=cats(newvar,line);

  end;

  drop line;

run;

Frequently I will also add this line before the LENGTH statement to force the new variable be after the original variables.

if 0 then set have;

Valued Guide
Posts: 765

Re: Combining the same variable on different records into a single variable

hi ... sure, CATS works (and in this case, better) ... not sure if _file_ is a trick or just showing another way to concatenate values

Occasional Contributor kdm
Occasional Contributor
Posts: 11

Re: Combining the same variable on different records into a single variable

Thanks for the wonderul solution ppl!!!

I went with Linlin's suggestion

I had one additional question. In the above problem how can we have newline to be dynamic?

As in if we do not know the maximum number of rows that can have the same id. Does SAS support dynamic variables.

We have scenarios where newline can be 100, 350 or 500 and could be even more.

Valued Guide
Posts: 765

Re: Combining the same variable on different records into a single variable

hi ... use Linlin's data and Tom's data step, insert PROC SQL in between to find the maximum length of the combined data across the IDs

data have;

input id line $50.;

cards;

100 abcdefghijkollldsdksjd

100 djsjdklsjdksdlsldklskdsld

100 djskdjsldjlsdjlsdjlsdljskdjskdjskdioeioeoeoe

100 jksldksldklsldklsdksldksldklskdls

100 djsldksldksldklsdkdadskldkasldklsdksdklskdsk

200 djksldklskdlskdls

200 uuuuuuuuuuuuuuuuuu

;

proc sql noprint;

select max(lng) into :maxlng separated by ' ' from (select sum(length(line)) as lng from have group by id) ;

quit;

data want ;

  length newvar $&maxlng;

  do until (last.id);

    set have;

    by id;

    newvar=cats(newvar,line);

  end;

  drop line;

run;


ps the "separated by" removes any blanks from around the value of the  macro variable ...


http://www.sascommunity.org/wiki/TipsSmiley Frustratedtrip_Blanks_from_PROC_SQL-Created_Macro_Variable_Values

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 639 views
  • 4 likes
  • 6 in conversation