BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kdm
Calcite | Level 5 kdm
Calcite | Level 5


Hi,

I have a requirement where I have to combine the data present in the same variable on different datasets into a single variable. The format of the variables is character. The records are linked by a particular id. So for a particular id if we have 5 rows and each of the five rows contain a char variable called LINe , then we need to produce one record which contains the two variables id and LINE. Here LINE is a combination of the LINE variable from the five records which are linked by the id. Given below is an example of the data.

100  abcdefghijkollldsdksjd

100 djsjdklsjdksdlsldklskdsld;sk;dks;

100 djskdjsldjlsdjlsdjlsdljskdjskdjskdioeioeoeoe

100 jksldksldklsldklsdksldksldklskdls

100 djsldksldksldklsdk;d;;adskldkasldklsdksdkl;skdsk

200 djksldklskdlskdls\

200 uuuuuuuuuuuuuuuuuu

Output required

100 abcdefghijkollldsdksjddjsjdklsjdksdlsldklskdsld;sk;dks;djskdjsldjlsdjlsdjlsdljskdjskdjskdioeioeoeoejksldksldklsldklsdksldksldklskdlsdjsldksldksldklsdk;d;;adskldkasldklsdksdkl;skdsk

200 djksldklskdlskdls\uuuuuuuuuuuuuuuuuu

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Linlin
Lapis Lazuli | Level 10

Hi Astounding,

there are two problems in my first post.

one is the space between "new line";

another is

"if last.id then output;call missing(newline);run;"

it should be

"if last.id then do;output;call missing(newline);end;run;"

Thanks - Linlin

View solution in original post

12 REPLIES 12
Linlin
Lapis Lazuli | Level 10

/* this one doesn't work */

try:

data have;

input id line$ 50.;

cards;

100  abcdefghijkollldsdksjd

100 djsjdklsjdksdlsldklskdsld

100 djskdjsldjlsdjlsdjlsdljskdjskdjskdioeioeoeoe

100 jksldksldklsldklsdksldksldklskdls

100 djsldksldksldklsdkdadskldkasldklsdksdklskdsk

200 djksldklskdlskdls

200 uuuuuuuuuuuuuuuuuu

;

data want;

length newline $250;

retain new line;

set have;

by id notsorted;

newline=cats(newline,line);

if last.id then output;

call missing(newline);

run;

proc print;run;

Message was edited by: Linlin

kdm
Calcite | Level 5 kdm
Calcite | Level 5

Hi,

I tried what LiniLin suggested, but newline gets a length of the first line and hence the rest of the items get truncated or are not concatenated.

Thanks,

Kurian

Astounding
PROC Star

Hmmm... tried to update my solution but it's not being accepted.

I reread the problem, and it seems I have the wrong SET statement.  Since you have only one source of data, the SET statement should read:

set have;

Looking at the trouble you had with Linlin's solution, it may need a longer length for NEWLINE.  And the RETAIN statement has an extra blank between NEW and LINE.  But the idea is sound.

Linlin
Lapis Lazuli | Level 10

Hi Astounding,

there are two problems in my first post.

one is the space between "new line";

another is

"if last.id then output;call missing(newline);run;"

it should be

"if last.id then do;output;call missing(newline);end;run;"

Thanks - Linlin

Linlin
Lapis Lazuli | Level 10

Sorry.:smileysilly:

try this:

data have;

input id line$ 50.;

cards;

100  abcdefghijkollldsdksjd

100 djsjdklsjdksdlsldklskdsld

100 djskdjsldjlsdjlsdjlsdljskdjskdjskdioeioeoeoe

100 jksldksldklsldklsdksldksldklskdls

100 djsldksldksldklsdkdadskldkasldklsdksdklskdsk

200 djksldklskdlskdls

200 uuuuuuuuuuuuuuuuuu

;

data want(keep=newline);

length newline $250;

retain newline;

set have;

by id ;

newline=cats(newline,line);

if last.id then do;output ;

call missing(newline); end;

run;

proc print;run;

Obs                      newline

1  abcdefghijkollldsdksjddjsjdklsjdksdlsldklskdslddjskdjsldjlsdjlsdjlsdljskdjskdjs

2  djksldklskdlskdlsuuuuuuuuuuuuuuuuuu

Ksharp
Super User

I think you need firstly merge them all into one table ,then concatenate them all together.

data want;

merge a1(rename=(line=line_1)) a2(rename=(line=line_2)) a3(rename=(line=line_3));

by id;

length line $2000 ;

line=cats(of line_: );

drop line_:  ;

run;

That is a good start.

Ksharp

Astounding
PROC Star

Another variation:

data want;

   length bigline $ 2000;

   retain bigline;

   set have1 have2 have3;

   by id;

   if first.id then bigline=line;

   else bigline = trim(bigline) || line;

   if last.id;

run;

Note that the CATS function could, in rare cases, produce a slightly different result.  If one of your LINE values contains leading blanks, they would be removed.  Perhaps that's even desirable, but that is a choice you should be aware of.

Good luck.

MikeZdeb
Rhodochrosite | Level 12

hi ... here's another idea (no RETAIN statement, no CAT function)

you'll have to guess at the length of NEWVAR (without that LENGTH statement, NEWVAR's length will be 32K)

data have;

input id line $50.;

cards;

100 abcdefghijkollldsdksjd

100 djsjdklsjdksdlsldklskdsld

100 djskdjsldjlsdjlsdjlsdljskdjskdjskdioeioeoeoe

100 jksldksldklsldklsdksldksldklskdls

100 djsldksldksldklsdkdadskldkasldklsdksdklskdsk

200 djksldklskdlskdls

200 uuuuuuuuuuuuuuuuuu

;

filename x dummy;

data want (drop=line);

length newvar $200.;

do until (last.id);

  set have;

  by id;

  file x;

  put line +(-1) @;

end;

newvar = _file_;

put;

run;

Tom
Super User Tom
Super User

Or just use the CATS() function instead of playing tricks with output file.

data want ;

  length newvar $200;

  do until (last.id);

    set have;

    by id;

    newvar=cats(newvar,line);

  end;

  drop line;

run;

Frequently I will also add this line before the LENGTH statement to force the new variable be after the original variables.

if 0 then set have;

MikeZdeb
Rhodochrosite | Level 12

hi ... sure, CATS works (and in this case, better) ... not sure if _file_ is a trick or just showing another way to concatenate values

kdm
Calcite | Level 5 kdm
Calcite | Level 5

Thanks for the wonderul solution ppl!!!

I went with Linlin's suggestion

I had one additional question. In the above problem how can we have newline to be dynamic?

As in if we do not know the maximum number of rows that can have the same id. Does SAS support dynamic variables.

We have scenarios where newline can be 100, 350 or 500 and could be even more.

MikeZdeb
Rhodochrosite | Level 12

hi ... use Linlin's data and Tom's data step, insert PROC SQL in between to find the maximum length of the combined data across the IDs

data have;

input id line $50.;

cards;

100 abcdefghijkollldsdksjd

100 djsjdklsjdksdlsldklskdsld

100 djskdjsldjlsdjlsdjlsdljskdjskdjskdioeioeoeoe

100 jksldksldklsldklsdksldksldklskdls

100 djsldksldksldklsdkdadskldkasldklsdksdklskdsk

200 djksldklskdlskdls

200 uuuuuuuuuuuuuuuuuu

;

proc sql noprint;

select max(lng) into :maxlng separated by ' ' from (select sum(length(line)) as lng from have group by id) ;

quit;

data want ;

  length newvar $&maxlng;

  do until (last.id);

    set have;

    by id;

    newvar=cats(newvar,line);

  end;

  drop line;

run;


ps the "separated by" removes any blanks from around the value of the  macro variable ...


http://www.sascommunity.org/wiki/Tips:Strip_Blanks_from_PROC_SQL-Created_Macro_Variable_Values

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1652 views
  • 4 likes
  • 6 in conversation