DATA Step, Macro, Functions and more

Concatenate rows by multiple by group variables

Accepted Solution Solved
Reply
Frequent Contributor
Frequent Contributor
Posts: 134
Accepted Solution

Concatenate rows by multiple by group variables

Hello,

I am trying to concatenate a variable (VAR2) based on multiple by group variables (By ID VAR1 TruncVar1) using SAS 9.4. I want to concatenate the cells that have the same TruncVar1 value. I want all the rows with the same TruncVar and ID to have the same Var2Combined value. Can anyone help ?

 

 

I want to end up with a new column in my dataset, Var2Combined, below:

 

IDVar1TruncVar1Var2Var2Combined
14356611A - B C
1435661.11B CA - B C
14356622X YX Y - C D E - A B C
1435662.012C D EX Y - C D E - A B C
1435662.022A B CX Y - C D E - A B C
15837811AA - B C
1583781.21B CA - B C
15837844A B CA B C - X Y
1583784.014X YA B C - X Y

 

 

Data sample ;
INFILE DATALINES DLM='|';
input ID Var1 TruncVar1 Var2 $;
datalines;
143566|1|1|A
143566|1.1|1|B C
143566|2|2|X Y
143566|2.01|2|C D E
143566|2.02|2|A B C
158378|1|1|A
158378|1.2|1|B C
158378|4|4|A B C
158378|4.01|4|X Y ;
run;

proc print data=sample;
run;

 

proc sort data=sample ;
by id var1 truncVar1 ;
run;

 

Data sample ;
format CombinedVar2 $200. ;
set sample ;
by id var1 truncVar1 ;

if first.truncVar1 then CombinedVar2 = Var2 ;
CombinedVar2 = Catx(" - ",CombinedVar2, Var2) ;
run;
proc print data=sample;
run;

 

 


Accepted Solutions
Solution
‎07-11-2016 07:05 PM
Super User
Posts: 11,343

Re: Concatenate rows by multiple by group variables


CP2 wrote:

Thanks for your comment. TruncVar1 is numeric, =floor(Var1) . After truncating I want to roll up alpha field (Var2) to have one concatenated value for each truncated value. I have left Var1 in the by group as : by id var1 truncVar1 


Leaving VAR1 in the By statement with that order will create separate by groups for 2&2, 2.01&2 and 2.02&2. You should drop Var1 in the by statements to prevent extra grouping.

View solution in original post


All Replies
Super User
Posts: 5,516

Re: Concatenate rows by multiple by group variables

The trick is to go through the data twice:

 

data want;

length CombinedVar2 $200 ;

do until (last.truncVar1);

   set have;
   by id var1 truncVar1

   CombinedVar2 = Catx(" - ",CombinedVar2, Var2) ;

end;

do until (last.truncVar1);

   set have;

   by id var1 truncVar1;

   output;

end;
run;

 

The first loop creates CombinedVar2.  Then the second loop reads, the same observations and outputs them.

Frequent Contributor
Frequent Contributor
Posts: 134

Re: Concatenate rows by multiple by group variables

Posted in reply to Astounding

Thanks for the suggestions but it's not giving me the desired resutls. Any other suggestions? Thanks!

Super User
Posts: 5,516

Re: Concatenate rows by multiple by group variables

Two very important suggestions.

 

(1) Post the log from your program

 

(2) Describe what you mean by "not giving you the desired results"?  Errors?  No errors, but not what you hoped to get?

Super User
Posts: 11,343

Re: Concatenate rows by multiple by group variables

Posted in reply to Astounding

Apparently there is a need to adjust the values of VAR1 as the grouping in the example has Var1=2, 2.01 and 2.02 treated in the same group. So a more complex BY statement will be required or addition of a temporary variable such as a truncated or rounded version of Var1 (not obvious if Var1 is numeric or character).

 

And since there is one example provided there may be even more obnoxious possibilities.

Frequent Contributor
Frequent Contributor
Posts: 134

Re: Concatenate rows by multiple by group variables

Thanks for your comment. TruncVar1 is numeric, =floor(Var1) . After truncating I want to roll up alpha field (Var2) to have one concatenated value for each truncated value. I have left Var1 in the by group as : by id var1 truncVar1 

Solution
‎07-11-2016 07:05 PM
Super User
Posts: 11,343

Re: Concatenate rows by multiple by group variables


CP2 wrote:

Thanks for your comment. TruncVar1 is numeric, =floor(Var1) . After truncating I want to roll up alpha field (Var2) to have one concatenated value for each truncated value. I have left Var1 in the by group as : by id var1 truncVar1 


Leaving VAR1 in the By statement with that order will create separate by groups for 2&2, 2.01&2 and 2.02&2. You should drop Var1 in the by statements to prevent extra grouping.

Frequent Contributor
Frequent Contributor
Posts: 134

Re: Concatenate rows by multiple by group variables

Posted in reply to Astounding

Thank you. CombinedVar2 is identical to Var2. I added a missing semi-colon to your code and changed 'Have' to 'Sample' as the name of my dataset. The code, output, and log follows:

 

 

Data sample ;
INFILE DATALINES DLM='|';
input ID Var1 TruncVar1 Var2 $;
datalines;
143566|1|1|A
143566|1.1|1|B C
143566|2|2|X Y
143566|2.01|2|C D E
143566|2.02|2|A B C
158378|1|1|A
158378|1.2|1|B C
158378|4|4|A B C
158378|4.01|4|X Y
;
run;
data want;
length CombinedVar2 $200 ;
do until (last.truncVar1);
set sample;
by id var1 truncVar1 ;
CombinedVar2 = Catx(" - ",CombinedVar2, Var2) ;
end;

do until (last.truncVar1);
set sample;
by id var1 truncVar1;
output;
end;
run;

proc print data=want;
run;

 

SAS Output

The SAS System

Obs CombinedVar2 ID Var1 TruncVar1 Var2 1 2 3 4 5 6 7 8 9
A1435661.001A
B C1435661.101B C
X Y1435662.002X Y
C D E1435662.012C D E
A B C1435662.022A B C
A1583781.001A
B C1583781.201B C
A B C1583784.004A B C
X Y1583784.014X Y

 

Log:


3862 Data sample ;
3863 INFILE DATALINES DLM='|';
3864 input ID Var1 TruncVar1 Var2 $;
3865 datalines;

NOTE: The data set WORK.SAMPLE has 9 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.04 seconds


3875 ;
3876 run;
3877 data want;
3878 length CombinedVar2 $200 ;
3879 do until (last.truncVar1);
3880 set sample;
3881 by id var1 truncVar1 ;
3882 CombinedVar2 = Catx(" - ",CombinedVar2, Var2) ;
3883 end;
3884
3885 do until (last.truncVar1);
3886 set sample;
3887 by id var1 truncVar1;
3888 output;
3889 end;
3890 run;

NOTE: There were 9 observations read from the data set WORK.SAMPLE.
NOTE: There were 9 observations read from the data set WORK.SAMPLE.
NOTE: The data set WORK.WANT has 9 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.08 seconds
cpu time 0.07 seconds


3891
3892 proc print data=want;
3893 run;

NOTE: There were 9 observations read from the data set WORK.WANT.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.06 seconds
cpu time 0.06 seconds

 

Super User
Posts: 5,516

Re: Concatenate rows by multiple by group variables

Did you try removing VAR1 from both BY statements?  That looks like a good suggestion.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 702 views
  • 0 likes
  • 3 in conversation