I have the following dataset called Results_Before
Observation Fruit ID Fruit Color Shape
1 1 Banana Yellow Odd
2 2 Apple Red Circle_1
3 2 Apple Green Circle_2
4 3 Orange Orange Circle
5 4 Watermelon Green Oval
6 5 Berry Blue Odd_1
7 5 Berry Red Odd_2
8 5 Berry Purple Odd_3
9 6 Peach Peach Circle
How would you code to concatenate the columns using Data Step so that I get the output with 2 new columns below:
As you can see, when there are multiple same fruit numbers, the first color_joined and shape_joined remains the same but each row after adds the previous color and shape to color_joined and shape_joined.
Results_After
Observation Fruit_ID Fruit Color Shape Color_Joined Shape_Joined
1 1 Banana Yellow Odd Yellow Odd
2 2 Apple Red Circle_1 Red Circle_1
3 2 Apple Green Circle_2 Red, Green Circle_1, Circle_2
4 3 Orange Orange Circle Orange Circle
5 4 Watermelon Green Oval Green Oval
6 5 Berry Blue Odd_1 Blue Odd_1
7 5 Berry Red Odd_2 Blue, Red Odd_1, Odd_2
8 5 Berry Purple Odd_3 Blue, Red, Purple Odd_1, Odd_2, Odd_3
9 6 Peach Peach Circle Peach Circle
I tried only for Color to begin with, with the hope of also including Shape into the code.
The code I have so far doesn't work but here it is:
Also, not sure if Lag() function would help here?
Data Results_After; Set Results_Before; By Observation; If First.Fruit_ID then Color_Joined = Color;
Else Do; Color_Joined = catx(',', Color);
End; Run;
Proc Print;
Run;
data fruit_info;
length fruit $ 10;
input observation fruit_id fruit $ color $ shape $;
datalines;
1 1 Banana Yellow Odd
2 2 Apple Red Circle_1
3 2 Apple Green Circle_2
4 3 Orange Orange Circle
5 4 Watermelon Green Oval
6 5 Berry Blue Odd_1
7 5 Berry Red Odd_2
8 5 Berry Purple Odd_3
9 6 Peach Peach Circle
;
run;
proc print data=fruit_info noobs;
title 'Input data set';
run;
proc sort data=fruit_info;
by fruit_id;
run;
data fruit_info_retained;
length color_joined shape_joined $ 50;
retain color_joined shape_joined;
set fruit_info;
by fruit_id;
if first.fruit_id then
do;
color_joined = color;
shape_joined = shape;
end;
else
do;
color_joined = catx(', ', color_joined, color);
shape_joined = catx(', ', shape_joined, shape);
end;
run;
proc sort data=fruit_info_retained;
by observation;
run;
proc print data=fruit_info_retained noobs;
title 'Output data set';
var observation fruit_id fruit color shape color_joined shape_joined;
run;
data fruit_info;
length fruit $ 10;
input observation fruit_id fruit $ color $ shape $;
datalines;
1 1 Banana Yellow Odd
2 2 Apple Red Circle_1
3 2 Apple Green Circle_2
4 3 Orange Orange Circle
5 4 Watermelon Green Oval
6 5 Berry Blue Odd_1
7 5 Berry Red Odd_2
8 5 Berry Purple Odd_3
9 6 Peach Peach Circle
;
run;
proc print data=fruit_info noobs;
title 'Input data set';
run;
proc sort data=fruit_info;
by fruit_id;
run;
data fruit_info_retained;
length color_joined shape_joined $ 50;
retain color_joined shape_joined;
set fruit_info;
by fruit_id;
if first.fruit_id then
do;
color_joined = color;
shape_joined = shape;
end;
else
do;
color_joined = catx(', ', color_joined, color);
shape_joined = catx(', ', shape_joined, shape);
end;
run;
proc sort data=fruit_info_retained;
by observation;
run;
proc print data=fruit_info_retained noobs;
title 'Output data set';
var observation fruit_id fruit color shape color_joined shape_joined;
run;
Hi,
I am trying to concatenate all the rows from each group. I have written a code from which I am getting my desire result in last line(using last.) but problem is , it is only keeping up to 32767 characters. I am looking for a solution, if the character limit reaches 32767 then in the next variable (column) the remaining character should add. I am using the below code but it is not spiting into different variable if limit reaches. I have attached the result. Any help will be appreciated. Many thanks in advance.
data part4 (keep=DOC_NUMBER original_variable count);
set part3;
BY DOC_NUMBER;
if FIRST.DOC_NUMBER then
Count = 0;
Count + 1;
run;
data part5;
length concatenated_field $ 32767;
retain concatenated_field;
set part4;
by DOC_NUMBER;
if first.DOC_NUMBER then
do;
concatenated_field = original_variable;
end;
else
do;
concatenated_field = catx(', ', concatenated_field, original_variable);
end;
run;
Just add a retain-statement after by.
retain Color_Joined Shape_Joined;
Hello
Should retain statement be after by or before????
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.