I have the following dataset called Results_Before
Observation Fruit ID Fruit Color Shape
1 1 Banana Yellow Odd
2 2 Apple Red Circle_1
3 2 Apple Green Circle_2
4 3 Orange Orange Circle
5 4 Watermelon Green Oval
6 5 Berry Blue Odd_1
7 5 Berry Red Odd_2
8 5 Berry Purple Odd_3
9 6 Peach Peach Circle
How would you code to concatenate the columns using Data Step so that I get the output with 2 new columns below:
As you can see, when there are multiple same fruit numbers, the first color_joined and shape_joined remains the same but each row after adds the previous color and shape to color_joined and shape_joined.
Results_After
Observation Fruit_ID Fruit Color Shape Color_Joined Shape_Joined
1 1 Banana Yellow Odd Yellow Odd
2 2 Apple Red Circle_1 Red Circle_1
3 2 Apple Green Circle_2 Red, Green Circle_1, Circle_2
4 3 Orange Orange Circle Orange Circle
5 4 Watermelon Green Oval Green Oval
6 5 Berry Blue Odd_1 Blue Odd_1
7 5 Berry Red Odd_2 Blue, Red Odd_1, Odd_2
8 5 Berry Purple Odd_3 Blue, Red, Purple Odd_1, Odd_2, Odd_3
9 6 Peach Peach Circle Peach Circle
I tried only for Color to begin with, with the hope of also including Shape into the code.
The code I have so far doesn't work but here it is:
Also, not sure if Lag() function would help here?
Data Results_After; Set Results_Before; By Observation; If First.Fruit_ID then Color_Joined = Color;
Else Do; Color_Joined = catx(',', Color);
End; Run;
Proc Print;
Run;
data fruit_info;
length fruit $ 10;
input observation fruit_id fruit $ color $ shape $;
datalines;
1 1 Banana Yellow Odd
2 2 Apple Red Circle_1
3 2 Apple Green Circle_2
4 3 Orange Orange Circle
5 4 Watermelon Green Oval
6 5 Berry Blue Odd_1
7 5 Berry Red Odd_2
8 5 Berry Purple Odd_3
9 6 Peach Peach Circle
;
run;
proc print data=fruit_info noobs;
title 'Input data set';
run;
proc sort data=fruit_info;
by fruit_id;
run;
data fruit_info_retained;
length color_joined shape_joined $ 50;
retain color_joined shape_joined;
set fruit_info;
by fruit_id;
if first.fruit_id then
do;
color_joined = color;
shape_joined = shape;
end;
else
do;
color_joined = catx(', ', color_joined, color);
shape_joined = catx(', ', shape_joined, shape);
end;
run;
proc sort data=fruit_info_retained;
by observation;
run;
proc print data=fruit_info_retained noobs;
title 'Output data set';
var observation fruit_id fruit color shape color_joined shape_joined;
run;
data fruit_info;
length fruit $ 10;
input observation fruit_id fruit $ color $ shape $;
datalines;
1 1 Banana Yellow Odd
2 2 Apple Red Circle_1
3 2 Apple Green Circle_2
4 3 Orange Orange Circle
5 4 Watermelon Green Oval
6 5 Berry Blue Odd_1
7 5 Berry Red Odd_2
8 5 Berry Purple Odd_3
9 6 Peach Peach Circle
;
run;
proc print data=fruit_info noobs;
title 'Input data set';
run;
proc sort data=fruit_info;
by fruit_id;
run;
data fruit_info_retained;
length color_joined shape_joined $ 50;
retain color_joined shape_joined;
set fruit_info;
by fruit_id;
if first.fruit_id then
do;
color_joined = color;
shape_joined = shape;
end;
else
do;
color_joined = catx(', ', color_joined, color);
shape_joined = catx(', ', shape_joined, shape);
end;
run;
proc sort data=fruit_info_retained;
by observation;
run;
proc print data=fruit_info_retained noobs;
title 'Output data set';
var observation fruit_id fruit color shape color_joined shape_joined;
run;
Hi,
I am trying to concatenate all the rows from each group. I have written a code from which I am getting my desire result in last line(using last.) but problem is , it is only keeping up to 32767 characters. I am looking for a solution, if the character limit reaches 32767 then in the next variable (column) the remaining character should add. I am using the below code but it is not spiting into different variable if limit reaches. I have attached the result. Any help will be appreciated. Many thanks in advance.
data part4 (keep=DOC_NUMBER original_variable count);
set part3;
BY DOC_NUMBER;
if FIRST.DOC_NUMBER then
Count = 0;
Count + 1;
run;
data part5;
length concatenated_field $ 32767;
retain concatenated_field;
set part4;
by DOC_NUMBER;
if first.DOC_NUMBER then
do;
concatenated_field = original_variable;
end;
else
do;
concatenated_field = catx(', ', concatenated_field, original_variable);
end;
run;
Just add a retain-statement after by.
retain Color_Joined Shape_Joined;
Hello
Should retain statement be after by or before????
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.