Good morning.
I am attempting to concatenate rows of data from multiple line onto one line. I can't see to get this work. I've deconstructed it to see if I can tell what's going on but I can't see to locate the error. What I end up with is a single line per ID_NUMBER with only the first comment. I don't get the subsequent concatenations for the same ID_NUMBER.
I'm sure there must be a better way to do this but I'm trying to understand how such a data step works.
HAVE | |||
ID_NUMBER | COMMENT_NUMBER | COMMENT_SEQUENCE_NUMBER | INCOMING_COMMENT |
1 | 10 | 1 | This person has four lines |
1 | 10 | 2 | on their comment. I would like all four |
1 | 10 | 3 | on the same line. |
1 | 10 | 4 | This is the final line for this comment. |
1 | 15 | 1 | Jack and Jill went up the |
1 | 15 | 2 | hill to fetch a pail |
1 | 15 | 3 | of water. |
2 | 16 | 1 | I wish the sky was purple |
2 | 17 | 1 | If the sky was purple can you |
2 | 17 | 2 | imagine the sunsets? |
3 | 22 | 1 | Sand paintings are the best. |
3 | 22 | 2 | You can always get a different |
3 | 22 | 3 | view by changing the orientation. |
3 | 22 | 4 | Just tip the picture and BAM! |
3 | 22 | 5 | You get a brand new landscape |
WANT | |||
ID_NUMBER | COMMENT_NUMBER | INCOMING_COMMENT | |
1 | 10 | This person has four lines on their comment. I would like all four on the same line. This is the final line for this comment. | |
1 | 15 | Jack and Jill went up the hill to fetch a pail of water of water. | |
2 | 16 | I wish the sky was purple | |
2 | 17 | If the sky was purple can you imagine the sunsets? | |
2 | 22 | Sand paintings are the best. You can always get a different view by changing the orientation. Just tip the picture and BAM! You get a brand new landscape |
Thanks very much if someone has the time to help out.
data want (keep = id_number w_comment);
set have (rename=(id_number = in_id_number comment_sequence_number = in_comment_sequence_number comment_line = in_comment))
end = eof_input;
length w_comment $ 1000;
retain id_number comment_sequence_number comment_line w_comment;
/*Initialize the output (retained) columns and return to top of data step.*/
if _n_ = 1 then do;
id_number = in_id_number;
comment_sequence_number = in_comment_sequence_number;
comment_line = in_comment;
w_comment = in_comment;
return;
end;
if id_number = in_id_number and comment_sequence_number = in_comment_sequence_number then do;
w_comment = catx(", ", w_comment, in_comment);
end;
if in_id_number > id_number then do;
output;
id_number = in_id_number;
comment_sequence_number = in_comment_sequence_number;
comment_line = in_comment;
w_comment = in_comment;
return;
end;
/*At end of file on input file write out the last retained record of data.*/
if eof_input then output;
run;
There's two common methods of doing this, one is using a data step and the second is to transpose.
You have two BY variables, so just make sure to have both in your BY statement.
*create sample data for demonstration;
data have;
infile cards dlm='09'x;
input OrgID Product $ States $;
cards;
1 football DC
1 football VA
1 football MD
2 football CA
3 football NV
3 football CA
;
run;
*Sort - required for both options;
proc sort data=have;
by orgID;
run;
**********************************************************************;
*Use RETAIN and BY group processing to combine the information;
**********************************************************************;
data want_option1;
set have;
by orgID;
length combined $100.;
retain combined;
if first.orgID then
combined=states;
else
combined=catx(', ', combined, states);
if last.orgID then
output;
run;
**********************************************************************;
*Transpose it to a wide format and then combine into a single field;
**********************************************************************;
proc transpose data=have out=wide prefix=state_;
by orgID;
var states;
run;
data want_option2;
set wide;
length combined $100.;
combined=catx(', ', of state_:);
run;
like this?
data HAVE;
infile cards4 dlm="|";
input ID_NUMBER COMMENT_NUMBER COMMENT_SEQUENCE_NUMBER INCOMING_COMMENT : $ 100.;
cards4;
1|10|1|This person has four lines
1|10|2|on their comment. I would like all four
1|10|3|on the same line.
1|10|4|This is the final line for this comment.
1|15|1|Jack and Jill went up the
1|15|2|hill to fetch a pail
1|15|3|of water.
2|16|1|I wish the sky was purple
2|17|1|If the sky was purple can you
2|17|2|imagine the sunsets?
3|22|1|Sand paintings are the best.
3|22|2|You can always get a different
3|22|3|view by changing the orientation.
3|22|4|Just tip the picture and BAM!
3|22|5|You get a brand new landscape
;;;;
run;
proc print;
run;
data want(rename=(IC = INCOMING_COMMENT));
set HAVE;
by ID_NUMBER COMMENT_NUMBER;
retain IC;
length IC $ 32767;
if first.COMMENT_NUMBER then IC = " ";
ic = catx(" ", ic, INCOMING_COMMENT);
if last.COMMENT_NUMBER then output;
drop INCOMING_COMMENT COMMENT_SEQUENCE_NUMBER;
run;
proc print;
run;
Bart
There's two common methods of doing this, one is using a data step and the second is to transpose.
You have two BY variables, so just make sure to have both in your BY statement.
*create sample data for demonstration;
data have;
infile cards dlm='09'x;
input OrgID Product $ States $;
cards;
1 football DC
1 football VA
1 football MD
2 football CA
3 football NV
3 football CA
;
run;
*Sort - required for both options;
proc sort data=have;
by orgID;
run;
**********************************************************************;
*Use RETAIN and BY group processing to combine the information;
**********************************************************************;
data want_option1;
set have;
by orgID;
length combined $100.;
retain combined;
if first.orgID then
combined=states;
else
combined=catx(', ', combined, states);
if last.orgID then
output;
run;
**********************************************************************;
*Transpose it to a wide format and then combine into a single field;
**********************************************************************;
proc transpose data=have out=wide prefix=state_;
by orgID;
var states;
run;
data want_option2;
set wide;
length combined $100.;
combined=catx(', ', of state_:);
run;
Thank you Reeza.
I never considered using proc transpose in that way. Very simple and efficient suggestion. Thank you.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.