BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Nipun22
Obsidian | Level 7

1. Concatenate sequence letters

data catletters;
input first second $3.;
cards;
1 A
1 B
1 C
1 D
2 E
2 F
3 S
3 A
4 C
5 Y
6 II
6 UU
6 OO
6 N
7 G
7 H
run;

I want an output dataset like this:

1 "A,B,C,D"
2 "E,F"
3 "S,A"
4 "C"
5 "Y"
6 "II,UU,OO,N"
7 "G,H"

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@Nipun22 wrote:
can you explain execution line by line of your code please?
data want;

Start a data step to create the dataset named WANT.

do until (last.first);

Start a DO loop that will continue until the last observation in the group of observations that have the same value of the variable FIRST (and also same value of any other BY variables that precede FIRST in the BY statement).

set catletters;

Read in the next observation from the dataset CATLETTERS.

by first;

Process the observations in CATLETTERS by the value of the variable FIRST.  This will make the data step check that the values of FIRST are non-decreasing and also set flag variable FIRST.FIRST and LAST.FIRST to indicate if the current observation is the first/last in the group of observations with this value of FIRST.

length new_var $30 ;

Define a new character variable named NEW_VAR with space to store 30 bytes.

new_var=catx(',',new_var,second);

Assign to NEW_VAR the current value of NEW_VAR concatenated with the current value of SECOND.  When both are not empty then insert a comma between the two values.

end;

End the DO loop.  So if there are more observations for this value of FIRST then the loop will execute again and those values will be appended to NEW_VAR.  Otherwise the data step continues, eventually reaching the end of the step where the implied OUTPUT statement will write one observation for this BY group.

drop second;

Do not include the variable SECOND in the new WANT dataset, since it would not make any sense.

run;

End the data step definition, so it can start running.

View solution in original post

9 REPLIES 9
SASJedi
SAS Super FREQ
data want (drop=second);
	set catletters;
	length ConcatText $50;
	retain ConcatText ;
	by first;
	if first.first then call missing(concatText);
	concatText=catx(',',concatText,second);
	if last.first then output;
run;
Check out my Jedi SAS Tricks for SAS Users
A_Kh
Lapis Lazuli | Level 10

First transpose data by id variable, then concatenate transposed variables into a singe variable.

proc transpose data=catletters out=have;
	by first;
	var second;
run; 
data want;
	set have; 
	second=catx(',', of col:); 
	drop _: col:; 
run; 
Tom
Super User Tom
Super User

Not hard.  Assuming the data is sort by FIRST and you know how long the new variable needs to be.  

data want;
  do until (last.first);
    set catletters;
    by first;
    length new_var $30 ;
    new_var=catx(',',new_var,second);
  end;
  drop second;
run;

If you really want the quotes as part of the value then add this statement after the END statement.

new_var=quote(trim(new_var));

 

Nipun22
Obsidian | Level 7
can you explain execution line by line of your code please?
Tom
Super User Tom
Super User

@Nipun22 wrote:
can you explain execution line by line of your code please?
data want;

Start a data step to create the dataset named WANT.

do until (last.first);

Start a DO loop that will continue until the last observation in the group of observations that have the same value of the variable FIRST (and also same value of any other BY variables that precede FIRST in the BY statement).

set catletters;

Read in the next observation from the dataset CATLETTERS.

by first;

Process the observations in CATLETTERS by the value of the variable FIRST.  This will make the data step check that the values of FIRST are non-decreasing and also set flag variable FIRST.FIRST and LAST.FIRST to indicate if the current observation is the first/last in the group of observations with this value of FIRST.

length new_var $30 ;

Define a new character variable named NEW_VAR with space to store 30 bytes.

new_var=catx(',',new_var,second);

Assign to NEW_VAR the current value of NEW_VAR concatenated with the current value of SECOND.  When both are not empty then insert a comma between the two values.

end;

End the DO loop.  So if there are more observations for this value of FIRST then the loop will execute again and those values will be appended to NEW_VAR.  Otherwise the data step continues, eventually reaching the end of the step where the implied OUTPUT statement will write one observation for this BY group.

drop second;

Do not include the variable SECOND in the new WANT dataset, since it would not make any sense.

run;

End the data step definition, so it can start running.

Nipun22
Obsidian | Level 7
Why we have declared set statement after the loop?

data want;
do until (last.first);
set catletters;

While we usually declare it just after the data statement like this?

data want;
set catletters;
do until (last.first);
Kurt_Bremser
Super User

SET has two functions: at DATA step compile time, the dataset metadata is read and used to build the PDV; at execution time, it does the actual read of the next observation.

A SET in a loop means that, in a single DATA step iteration, an observation is read for every iteration of the DO.

Tom
Super User Tom
Super User

So that one iteration of the data step reads in ALL of the observations for that BY group (instead of just one observation.)

 

In this case that makes the code much simpler.  There is no need to RETAIN the new variable to be able to append the values from multiple observations.  And since the new variable is not retained there is no need to clear it when starting a new BY group.  And there is no need to have a conditional output statement.

 

Compare the two versions:

data want;
  do until (last.first);
    set catletters;
    by first;
    length new_var $30 ;
    new_var=catx(',',new_var,second);
  end;
  drop second;
run;

data want;
  set catletters;
  by first;
  length new_var $30 ;
  retain new_var ;
  if first.first then call missing(new_var);
  new_var=catx(',',new_var,second);
  if last.first then output;
  drop second;
run;

 

 

AhmedAl_Attar
Ammonite | Level 13

@Nipun22 

if you want to understand the capabilities of the DOW Loop @Tom used, check out these additional papers after reading his responce

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 870 views
  • 5 likes
  • 6 in conversation