Solved: Re: I want an output dataset like this:

Nipun22 · Posted 07-02-2024 11:35 AM

1. Concatenate sequence letters

data catletters;
input first second $3.;
cards;
1 A
1 B
1 C
1 D
2 E
2 F
3 S
3 A
4 C
5 Y
6 II
6 UU
6 OO
6 N
7 G
7 H
run;

I want an output dataset like this:

1 "A,B,C,D"
2 "E,F"
3 "S,A"
4 "C"
5 "Y"
6 "II,UU,OO,N"
7 "G,H"


SASJedi · Posted 07-02-2024 11:41 AM

data want (drop=second);
	set catletters;
	length ConcatText $50;
	retain ConcatText ;
	by first;
	if first.first then call missing(concatText);
	concatText=catx(',',concatText,second);
	if last.first then output;
run;


A_Kh · Posted 07-02-2024 11:46 AM

First transpose the data by the ID variable, then concatenate the transposed variables into a single variable.

proc transpose data=catletters out=have;
	by first;
	var second;
run; 
data want;
	set have; 
	second=catx(',', of col:); 
	drop _: col:; 
run;

Tom · Posted 07-02-2024 02:58 PM

Not hard, assuming the data is sorted by FIRST and you know how long the new variable needs to be.

data want;
  do until (last.first);
    set catletters;
    by first;
    length new_var $30 ;
    new_var=catx(',',new_var,second);
  end;
  drop second;
run;

If you really want the quotes as part of the value, then add this statement after the END statement. QUOTE() adds two characters, so make sure NEW_VAR is long enough to hold them.

new_var=quote(trim(new_var));
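A sketch of the complete step with the QUOTE() line in place, assuming you would rather size NEW_VAR from the data than guess. The PROC SQL query adds up the value lengths in each group plus one comma between values, takes the largest total, and adds 2 for the quotes. The macro variable MAXLEN and the dataset name WANT_QUOTED are hypothetical names chosen for this example, and the sample data is re-entered with `@@` list input only to keep the block short.

```sas
/* Sample data from the question, re-entered with @@ so several pairs fit on one line. */
data catletters;
  input first second :$3. @@;
  datalines;
1 A 1 B 1 C 1 D 2 E 2 F 3 S 3 A
4 C 5 Y 6 II 6 UU 6 OO 6 N 7 G 7 H
;
run;

/* Longest joined value = sum of the value lengths in a group + one comma between
   values; +2 leaves room for the quotes. Blank values would only make this an
   overestimate, which is safe. */
proc sql noprint;
  select max(total_len) + 2 into :maxlen trimmed
    from (select sum(lengthn(second)) + count(*) - 1 as total_len
            from catletters
            group by first);
quit;

data want_quoted;
  do until (last.first);
    set catletters;
    by first;
    length new_var $&maxlen;              /* resolves to $12 for this sample data */
    new_var = catx(',', new_var, second);
  end;
  new_var = quote(trim(new_var));         /* wrap the finished list in double quotes */
  drop second;
run;
```

Computing the length first means the same step keeps working if longer groups show up later, at the cost of one extra pass over the data.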

Nipun22 · Posted 07-02-2024 03:02 PM

can you explain execution line by line of your code please?

Tom · Posted 07-02-2024 03:21 PM

@Nipun22 wrote:
can you explain execution line by line of your code please?

data want;

Start a data step to create the dataset named WANT.

do until (last.first);

Start a DO loop that will continue until the last observation in the group of observations that have the same value of the variable FIRST (and also same value of any other BY variables that precede FIRST in the BY statement).

set catletters;

Read in the next observation from the dataset CATLETTERS.

by first;

Process the observations in CATLETTERS by the value of the variable FIRST. This makes the data step check that the values of FIRST are in ascending (non-decreasing) order, stopping with an error if they are not, and also sets the flag variables FIRST.FIRST and LAST.FIRST to indicate whether the current observation is the first or last in the group of observations with this value of FIRST.
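To see the two flags that BY creates, the sketch below copies FIRST.FIRST and LAST.FIRST into ordinary variables, since the automatic FIRST./LAST. variables themselves are not written to the output dataset. IS_FIRST, IS_LAST and FLAGS are illustrative names, and the sample data is re-entered with `@@` list input to keep the block short.

```sas
/* Sample data from the question, re-entered with @@ so several pairs fit on one line. */
data catletters;
  input first second :$3. @@;
  datalines;
1 A 1 B 1 C 1 D 2 E 2 F 3 S 3 A
4 C 5 Y 6 II 6 UU 6 OO 6 N 7 G 7 H
;
run;

/* FIRST.FIRST and LAST.FIRST are temporary, so copy them into ordinary variables. */
data flags;
  set catletters;
  by first;                 /* stops with an error if FIRST is not in ascending order */
  is_first = first.first;   /* 1 on the first row of each group, else 0 */
  is_last  = last.first;    /* 1 on the last row of each group, else 0 */
run;

proc print data=flags noobs;
run;
```

A group with a single observation, such as FIRST=4, should show 1 for both flags.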

length new_var $30 ;

Define a new character variable named NEW_VAR with space to store 30 bytes.

new_var=catx(',',new_var,second);

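Assign to NEW_VAR the current value of NEW_VAR concatenated with the current value of SECOND. CATX removes leading and trailing blanks from each value and inserts the comma only between non-blank values, so the first value in a group is not preceded by a comma.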
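A standalone illustration of how CATX behaves in that assignment, assuming the values of the group FIRST=6 are fed in one at a time. Because NEW_VAR starts out blank, the first call returns just `II` with no leading comma. CATX also strips the trailing blanks that pad NEW_VAR to 30 bytes, which is why plain `||` concatenation would not give the same result here. PADDED and CATX_DEMO are hypothetical names.

```sas
data catx_demo;
  length new_var $30 padded $10;
  /* NEW_VAR starts blank, so the first call returns just 'II' (no leading comma). */
  new_var = catx(',', new_var, 'II');
  new_var = catx(',', new_var, 'UU');   /* 'II,UU'      */
  new_var = catx(',', new_var, 'OO');   /* 'II,UU,OO'   */
  new_var = catx(',', new_var, 'N');    /* 'II,UU,OO,N' */
  /* Leading and trailing blanks are stripped from every argument before joining. */
  padded = catx(',', '  A ', ' B ');    /* 'A,B' */
  put new_var= padded=;
run;
```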

end;

End the DO loop. So if there are more observations for this value of FIRST then the loop will execute again and those values will be appended to NEW_VAR. Otherwise the data step continues, eventually reaching the end of the step where the implied OUTPUT statement will write one observation for this BY group.
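One way to watch the loop in action is to add a PUTLOG inside it. In this sketch (WANT_TRACE is a hypothetical name), the log should show _N_ staying at the same value for every observation of a BY group while NEW_VAR grows one value at a time. The sample data is re-entered with `@@` list input to keep the block short.

```sas
/* Sample data from the question, re-entered with @@ so several pairs fit on one line. */
data catletters;
  input first second :$3. @@;
  datalines;
1 A 1 B 1 C 1 D 2 E 2 F 3 S 3 A
4 C 5 Y 6 II 6 UU 6 OO 6 N 7 G 7 H
;
run;

data want_trace;
  do until (last.first);
    set catletters;
    by first;
    length new_var $30;
    new_var = catx(',', new_var, second);
    /* One log line per observation read; _N_ changes only between BY groups. */
    putlog _n_= first= second= new_var=;
  end;
  drop second;
  /* The implied OUTPUT at the end of the step writes one row per BY group. */
run;
```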

drop second;

Do not include the variable SECOND in the new WANT dataset, since at that point it would only hold the value from the last observation in the group.

run;

End the data step definition, so it can start running.

Nipun22 · Posted 07-05-2024 08:28 AM

Why have we put the SET statement inside the loop?

data want;
do until (last.first);
set catletters;

Usually we put it right after the DATA statement, like this:

data want;
set catletters;
do until (last.first);

Kurt_Bremser · Posted 07-05-2024 09:19 AM

SET has two functions: at DATA step compile time, the dataset metadata is read and used to build the PDV (program data vector); at execution time, it does the actual read of the next observation.
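A common way to see the compile-time role on its own is `if 0 then set`: the condition is never true, so no observation is ever read, yet the compiler still builds FIRST and SECOND into the PDV from the dataset's metadata. STRUCTURE_ONLY is an illustrative name; the resulting dataset should have both variables and zero observations.

```sas
/* Sample data from the question, re-entered with @@ so several pairs fit on one line. */
data catletters;
  input first second :$3. @@;
  datalines;
1 A 1 B 1 C 1 D 2 E 2 F 3 S 3 A
4 C 5 Y 6 II 6 UU 6 OO 6 N 7 G 7 H
;
run;

data structure_only;
  /* IF 0 is never true, so this SET never reads a row at execution time,
     but at compile time it still adds FIRST and SECOND to the PDV. */
  if 0 then set catletters;
  stop;   /* end the step before anything is written */
run;

proc contents data=structure_only;
run;
```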

A SET in a loop means that, in a single DATA step iteration, an observation is read for every iteration of the DO.
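A small sketch that makes this visible by counting how many observations each DATA step iteration reads. ITERATION, N_READ and PER_ITERATION are hypothetical names, and the sample data is re-entered with `@@` list input. With this data you would expect seven rows, one per value of FIRST, with N_READ matching the group sizes 4, 2, 2, 1, 1, 4, 2.

```sas
/* Sample data from the question, re-entered with @@ so several pairs fit on one line. */
data catletters;
  input first second :$3. @@;
  datalines;
1 A 1 B 1 C 1 D 2 E 2 F 3 S 3 A
4 C 5 Y 6 II 6 UU 6 OO 6 N 7 G 7 H
;
run;

data per_iteration;
  iteration = _n_;   /* which DATA step iteration this is */
  n_read = 0;        /* starts each iteration as missing, so set it to 0 before counting */
  do until (last.first);
    set catletters;
    by first;
    n_read = n_read + 1;
  end;
  keep iteration first n_read;
run;

proc print data=per_iteration noobs;
run;
```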


Tom · Posted 07-05-2024 12:43 PM

So that one iteration of the data step reads in ALL of the observations for that BY group (instead of just one observation.)

In this case that makes the code much simpler. There is no need to RETAIN the new variable to be able to append the values from multiple observations. And since the new variable is not retained there is no need to clear it when starting a new BY group. And there is no need to have a conditional output statement.

Compare the two versions:

data want;
  do until (last.first);
    set catletters;
    by first;
    length new_var $30 ;
    new_var=catx(',',new_var,second);
  end;
  drop second;
run;

data want;
  set catletters;
  by first;
  length new_var $30 ;
  retain new_var ;
  if first.first then call missing(new_var);
  new_var=catx(',',new_var,second);
  if last.first then output;
  drop second;
run;
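If you want to confirm that the two versions are interchangeable on this data, one option is to run both and compare the results with PROC COMPARE. WANT_DOW and WANT_RETAIN are hypothetical names, and the sample data is re-entered with `@@` list input. PROC COMPARE sets the automatic macro variable SYSINFO to 0 when it finds no differences.

```sas
/* Sample data from the question, re-entered with @@ so several pairs fit on one line. */
data catletters;
  input first second :$3. @@;
  datalines;
1 A 1 B 1 C 1 D 2 E 2 F 3 S 3 A
4 C 5 Y 6 II 6 UU 6 OO 6 N 7 G 7 H
;
run;

/* Version 1: DOW loop, one DATA step iteration per BY group. */
data want_dow;
  do until (last.first);
    set catletters;
    by first;
    length new_var $30;
    new_var = catx(',', new_var, second);
  end;
  drop second;
run;

/* Version 2: one iteration per observation, so NEW_VAR must be retained and reset. */
data want_retain;
  set catletters;
  by first;
  length new_var $30;
  retain new_var;
  if first.first then call missing(new_var);
  new_var = catx(',', new_var, second);
  if last.first then output;
  drop second;
run;

proc compare base=want_dow compare=want_retain;
run;
%put NOTE: PROC COMPARE return code SYSINFO=&sysinfo;
```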

AhmedAl_Attar · Posted 07-06-2024 10:24 PM

@Nipun22

If you want to understand the capabilities of the DOW loop @Tom used, check out these additional papers after reading his response.
