BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
JMagenta
Obsidian | Level 7

Hello and thank you in advance,

 

I want to output to two different datasets and use a counter variable in each data set that counts separately. Like so:

 

data have; infile datalines dlm=','; input blah $ var1 var2 ; datalines; a, 1, 2,  b, 3, 2,  c, 3, 3, d, 8, 4,  e, 5, 6,  f, 7, 2, g, 3, 1,  h, 9, 4,  i, 8, 9 ; run;

 

data want1 want2; set have; if blah in ('a', 'b', 'c', 'd') then do;
do i=1 to (total number of records but I don't know how to specify that);
recordno=i;
output want1;
end;
end;

else do;
do i=1 to total obs in that dataset;
recordno=i;
output want2;

end;
end;
run;

 

With the result that looks something like this:

Want 1 :

blah var1 var2 recordno

a      1        2        1

b      3        2         2

c       3        3         3

d      8       4          4

 

Want 2:

blah var1 var2 recordno

e      5       6       1

f     7        2         2

g     3        1       3

h      9      4          4

 

and so on.

How can I do this ?

I know there must be use of a do loop. I'm just not figuring out how to do so.

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@JMagenta wrote:

So I don't want to group the datasets by 4.

 


You want to be VERY careful about building your example data and cases to not imply rules. You had 9 input records in your original question, showed two sets with 4 records IN ORDER of appearance as the output data and did not display the 9th. So where did that ninth go? The implication without explicit rules was "select 4 records in order". You did not explain any rule for which records were to be selected to which data set, just a "sort of example".

 

So, examine the Have set below. Note that the blah in ('a' 'b' 'c') are not in sequential order.

Then parse the second set for splitting.

data have; 
   infile datalines dlm=','; 
   input blah $ var1 var2 ;
datalines;
a, 1, 2,
e, 3, 2, 
c, 3, 3, 
d, 8, 4,  
b, 5, 6,  
f, 7, 2, 
a, 3, 1,  
h, 9, 4,  
c, 8, 9 
; 
run;

data want1 (rename=(rec1=recordno) drop=rec2)
     want2 (rename=(rec2=recordno) drop=rec1)
;
   retain rec1 rec2 0;

   set have;
   if blah in ('a' 'b' 'c') then do;
      rec1+1;
      output want1;
   end;
   else do;
      rec2+1;
      output want2;
   end;
run;

Hint: If you have TWO sequences then you want TWO variables to hold the count values. Conditionally increment the counter based on the stated selection rule(s).

Rename to get the one variable name wanted and drop the other variable as output data set options.

 

View solution in original post

5 REPLIES 5
Reeza
Super User
Not sure a do loop is the correct approach here. That would replicate rows from your original data set. Please show what the original data set would look like and how you would decide which rows from the original go to the new data sets. If replicating, how do you decide how many times to replicate the row?
ballardw
Super User

Besides a Have data step that won't run as posted my concern is this
do i=1 to (total number of records but I don't know how to specify that);

If you don't know how many records or provide a rule for us, how can we know what to do?

You should provide much more detail about what "and so on" actually means.

 

You might also tell us what the resulting multiple data sets were to be used for. It may be, quite often, that splitting a data set is not needed.

 

A data step that will run that creates 9 records:

data have; 
   infile datalines dlm=','; 
   input blah $ var1 var2 ; 
datalines;
a, 1, 2,
b, 3, 2, 
c, 3, 3, 
d, 8, 4,  
e, 5, 6,  
f, 7, 2, 
g, 3, 1,  
h, 9, 4,  
i, 8, 9 
; 
run;

Here is a way to add a group variable that takes on the 1, 2, 3 etc in groups of 4 and adds a recordno variable per group.

data have; 
   infile datalines dlm=','; 
   input blah $ var1 var2 ;
   retain group 0;
   recordno= mod(_n_,4); 
   if recordno=1 then group+1;
   if recordno=0 then recordno=4;
datalines;
a, 1, 2,
b, 3, 2, 
c, 3, 3, 
d, 8, 4,  
e, 5, 6,  
f, 7, 2, 
g, 3, 1,  
h, 9, 4,  
i, 8, 9 
; 
run;

The data step DATA statement would require having all of the data sets Want1 to WantXX declared. If you do not know how many there should be then you cant write the data step to split the data at all.

The above code would allow BY Group processing using the GROUP variable to do something with each set of 4 records.

 

 

 

JMagenta
Obsidian | Level 7

So I don't want to group the datasets by 4. I just want to output each dataset and have a variable that numbers each observation in the dataset. Or counts the number of rows in the dataset. Just as long as it counts each dataset separately. In the code below I have to keep the variable "recordno" in the first column so I'm already using the retain function out of the gate. 

 

For this code:

data have; 
   infile datalines dlm=','; 
   input blah $ var1 var2 ;
datalines;
a, 1, 2,
b, 3, 2, 
c, 3, 3, 
d, 8, 4,  
e, 5, 6,  
f, 7, 2, 
g, 3, 1,  
h, 9, 4,  
i, 8, 9 
; 
run;


Data want1 want2;
retain recordno;
set have;
If blah in ('a', 'b', 'c') then output want1;
else output want2;
   recordno=_n_;
run;

I get the following for WANT 1:

JMagenta_0-1675904763819.png

I get the following for WANT2:

JMagenta_1-1675904873542.png

What I would like to get for WANT 1 is continuous counting starting with 1, for each observation. I would like the same for WANT 2.  See the following.

 

Want1:

JMagenta_2-1675905212922.png

Want 2:

JMagenta_3-1675905600058.png

 

 

Hopefully this is clearer and that you can help.

 

Thank you.

 

ballardw
Super User

@JMagenta wrote:

So I don't want to group the datasets by 4.

 


You want to be VERY careful about building your example data and cases to not imply rules. You had 9 input records in your original question, showed two sets with 4 records IN ORDER of appearance as the output data and did not display the 9th. So where did that ninth go? The implication without explicit rules was "select 4 records in order". You did not explain any rule for which records were to be selected to which data set, just a "sort of example".

 

So, examine the Have set below. Note that the blah in ('a' 'b' 'c') are not in sequential order.

Then parse the second set for splitting.

data have; 
   infile datalines dlm=','; 
   input blah $ var1 var2 ;
datalines;
a, 1, 2,
e, 3, 2, 
c, 3, 3, 
d, 8, 4,  
b, 5, 6,  
f, 7, 2, 
a, 3, 1,  
h, 9, 4,  
c, 8, 9 
; 
run;

data want1 (rename=(rec1=recordno) drop=rec2)
     want2 (rename=(rec2=recordno) drop=rec1)
;
   retain rec1 rec2 0;

   set have;
   if blah in ('a' 'b' 'c') then do;
      rec1+1;
      output want1;
   end;
   else do;
      rec2+1;
      output want2;
   end;
run;

Hint: If you have TWO sequences then you want TWO variables to hold the count values. Conditionally increment the counter based on the stated selection rule(s).

Rename to get the one variable name wanted and drop the other variable as output data set options.

 

JMagenta
Obsidian | Level 7

Thank you! I had to make it work for a much larger dataset but it worked. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 574 views
  • 2 likes
  • 3 in conversation