Solved: How can I output to two different datasets and count separately in eac...

JMagenta · Posted 02-08-2023 06:26 PM

Hello and thank you in advance,

I want to output to two different datasets and use a counter variable in each data set that counts separately. Like so:

data have; infile datalines dlm=','; input blah $ var1 var2 ; datalines; a, 1, 2, b, 3, 2, c, 3, 3, d, 8, 4, e, 5, 6, f, 7, 2, g, 3, 1, h, 9, 4, i, 8, 9 ; run;

data want1 want2; set have; if blah in ('a', 'b', 'c', 'd') then do;
do i=1 to (total number of records but I don't know how to specify that);
recordno=i;
output want1;
end;
end;
else do;
do i=1 to total obs in that dataset;
recordno=i;
output want2;
end;
end;
run;

With the result that looks something like this:

Want 1 :

blah var1 var2 recordno

a 1 2 1

b 3 2 2

c 3 3 3

d 8 4 4

Want 2:

blah var1 var2 recordno

e 5 6 1

f 7 2 2

g 3 1 3

h 9 4 4

and so on.

How can I do this ?

I know there must be use of a do loop. I'm just not figuring out how to do so.

Thank you

ballardw · Posted 02-09-2023 01:29 AM

@JMagenta wrote:

So I don't want to group the datasets by 4.

You want to be VERY careful about building your example data and cases to not imply rules. You had 9 input records in your original question, showed two sets with 4 records IN ORDER of appearance as the output data and did not display the 9th. So where did that ninth go? The implication without explicit rules was "select 4 records in order". You did not explain any rule for which records were to be selected to which data set, just a "sort of example".

So, examine the Have set below. Note that the blah in ('a' 'b' 'c') are not in sequential order.

Then parse the second set for splitting.

data have; 
   infile datalines dlm=','; 
   input blah $ var1 var2 ;
datalines;
a, 1, 2,
e, 3, 2, 
c, 3, 3, 
d, 8, 4,  
b, 5, 6,  
f, 7, 2, 
a, 3, 1,  
h, 9, 4,  
c, 8, 9 
; 
run;

data want1 (rename=(rec1=recordno) drop=rec2)
     want2 (rename=(rec2=recordno) drop=rec1)
;
   retain rec1 rec2 0;

   set have;
   if blah in ('a' 'b' 'c') then do;
      rec1+1;
      output want1;
   end;
   else do;
      rec2+1;
      output want2;
   end;
run;

Hint: If you have TWO sequences then you want TWO variables to hold the count values. Conditionally increment the counter based on the stated selection rule(s).

Rename to get the one variable name wanted and drop the other variable as output data set options.

View solution in original post

Reeza · Posted 02-08-2023 06:39 PM

Not sure a do loop is the correct approach here. That would replicate rows from your original data set. Please show what the original data set would look like and how you would decide which rows from the original go to the new data sets. If replicating, how do you decide how many times to replicate the row?

ballardw · Posted 02-08-2023 07:01 PM

Besides a Have data step that won't run as posted my concern is this
do i=1 to (total number of records but I don't know how to specify that);

If you don't know how many records or provide a rule for us, how can we know what to do?

You should provide much more detail about what "and so on" actually means.

You might also tell us what the resulting multiple data sets were to be used for. It may be, quite often, that splitting a data set is not needed.

A data step that will run that creates 9 records:

data have; 
   infile datalines dlm=','; 
   input blah $ var1 var2 ; 
datalines;
a, 1, 2,
b, 3, 2, 
c, 3, 3, 
d, 8, 4,  
e, 5, 6,  
f, 7, 2, 
g, 3, 1,  
h, 9, 4,  
i, 8, 9 
; 
run;

Here is a way to add a group variable that takes on the 1, 2, 3 etc in groups of 4 and adds a recordno variable per group.

data have; 
   infile datalines dlm=','; 
   input blah $ var1 var2 ;
   retain group 0;
   recordno= mod(_n_,4); 
   if recordno=1 then group+1;
   if recordno=0 then recordno=4;
datalines;
a, 1, 2,
b, 3, 2, 
c, 3, 3, 
d, 8, 4,  
e, 5, 6,  
f, 7, 2, 
g, 3, 1,  
h, 9, 4,  
i, 8, 9 
; 
run;

The data step DATA statement would require having all of the data sets Want1 to WantXX declared. If you do not know how many there should be then you cant write the data step to split the data at all.

The above code would allow BY Group processing using the GROUP variable to do something with each set of 4 records.

JMagenta · Posted 02-08-2023 08:21 PM

So I don't want to group the datasets by 4. I just want to output each dataset and have a variable that numbers each observation in the dataset. Or counts the number of rows in the dataset. Just as long as it counts each dataset separately. In the code below I have to keep the variable "recordno" in the first column so I'm already using the retain function out of the gate.

For this code:

data have; 
   infile datalines dlm=','; 
   input blah $ var1 var2 ;
datalines;
a, 1, 2,
b, 3, 2, 
c, 3, 3, 
d, 8, 4,  
e, 5, 6,  
f, 7, 2, 
g, 3, 1,  
h, 9, 4,  
i, 8, 9 
; 
run;


Data want1 want2;
retain recordno;
set have;
If blah in ('a', 'b', 'c') then output want1;
else output want2;
   recordno=_n_;
run;

I get the following for WANT 1:

I get the following for WANT2:

What I would like to get for WANT 1 is continuous counting starting with 1, for each observation. I would like the same for WANT 2. See the following.

Want1:

Want 2:

Hopefully this is clearer and that you can help.

Thank you.

ballardw · Posted 02-09-2023 01:29 AM

@JMagenta wrote:

So I don't want to group the datasets by 4.

You want to be VERY careful about building your example data and cases to not imply rules. You had 9 input records in your original question, showed two sets with 4 records IN ORDER of appearance as the output data and did not display the 9th. So where did that ninth go? The implication without explicit rules was "select 4 records in order". You did not explain any rule for which records were to be selected to which data set, just a "sort of example".

So, examine the Have set below. Note that the blah in ('a' 'b' 'c') are not in sequential order.

Then parse the second set for splitting.

data have; 
   infile datalines dlm=','; 
   input blah $ var1 var2 ;
datalines;
a, 1, 2,
e, 3, 2, 
c, 3, 3, 
d, 8, 4,  
b, 5, 6,  
f, 7, 2, 
a, 3, 1,  
h, 9, 4,  
c, 8, 9 
; 
run;

data want1 (rename=(rec1=recordno) drop=rec2)
     want2 (rename=(rec2=recordno) drop=rec1)
;
   retain rec1 rec2 0;

   set have;
   if blah in ('a' 'b' 'c') then do;
      rec1+1;
      output want1;
   end;
   else do;
      rec2+1;
      output want2;
   end;
run;

Hint: If you have TWO sequences then you want TWO variables to hold the count values. Conditionally increment the counter based on the stated selection rule(s).

Rename to get the one variable name wanted and drop the other variable as output data set options.

JMagenta · Posted 02-09-2023 12:41 PM

Thank you! I had to make it work for a much larger dataset but it worked.

How can I output to two different datasets and count separately in each?

Re: How can I output to two different datasets and count separately in each?

Re: How can I output to two different datasets and count separately in each?

Re: How can I output to two different datasets and count separately in each?

Re: How can I output to two different datasets and count separately in each?

Re: How can I output to two different datasets and count separately in each?

Re: How can I output to two different datasets and count separately in each?