why doesn't DATA output statement duplicate all observations?

Occasional Contributor
Posts: 17

Hi, I'm running a program with the data shown below:

 

data two (xy.txt):

5 2
3 1
5 6

the program:

 

data two;
	infile '/folders/myfolders/sasuser.v94/xy.txt';
	input x y;
run;

data one two other;
  set two;
  if x = 5 then output one;
  if y < 5 then output two;
  output;
run;

 

Since the OUTPUT statement at the end of the program has no condition (no IF), shouldn't data set two contain two identical copies of the observations?

 

But what I see in the SAS output window is a table with 5 observations.

 

Please help.

 

Below is what I copied and pasted from the output window.

 

Work.two   Total rows: 5   Total columns: 2

Row  x  y
1    5  2
2    5  2
3    3  1
4    3  1
5    5  6

 


Accepted Solutions
Solution
4 weeks ago
SAS Super FREQ
Posts: 9,431

Re: why doesn't DATA output statement duplicate all observations?

[ Edited ]

Hi:
The issue you're going to run into is that the last OUTPUT statement names no data set, so it writes every observation to ONE and TWO as well as OTHER. You'll end up with MORE observations in ONE and TWO than you might intend. It depends on what your intention is. Consider the output and the debugging version of the program shown below. In the second output and in the code, the HOW_OUT variable shows you exactly HOW each observation was written to each output file.
Cynthia

 

When I run a version of your program (to eliminate the confusion of having data two and set two, I started with a data set called 'fakedata'), this is what I get:

[Image: use_IF_out.png]

 

This is how each obs got into the output files -- notice the new variable called "HOW_OUT" which shows exactly which statement wrote the obs to the file:

[Image: how_out.png]

 

Using this code:

data fakedata;
input x y;
datalines;
5 2
3 1
5 6
;
run;

data one two other;
  length x 8 y 8 how_out $14;
  set fakedata;
  if x = 5 then do; how_out='if x = 5'; output one; end;
  if y < 5 then do; how_out='if y < 5'; output two; end;
  how_out='final output';
  output;
run;

proc print data=fakedata noobs;
  title '0) starting with work.fakedata';
  run;
 
proc print data=one noobs;
  title '1) what is in work.one';
  run;
 
proc print data=two noobs;
  title '2) what is in work.two';
  run;

proc print data=other noobs;
  title '3) what is in work.other';
  run;
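
If the goal was for OTHER to receive every row without adding extra copies to ONE and TWO (an assumption about the intent, which the original question does not state), one possible variation is to name the target on the final OUTPUT statement. An OUTPUT statement with no data set name writes to every data set listed on the DATA statement, while OUTPUT OTHER writes only to OTHER. A minimal sketch with the same three rows:

```sas
/* Same three rows as the xy.txt example */
data fakedata;
  input x y;
  datalines;
5 2
3 1
5 6
;
run;

data one two other;
  set fakedata;
  if x = 5 then output one;   /* rows 1 and 3 */
  if y < 5 then output two;   /* rows 1 and 2 */
  output other;               /* every row, written to OTHER only */
run;
```

With these rows, ONE and TWO should each receive two observations and OTHER three, with no repeated rows in any of them.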



All Replies
PROC Star
Posts: 1,400

Re: why doesn't DATA output statement duplicate all observations?

Posted in reply to jimmychoi

Yes, the data set two should have duplicated observations, and it does.

 

I suspect you are mistaking the observation number for an actual variable. See the code below:

 

data two;
input x y;
datalines;
5 2
3 1
5 6
;

data one two other;
  set two;
  if x = 5 then output one;
  if y < 5 then output two;
  output;
run;

proc print data=two;
run;
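
One way to make the duplicates easier to see is to store the input row number as a real variable before writing. The sketch below is illustrative only: have and src_row are hypothetical names, and it relies on the automatic variable _N_, which counts DATA step iterations and therefore matches the input observation number here because SET reads one observation per iteration.

```sas
data have;            /* hypothetical name for the input data */
  input x y;
  datalines;
5 2
3 1
5 6
;
run;

data one two other;
  set have;
  src_row = _n_;      /* input observation number, kept as a real variable */
  if x = 5 then output one;
  if y < 5 then output two;
  output;             /* no data set named: writes to ONE, TWO and OTHER */
run;

proc print data=two noobs;
run;
```

In TWO, src_row values 1 and 2 should each appear twice and 3 once, which accounts for the 5 rows in the original output.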

Occasional Contributor
Posts: 17

Re: why doesn't DATA output statement duplicate all observations?

Posted in reply to Cynthia_sas
Your idea to rename data two to fakedata really helped me understand, thanks.
SAS Super FREQ
Posts: 9,431

Re: why doesn't DATA output statement duplicate all observations?

[ Edited ]
Posted in reply to jimmychoi

Hi,
That really was something I consider a best practice. In my world, it is not good to do this:
data mydata;
   set mydata;
... more code ...;
run;

Doing that makes it impossible to separate the INPUT data (on the SET statement) from the OUTPUT data (on the DATA statement), and it could result in the loss of the INPUT data if you have any fatal errors in your code.

I ALWAYS recommend to my students that they avoid the temptation to use the same name on both their DATA and SET statements.

Cynthia
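
As a rough illustration of that recommendation, the sketch below reads from one data set and writes to a differently named one, so the input survives untouched if the step goes wrong. The names mydata_flagged and big_x, and the sample rows, are hypothetical and only there to make the example runnable.

```sas
data mydata;                 /* hypothetical input data */
  input x y;
  datalines;
5 2
3 1
5 6
;
run;

data mydata_flagged;         /* hypothetical output name, different from the input */
  set mydata;
  big_x = (x >= 5);          /* example derived variable: 1 when x >= 5, else 0 */
run;
```

After this step, work.mydata still holds the original three rows, and the new variable exists only in work.mydata_flagged.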
