Hi, I'm running the program below with the following data.
Data for two (xy.txt):
5 2
3 1
5 6
The program:
data two;
infile '/folders/myfolders/sasuser.v94/xy.txt';
input x y;
run;
data one two other;
set two;
if x = 5 then output one;
if y < 5 then output two;
output;
run;
Since the OUTPUT statement is placed at the end of the program, without any condition (IF),
shouldn't the data set two have two identical observations?
But what I see in the SAS output window is:
- a table with 5 observations.
Please help.
Below is what I copied and pasted from the output window.
Work.two  Total rows: 5  Total columns: 2
1 | 5 | 2 |
2 | 5 | 2 |
3 | 3 | 1 |
4 | 3 | 1 |
5 | 5 | 6 |
Hi:
The issue you're going to run into is that the last OUTPUT statement will also write observations to one and two. So you'll end up with MORE observations in ONE and TWO than you might intend. It depends on what your intention is. Consider this output and debugging version of the program shown below. In the second output and the code, the HOW_OUT variable shows you exactly HOW each observation was written to each output file.
Cynthia
When I run a version of your program (to eliminate the confusion of having data two and set two, I started with a data set called 'fakedata'), this is what I get:
This is how each obs got into the output files -- notice the new variable called "HOW_OUT" which shows exactly which statement wrote the obs to the file:
Using this code:
data fakedata;
input x y;
datalines;
5 2
3 1
5 6
;
run;
data one two other;
length x 8 y 8 how_out $14;
set fakedata;
if x = 5 then do; how_out='if x = 5'; output one; end;
if y < 5 then do; how_out='if y < 5'; output two; end;
how_out='final output';
output;
run;
proc print data=fakedata noobs;
title '0) starting with work.fakedata';
run;
proc print data=one noobs;
title '1) what is in work.one';
run;
proc print data=two noobs;
title '2) what is in work.two';
run;
proc print data=other noobs;
title '3) what is in work.other';
run;
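If the intent is for ONE to hold only the x = 5 rows, TWO only the y < 5 rows, and OTHER every row, one possible fix is to name the target data set on the last OUTPUT statement. That intent is an assumption on my part, since the original post does not state it. A minimal sketch using the same fakedata values:

```sas
data fakedata;
    input x y;
    datalines;
5 2
3 1
5 6
;
run;

data one two other;
    set fakedata;
    if x = 5 then output one;   /* only rows with x = 5 are written to ONE */
    if y < 5 then output two;   /* only rows with y < 5 are written to TWO */
    output other;               /* every row is written to OTHER, and only to OTHER */
run;
```

An OUTPUT statement with no data set name writes to every data set listed on the DATA statement, which is where the extra rows come from. Naming OTHER explicitly keeps the final OUTPUT from also writing to ONE and TWO.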
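Yes, the data set two should have identical observations, and it does: every input row with y < 5 is written to two twice, once by the conditional OUTPUT and once by the final OUTPUT.
I suspect you are mistaking the observation number for an actual variable. See the code below: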
data two;
input x y;
datalines;
5 2
3 1
5 6
;
data one two other;
set two;
if x = 5 then output one;
if y < 5 then output two;
output;
run;
proc print data=two;
run;
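To make the duplicates easier to see, the input row number can be stored in a variable before anything is written. This is only a sketch: the names `src` and `src_row` are hypothetical and not part of the original program, and the input data set is renamed to `src` so that it is not overwritten by the output data set two.

```sas
data src;                       /* hypothetical name for the input data */
    input x y;
    datalines;
5 2
3 1
5 6
;
run;

data one two other;
    set src;
    src_row = _n_;              /* _N_ counts DATA step iterations, so here it is the input row number */
    if x = 5 then output one;
    if y < 5 then output two;   /* first copy of rows 1 and 2 */
    output;                     /* writes every row to ONE, TWO and OTHER */
run;

proc print data=two;
run;
```

With these three input rows, the listing of two should show src_row values 1, 1, 2, 2, 3: the first two input rows each appear twice and the third appears once, while the Obs column simply counts 1 through 5.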
Hi,
That really was something I consider a best practice. In my world, it is not good to do this:
data mydata;
set mydata;
... more code ...;
run;
Because that makes it impossible to separate the INPUT data (on the SET statement) from the OUTPUT data (on the DATA statement) and could result in the loss of the INPUT data if you have any fatal errors in your code.
I ALWAYS recommend to my students that they avoid the temptation to use the same name on both their DATA and SET statements.
Cynthia
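As an illustration of that recommendation, here is a minimal sketch that keeps the input and output names distinct. The data set names `class_raw` and `class_teens` are hypothetical. `sashelp.class` is a sample table shipped with SAS and is used only so the example is self-contained.

```sas
/* Make a working copy of the sample data to act as the INPUT data set */
data work.class_raw;
    set sashelp.class;
run;

/* Write the result under a NEW name, so class_raw stays untouched
   even if the logic in this step turns out to be wrong */
data work.class_teens;
    set work.class_raw;
    if age >= 13;               /* subsetting IF: keep only rows where age is 13 or more */
run;
```

If the subsetting condition is mistaken, work.class_raw is still available and the second step can simply be corrected and rerun.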