Solved: set a dataset many times within a data step

ShufeGuoding · Posted 10-06-2018 10:35 AM

What's the difference between the results of the two DATA steps below?

/* Program 1: a single SET statement inside a DO loop */
data s;
  do n=1 to 2;
    set sashelp.class;
  end;
run;

/* Program 2: two SET statements reading the same data set */
data s;
  set sashelp.class;
  set sashelp.class;
  output;
run;

mkeintz · Posted 10-06-2018 10:51 AM

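This is because the SAS compiler sets up one data stream for each SET statement. In the first program there is one SET statement, and therefore one stream. It executes twice per iteration of the DATA step, and only the second observation read in each iteration reaches the implicit OUTPUT, giving you observation numbers 2, 4, 6, 8, 10, 12, 14, 16, and 18 (9 obs). On the tenth iteration the first read gets observation 19, the second read hits the end of the data set, and the step stops before anything is output.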
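One way to watch the single stream at work is the CUROBS= option of the SET statement, which stores the number of the observation that statement just read. A minimal sketch (the data set ONE_STREAM and the variables FIRST_READ and SECOND_READ are hypothetical names chosen for illustration):

```sas
/* One SET statement = one stream, executed twice per DATA step iteration */
data one_stream;
  do n = 1 to 2;
    /* CUROBS= holds the observation number this SET statement just read */
    set sashelp.class curobs=obsnum;
    if n = 1 then first_read = obsnum;   /* 1, 3, 5, ... */
    else second_read = obsnum;           /* 2, 4, 6, ... */
  end;
  keep name first_read second_read;
run;
```

Each row should pair an odd observation number with the following even one, and NAME comes from the even one because that is the last value read before the implicit OUTPUT.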

In the second program there are two streams, each executed once per iteration of the DATA step. In your example both streams come from the same data source (and therefore have common variables), so the values from the second stream overwrite those obtained from the first.
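Because both streams here read identical data, the overwrite is invisible. A hypothetical variant that makes it visible starts the second stream one observation later with the FIRSTOBS= data set option (TWO_STREAMS is an illustrative name, and the offset is an assumption added only for the demonstration):

```sas
/* Two SET statements = two independent streams with their own pointers */
data two_streams;
  set sashelp.class(keep=name age);          /* stream 1: obs 1, 2, 3, ... */
  set sashelp.class(keep=name firstobs=2);   /* stream 2: obs 2, 3, 4, ... */
  /* NAME now comes from stream 2 (it overwrote stream 1's value),  */
  /* while AGE, read only by stream 1, still belongs to the earlier row */
run;
```

The first row should combine Alfred's AGE with Alice's NAME, and the step should stop after 18 rows, when stream 2 runs out of observations.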

BTW, while two SET statements create two data streams, two INPUT statements read from the same raw data stream.
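A small sketch of that contrast using in-stream data (the data set RAW_DEMO and its two-lines-per-record layout are made up for illustration): each INPUT statement without a trailing @ releases the current line, so the second INPUT continues with the next line of the same raw stream instead of starting over.

```sas
/* Two INPUT statements share ONE raw data stream */
data raw_demo;
  input name $;   /* reads line 1 (then line 3, ...) */
  input age;      /* continues with line 2 (then line 4, ...) */
  datalines;
Alfred
14
Alice
13
;
run;
```

RAW_DEMO should end up with two observations, (Alfred, 14) and (Alice, 13), whereas two SET statements on one data set would each start again at the first observation.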

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

VDD · Posted 10-06-2018 10:40 AM

The difference is as follows:

data s;
do n=1 to 2;
set sashelp.class;
end;
run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.S has 9 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.06 seconds
cpu time 0.00 seconds

data s;
set sashelp.class;
set sashelp.class;
output;
run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.S has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

The second DATA step may give you what you want, but you did not describe what you want.

Why are you reading the same table twice in the same DATA step?

ShufeGuoding · Posted 10-06-2018 10:43 AM

Why?

VDD · Posted 10-06-2018 10:47 AM

The reason is that the code is incorrect.

Do this if you want to duplicate the table rows:

/* Concatenate the table with itself: all 19 rows, then the same 19 rows again */
data s;
  set sashelp.class sashelp.class;
  output;
run;

data s;
set sashelp.class sashelp.class;
output;
run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.S has 38 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.06 seconds
cpu time 0.01 seconds
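Concatenation puts the second copy of the table after the first. If the goal were instead to have each row appear twice in a row, a sketch using two explicit OUTPUT statements (DUP_ADJACENT is a hypothetical name) would be:

```sas
/* Write every observation of SASHELP.CLASS twice, back to back */
data dup_adjacent;
  set sashelp.class;
  output;   /* first copy of the current row */
  output;   /* second, identical copy */
run;
```

Like the concatenation, this should give 38 observations, but ordered Alfred, Alfred, Alice, Alice, and so on.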

Try using /DEBUG and you will see that in the first DATA step the output is only done after the DO loop exits, so you are getting records 2, 4, 6, and so on.
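To confirm that the implicit OUTPUT at the bottom of the step is what skips the odd rows, one can move an explicit OUTPUT inside the loop; any explicit OUTPUT in a step also switches off the implicit one. A sketch (S_ALL is an illustrative name):

```sas
data s_all;
  do n = 1 to 2;
    set sashelp.class;
    output;   /* write each observation as soon as it is read */
  end;
  /* No implicit OUTPUT here: an explicit OUTPUT anywhere in the step disables it */
run;
```

This should write all 19 observations. On the tenth iteration the loop reads observation 19 and outputs it, and the step then stops when the SET statement tries to read past the end of the data set.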

In your second DATA step you have two SET statements. The code above, in the form set your_table your_table;, works for your example. The question is: why do you want two rows with the same information in the same table?

ShufeGuoding · Posted 10-06-2018 08:55 PM

My intention is to understand how the DATA step reads the observations in the input data set. I just wonder why different observations are read sequentially in the DO loop within one DATA step iteration in the first program, but not in the second. Thanks a lot!

VDD · Posted 10-06-2018 11:25 AM

The /DEBUG option can help when you want to know why.

Use the debugger to step through the records and see what is happening.

data s / debug;
  do n=1 to 2;
    set sashelp.class;
  end;
run;

data s / debug;
  set sashelp.class;
  set sashelp.class;
  output;
run;
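If an interactive debugging session is not convenient, PUTLOG statements write similar information to the log non-interactively. A sketch of both programs with trace lines added (TRACE1 and TRACE2 are hypothetical data set names):

```sas
/* Program 1: one stream read twice per iteration */
data trace1;
  do n = 1 to 2;
    set sashelp.class;
    putlog 'Program 1: ' _n_= n= name=;
  end;
run;

/* Program 2: two streams, each read once per iteration */
data trace2;
  set sashelp.class;
  putlog 'Program 2, stream 1: ' _n_= name=;
  set sashelp.class;
  putlog 'Program 2, stream 2: ' _n_= name=;
run;
```

In the Program 1 trace, NAME should change on every line (two names per value of _N_), while in the Program 2 trace both streams should report the same NAME for each _N_.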

FreelanceReinh · Posted 10-06-2018 05:21 PM

@ShufeGuoding: If you're not familiar with the data step debugger (and its somewhat cryptic commands), here are brief instructions: https://communities.sas.com/t5/SAS-Programming/use-of-index/m-p/264460#M51865
