victorleehc0
Calcite | Level 5

data c1;
  set sashelp.class;
  set sashelp.class;
run;

 

data c2;
  do i=1 to 2;
    set sashelp.class;
  end;
run;

 

Data set c1 is simply a copy of sashelp.class, while data set c2 contains only the even-numbered observations of sashelp.class. Could anyone please explain the mechanism of the SET statement and why the two programs behave differently?

 

Thank you so much!!

5 REPLIES
Tom
Super User

If your data step does not include an OUTPUT statement there is an implied OUTPUT added at the end of the step.

 

So your first one reads the same observation twice on each iteration (once per SET statement) and then writes the second copy out.

 

Your second one reads two observations before writing anything. So the values read from the first observation are replaced by the values read from the second before anything is written out.
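
One way to watch this happen is to write each read to the log. This is only a minimal sketch: the data set name c2_trace and the PUT statement are additions for illustration and are not part of the original programs.

```sas
/* Trace every read: _N_ is the data step iteration, i is the loop pass. */
data c2_trace;
  do i = 1 to 2;
    set sashelp.class;
    put _n_= i= name=;   /* one log line per observation read */
  end;
  /* implied OUTPUT here: only the values from the i=2 read are written */
run;
```

The log should show two reads for each value of _N_, while the output data set gets only one observation per iteration.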

PaigeMiller
Diamond | Level 26

A data step is an implied loop: it executes once for each record in the data set named in the SET statement. So it's not clear why you want a DO loop here. Could you please explain?

--
Paige Miller
Ksharp
Super User

For each SET statement, SAS assigns a pointer to the data set named in that SET statement.

Therefore, the first program has two pointers and the second program has one.

And as Tom said, there is an implied OUTPUT statement at the bottom of the data step, which I show (commented out) in the code.

So in the first program both pointers move one record per data step iteration, and you get exactly the records of the original data set 'HAVE'.

The pointer in the second program moves TWO records per data step iteration, so you get half of the records of data set 'HAVE'.

 

 

data have;
  do obs=1 to 10;
    output;
  end;
run;

data c1;
  set have;
  set have;
  /*output;*/
run;

proc print; run;

data c2;
  do i=1 to 2;
    set have;
  end;
  /*output;*/
run;

proc print; run;


 

 

mkeintz
PROC Star

In your first example (two SET statements), you have specified two (parallel) input data streams.  The second SET merely re-reads data already read by the first SET.  But there is no explicit output statement.  Output will therefore occur only when the RUN statement is encountered (i.e. after the re-read only), so you will just output the second copy of each input.  You will have simply duplicated the input data set.

 

In the second example, you have only one SET statement, and therefore only one input stream.  In the absence of a specific output statement, the first observation is replaced by the second before exiting the loop.  Again the output occurs only when the RUN is encountered.  So you will output only the 2nd, 4th, etc. observations, i.e. only the even-numbered observations.  

 

Bottom line: a SET statement does not imply an output.  There must either be an explicit output statement whenever you intend it to occur, or (only in the absence of an explicit output), you will have the implicit output when RUN is encountered.
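
To illustrate that bottom line, here is a hedged variation on the second program with an explicit OUTPUT inside the loop. The data set name c2_all is an assumption for illustration only. Because an explicit OUTPUT is present, no implicit output happens at the end of the step, so each observation is written as soon as it is read.

```sas
/* Explicit OUTPUT inside the loop: every observation read is written. */
data c2_all;
  do i = 1 to 2;
    set sashelp.class;
    output;              /* writes the observation just read */
  end;
  /* no implied OUTPUT here, because an explicit OUTPUT exists in the step */
run;
```

With this change the result should contain every observation of sashelp.class, with i alternating between 1 and 2.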

 

Do you want the original data output in pairs?  One way is to use an explicit output as in:

data pair;
  set sashelp.class;
  do copy=1 to 2;
   output;
  end;
run;

Do you want one complete copy of the original dataset followed by a second copy?  Give two arguments to a single SET statement (not two SET statements), as in:

data repeat;
  set sashelp.class (in=in1) sashelp.class (in=in2);
  copy = in1 + 2*in2;
run;
ballardw
Super User

Maybe this will help a little bit with what happens in the C1 case. This creates two small data sets and reads each one with its own SET statement. In this case you see both sets of values from the two data sets. Your C1, reading the exact same data set with both SET statements, reads the values into the exact same variables, so it appears as though not much is going on.

data work.one;
   input x $;
datalines;
a
b
c
d
;

data work.two;
  input y $;
datalines;
z
y
x
w
;

data work.ex1;
set work.one;
set work.two;
run;

 

Perhaps more interesting is what happens if One and Two have different numbers of observations...
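
A minimal sketch of that case, assuming a hypothetical data set work.three with only two observations (it is not part of the original example). A data step stops as soon as any SET statement tries to read past the end of its data set, so the shorter input decides how many observations are written.

```sas
/* Hypothetical shorter data set: two observations only. */
data work.three;
  input y $;
datalines;
z
y
;

/* work.one has four observations, work.three has two. */
data work.ex2;
  set work.one;     /* third iteration: reads x='c' */
  set work.three;   /* third iteration: no more data, so the step ends here */
run;
```

Here work.ex2 should end up with two observations, because the step ends on the third iteration before the implied OUTPUT is reached. The same rule explains why c2 built from a data set with an odd number of observations never outputs the last one.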
