turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Multiple set statements

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-03-2015 06:40 AM

I ran the code below in EG 7.1 to understand how multiple set statements works. But I could not understand why data step (whilst creating data set c) read only 2 observations from dataset a although it has 3 observations. See the log which is highlighted in red color.

Also can I assume that we will get the output in final datastep which holds records with least number of observations in set statements? e.g. A table has 1 obs, B table has 2 obs, C table has 3 obs then D table (data d;set a;

set b;set c;run should produce 1 obs which is similar to A table. (?)

data a;

b="a";output;

b="b";output;

b="f";output;

run;

data b;

b="c";output;

b="d";output;

run;

data d;

b="e";output;

run;

data c;

set b;

set a;

set d;

run;

Log:

23 data a;

24 b="a";output;

25 b="b";output;

26 b="f";output;

27 run;

NOTE: Compression was disabled for data set WORK.A because compression overhead would increase the size of the data set.

NOTE: The data set WORK.A has 3 observations and 1 variables.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

28

29

30 data b;

31 b="c";output;

32 b="d";output;

33 run;

NOTE: Compression was disabled for data set WORK.B because compression overhead would increase the size of the data set.

NOTE: The data set WORK.B has 2 observations and 1 variables.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

34

35 data d;

36 b="e";output;

37

38 run;

NOTE: Compression was disabled for data set WORK.D because compression overhead would increase the size of the data set.

NOTE: The data set WORK.D has 1 observations and 1 variables.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

39

40 data c;

41 set b;

42 set a;

43 set d;

44 run;

NOTE: Compression was disabled for data set WORK.C because compression overhead would increase the size of the data set.

NOTE: There were 2 observations read from the data set WORK.B.

NOTE: There were 2 observations read from the data set WORK.A.

NOTE: There were 1 observations read from the data set WORK.D.

NOTE: The data set WORK.C has 1 observations and 1 variables.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-03-2015 06:55 AM

Perhaps you should first consider why WORK.C has only 1 observation.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-03-2015 07:07 AM

That's also one off my question.

*Also can I assume that we will get the output in final datastep which holds records with least number of observations in set statements? e.g. A table has 1 obs, B table has 2 obs, C table has 3 obs then D table (data d;set a;*

*set b;set c;run should produce 1 obs which is similar to A table. (?)*

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-03-2015 07:46 AM

Test. Consider you first example with 1 obs output. Which data set did the value of B come from.

Your example is somewhat confounded by the fact that the variable(s) are the same in all. The values are being overwritten before anything is output. If you could run the data step debugger you would have good way to study exactly what is happening but you must have DMS SAS no EG/Batch.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-03-2015 08:01 AM

Dataset c returns the value for b as e. It is coming from the dataset d.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-03-2015 08:05 AM

Yes. At the first data step iteration:

1. SAS reads table b, b="c" - but no output to table c

2. SAS reads table a, b="a" - but no output to table c

1. SAS reads table b, b="e" - now there is an output to table c

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-03-2015 08:25 AM

Yes from data D. Explain why that is.

To better understand add a variable to each data set with a unique name.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-03-2015 07:51 AM

A data step exits **immediately** if it executes a set statement, but there are no more observations in the data set.

In the second data step iteration (_N_=2) SAS reads table b, than table a, and then tries to read table d, which has no more rows. -> exit (stop; ) , no implicit output.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-03-2015 08:02 AM

By the way, this is the usual way, how a data step terminates. Run this code:

data class2;

putlog 'BEFORE ' _all_; /*_N_=20 is printed here*/

set sashelp.class; /*exit here, when _N_=20 - no more obs*/

putlog 'AFTER' _all_;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-03-2015 08:06 AM

I'm sorry, still I couldn't understand.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-03-2015 08:03 AM

Are you saying that output will be from the first set statement?

May I request you to explain it with small example.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-03-2015 08:09 AM

I augmented your code, so you can see in the log, what is happening.

I also put an **output** statement at the end of the data step (this changes nothing in this case, just instead of implicit output, we use explicit output)

data c;

putlog '1:' _all_;

set b;

putlog '2:' _all_;

set a;

putlog '3:' _all_;

set d;

putlog '4:' _all_;

putlog 'OUTPUT will be here';

**output**;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-03-2015 08:16 AM

Thanks.

So the table which holds minimum number of observations from the list of set statements will be returned as output.

*e.g. A table has 1 obs, B table has 2 obs, C table has 3 obs then D table (data d;set a;**set b;set c;run should produce 1 obs which is similar to A table. (?)*

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-03-2015 08:26 AM

Babloo wrote:

Thanks.

So the table which holds minimum number of observations from the list of set statements will be returned as output.

NO, that is not the correct way to understand what is happening.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-03-2015 08:46 AM

No. The input table with the least number of observations determines after how many iterations the data step stops (and how many observations are written).

The output is determined by the last statement that filled the output data vector before the implicit write at the end of the data step iteration is reached.

So, in your first example, your variable b is ALWAYS filled with values from dataset d (because set d; is the last statement that alters the value of b prior to the write). Your output table will NEVER have values of tables a or b.

In your second example (*e.g. A table has 1 obs, B table has 2 obs, C table has 3 obs then D table (data d;set a;**set b;set c;run should produce 1 obs which is similar to A table. (?))*, the values always come from table c, but table a determines the number of iterations (1).

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

Maxims of Maximally Efficient SAS Programmers