Data step iterations when multiple set statements are used.

Occasional Contributor
Posts: 17

Hi Experts,

 

I have two data sets, long and short.

  

data long;
        input x;
        datalines;
1
2
3
4
5
;
run;

 

data short;
        input x;
        datalines;
12
14
;
run;

 

When I concatenate the above two data sets:

 

data sample;
        set long;
        output;
        set short;
        output;
run;

 

The output generated is:

1
12
2
14
3

 

But if the DATA step iterates only twice, why is 3 written to the sample data set? Please explain.

 

Regards !!

Gurpreet

Grand Advisor
Posts: 10,210

Re: Data step iterations when multiple set statements are used.

Terminology: You are interleaving (take one record from one set, then another record from a different set, repeat).

Concatenating "adds" one set to the end of another, and the code would be:

 

data sample;
   set long
         short
  ;
run;

and would have all records from both sets.

 

You force an output after each read, so on the third iteration the record read from the first data set (3) is written before the second SET statement executes. The second data set has no more records to contribute, so the step stops at that point.
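
To watch this happen, here is a minimal sketch of the same step with PUT statements added. It assumes the long and short data sets from the question; the message text is only illustrative, and _n_ is the automatic iteration counter.

```sas
/* Trace each iteration of the two-SET step in the log. */
data sample;
   put 'Iteration ' _n_ ': about to read LONG';
   set long;      /* on iteration 3 this reads x=3 successfully */
   output;        /* so 3 is written to SAMPLE */
   put 'Iteration ' _n_ ': about to read SHORT';
   set short;     /* on iteration 3 there is no third record, so the step stops here */
   output;        /* not reached on iteration 3 */
run;
```

The log should show both messages for iterations 1 through 3, and nothing after the third "about to read SHORT" message, because the step ends when that SET statement tries to read past the end of short.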

 

For additional entertainment value examine:

data sample;
   merge long short;
run;

data sample2;
   merge short long;
run;

Respected Advisor
Posts: 4,966

Re: Data step iterations when multiple set statements are used.

The number of iterations of the DATA step depends indirectly on the number of observations in your data set.  The actual process is that the DATA step automatically continues until a SET statement fails because it tries to read past the end of the data.  You can get a small indication of this by adding PUT statements to a simple DATA step:

 

data test;
put 'Before:  ' _all_;
set short;
put 'After:  ' _all_;
run;

 

It pays to set up your own tests, and keep playing with this until you get the idea.  Here is another test you can run:

 

data test;
do _k_=1 to 3;
   put 'Before:  ' _all_;
   set long;
   put 'After:  ' _all_;
end;
run;

 

You could also compare these two:

 

data test;
set long; output;
set short; output;
run;

 

data test;
set short; output;
set long; output;
run;
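
If the goal is to end up with only the two complete pairs, one possible variation is to drop the explicit OUTPUT statements and let the step write once, at the end of each iteration. This is a sketch, not necessarily what the original poster needs; the data set name paired and the variable names x_long and x_short are hypothetical, introduced so that the two values of x do not overwrite each other.

```sas
/* One observation per iteration, written only if both reads succeed. */
data paired;
   set long(rename=(x=x_long));    /* hypothetical name for the value from LONG */
   set short(rename=(x=x_short));  /* hypothetical name for the value from SHORT */
   /* implicit OUTPUT happens here, at the bottom of the step */
run;
```

On the third iteration the read from short fails before the bottom of the step is reached, so nothing is written for x_long=3 and paired contains two observations.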

 

Good luck.
