DATA Step, Macro, Functions and more

using two SET statements to combine 2 datasets

Reply
Occasional Contributor
Posts: 11

using two SET statements to combine 2 datasets

Hi everyone,

 

I got a question about using 2 SETs to combine 2 datasets. My question came from the Little SAS Book (5th ed) on page 12/27: Combining a Grand Total with the Original Data.

 

 

 

To make it easier to understand, I copy the whole programming:

 

A distributor of athletic shoes is considering doing a special promotion for the top-selling styles. The vice-president of marketing asks you to produce a report showing the percentage of total sales for each style. For each style of shoe the raw data file contains the style name, type of exercise, and sales for the last quarter:

 

Max Flight      running 1930
Zip Fit Leather walking 2250
Zoom Airborne   running 4150
Light Step      walking 1130
Max Step Woven  walking 2230
Zip Sneak       c-train 1190

 

Here is the program:

 

DATA shoes;
INFILE 'c:\MyRawData\Shoesales.dat';
INPUT Style $ 1-15 ExerciseType $ Sales;
RUN;

 

* Output grand total of sales to a data set and print;
PROC MEANS NOPRINT DATA = shoes;
VAR Sales;
OUTPUT OUT = summarydata SUM(Sales) = GrandTotal;
RUN;
PROC PRINT DATA = summarydata;
TITLE 'Summary Data Set';
RUN;

 

* Combine the grand total with the original data;
DATA shoesummary;
IF _N_ = 1 THEN SET summarydata;
SET shoes;
Percent = Sales / GrandTotal * 100;
RUN;
PROC PRINT DATA = shoesummary;
VAR Style ExerciseType Sales GrandTotal Percent;
TITLE 'Overall Sales Share';
RUN;

----------------------

My question is in the last part (Combine the grand total with the original data):

 

DATA shoesummary;

IF _N_ = 1 THEN SET summarydata;
SET shoes;
Percent = Sales / GrandTotal * 100;

 

I do'nt quite understand why "IF _N_ = 1 THEN SET summarydata" is used here (although it works beautifully), why don't just simply use my simple version like this:

 

DATA shoesummary;

SET summarydata;
SET shoes;
Percent = Sales / GrandTotal * 100;

 

Of course, my simple version does not work. 

 

The dataset summarydata has only 1 line of observation, "IF _N_ = 1 THEN SET summarydata" will copy the whole dataset, my simple version will also copy the whole dataset. They are supposed to be the same, in my opinion, but actually NOT. Then what's the difference between them? is there any special reason to use "IF _N_ = 1 THEN SET summarydata" ?

 

(if this question sounds to be too silly, please forgive me)

 

Thanks.


SAS question.jpg
Super User
Posts: 5,495

Re: using two SET statements to combine 2 datasets

mich_ard,

 

Consider a simple data step like this:

 

data new;

set shoes;

amount=100;

run;

 

The key questions related to your example are how many times does the SET statement execute? And how does this DATA step end?

 

The SET statement does not read all the data.  Rather, it reads a single observation.  For that single observation, the DATA step computes AMOUNT and outputs the results.  Then the SET statement executes again and gets the second observation.  For the second observation the DATA step computes AMOUNT and outputs the result.  The SET statement executes many times, each time reading in a single observation.

 

Eventually, the SET statement fails.  It looks for another observation, and determines that there are no more.  That's a normal ending to a DATA step to have the SET statement fail.

 

Now consider your program:

 

data shoesummary;

set summarydata;

set shoes;

percent = ...;

run;

 

As this DATA step begins, the first set statement reads in the one and only observation from SUMMARYDATA.  Then the second observation reads in the first observation from SHOES.  The DATA step computes PERCENT and outputs the results.  Then the DATA step continues.  The first SET statement searches for a second observation in SUMMARYDATA.  It fails to find another observation, at which point the DATA step is over.  A SET statement has failed.  It doesn't matter that SHOES has not yet been fully read in ... it only matters that a SET statement has failed.

 

Trusted Advisor
Posts: 1,553

Re: using two SET statements to combine 2 datasets

using IF _N_ = 1  wiil read grandtotal into memory once and proceed with next code lines;

 

using set twice will read the summary dataet again and

wiil get to end-of-file   and grandtotal will be set to missing

Respected Advisor
Posts: 3,156

Re: using two SET statements to combine 2 datasets

Your question involved several fundamentals of data step. For details, the best instruction I found is from Ian Whitlock.

 

In short, 'if _n_=1' demands to read only 1 obs from 'summarydata' without triggering off the 'end of the file' event, so data step can keep moving on reading obs from 'shoes' until it is done. You may have already tested, without 'if _n_=1', data step will stop after outputting 1 obs.  

   

Ask a Question
Discussion stats
  • 3 replies
  • 368 views
  • 0 likes
  • 4 in conversation