BookmarkSubscribeRSS Feed
MichielBorkent
Calcite | Level 5
Why are the following two data steps not equivalent? The first gives an error about the Department variable not being present in sales.

data sales (where=(Department='Sales')) exec (where=(Department='Executives'));
set orion.employee_organization;
drop Department;
run;


data sales exec;
set orion.employee_organization;
if Department='Sales' then output sales;
else if Department='Executives' then output exec;
drop Department;
run; Message was edited by: MichielBorkent
3 REPLIES 3
ieva
Pyrite | Level 9
Hi,

In the first case the first thing SAS does after reading in data set orion.employee_organization is that he drops variable department. When you try to reference it for data sets sales and exec, this variable is no longer available. That is because WHERE option is executed after drop statement.

In the second case, condition is set when SAS still knows what the department is.

BR,
I.
Cynthia_sas
SAS Super FREQ
Hi:
The DROP statement applies to the OUTPUT data sets. The error message says it all:
ERROR: Variable Department is not on file WORK.SALES.

In the first program, the WHERE DATA set option is being applied to the OUTPUT dataset. The DROP statement impacts the OUTPUT dataset. Even though Department is in the INPUT dataset -- you ask for the WHERE to be applied to the OUTPUT dataset. And you gave explicit instructions (DROP) to keep Department out of the output dataset. Essentially, you ask for the WHERE condition to be applied using a variable that is just NOT in the OUTPUT dataset.

In the second program, Department is DROPPED from the output dataset. But the IF statement is acting on Department in the Program Data Vector (PDV) as it comes from the INPUT dataset. So the IF statement is working with the INPUT dataset.

You can DROP Product from the OUTPUT dataset and still use it from the INPUT dataset with this code. The WHERE on the INPUT dataset limits the input rows to ONLY the BED or SOFA rows:
[pre]
data bed1 sofa1;
set sashelp.prdsale(where=(Product in ('BED', 'SOFA')));
if product = 'BED' then output bed1;
else if product = 'SOFA' then output sofa1;
drop Product;
run;
[/pre]

However, you might want to benchmark the different techniques using your data. As you can see from my log (Windows), the second program with the IF statements took less CPU time than the first program (with the WHERE and the IF statements). (The SQL approach is shown in the SAS Log, too, for comparison purposes.)

cynthia

[pre]
436 data bed1 sofa1;
437 set sashelp.prdsale(where=(Product in ('BED', 'SOFA')));
438 if product = 'BED' then output bed1;
439 else if product = 'SOFA' then output sofa1;
440 drop Product;
441 run;

NOTE: There were 576 observations read from the data set SASHELP.PRDSALE.
WHERE Product in ('BED', 'SOFA');
NOTE: The data set WORK.BED1 has 288 observations and 9 variables.
NOTE: The data set WORK.SOFA1 has 288 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds


442
443 data bed2 sofa2;
444 set sashelp.prdsale;
445 if Product='BED' then output bed2;
446 else if Product='SOFA' then output sofa2;
447 drop Product;
448 run;

NOTE: There were 1440 observations read from the data set SASHELP.PRDSALE.
NOTE: The data set WORK.BED2 has 288 observations and 9 variables.
NOTE: The data set WORK.SOFA2 has 288 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

449 proc sql;
450 create table bed3 as
451 select country, region, division, prodtype,
452 quarter, year, month, actual, predict
453 from sashelp.prdsale
454 where Product = 'BED';
NOTE: Table WORK.BED3 created, with 288 rows and 9 columns.

455
456 create table sofa3 as
457 select country, region, division, prodtype,
458 quarter, year, month, actual, predict
459 from sashelp.prdsale
460 where Product = 'SOFA';
NOTE: Table WORK.SOFA3 created, with 288 rows and 9 columns.

461 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.15 seconds
cpu time 0.03 seconds


[/pre]
MichielBorkent
Calcite | Level 5
Hello ieva and Cynthia. Thanks for your answers. I was confused about when the where statement is about the input or the output. But hey, it's only my first week of SAS ;-). I also found this in the help which clears up things a lot.

Example 1: Selecting Observations from an Input Data Set
This example uses the WHERE= data set option to subset the SALES data set as it is read into another data set:

data whizmo;
set sales(where=(product='whizmo'));
run;

Example 2: Selecting Observations from an Output Data Set
This example uses the WHERE= data set option to subset the SALES output data set:

data whizmo(where=(product='whizmo'));
set sales;
run;

I think it is safe to assume according to the help that the WHERE statement is always about the input set.

Furthermore it is kind of tricky to know when drop= and where= are applied. From the help I cite:

"Do not confuse the purpose of the WHERE= data set option. The DROP= and KEEP= data set options select variables for processing, while the WHERE= data set option selects observations." Message was edited by: Michiel Borkent

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1014 views
  • 0 likes
  • 3 in conversation