<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Question about data step using where= and drop in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Question-about-data-step-using-where-and-drop/m-p/66496#M19002</link>
    <description>Hi:&lt;BR /&gt;
  The DROP statement applies to the OUTPUT data sets. The error message says it all:&lt;BR /&gt;
ERROR: Variable Department is not on file WORK.SALES.&lt;BR /&gt;
  &lt;BR /&gt;
In the first program, the WHERE DATA set option is being applied to the OUTPUT dataset. The DROP statement impacts the OUTPUT dataset. Even though Department is in the INPUT dataset -- you ask for the WHERE to be applied to the OUTPUT dataset. And you gave explicit instructions (DROP) to keep Department out of the output dataset. Essentially, you ask for the WHERE condition to be applied using a variable that is just NOT in the OUTPUT dataset.&lt;BR /&gt;
&lt;BR /&gt;
In the second program, Department is DROPPED from the output dataset. But the IF statement is acting on Department in the Program Data Vector (PDV) &lt;B&gt;as it comes from the INPUT dataset&lt;/B&gt;. So the IF statement is working with the INPUT dataset.&lt;BR /&gt;
&lt;BR /&gt;
You can DROP Product from the OUTPUT dataset and still use it from the INPUT dataset with this code. The WHERE on the INPUT dataset limits the input rows to ONLY the BED or SOFA rows:&lt;BR /&gt;
[pre]&lt;BR /&gt;
data bed1 sofa1;&lt;BR /&gt;
set sashelp.prdsale(where=(Product in ('BED', 'SOFA')));&lt;BR /&gt;
if product = 'BED' then output bed1;&lt;BR /&gt;
else if product = 'SOFA' then output sofa1;&lt;BR /&gt;
drop Product;&lt;BR /&gt;
run;&lt;BR /&gt;
[/pre]&lt;BR /&gt;
                 &lt;BR /&gt;
However, you might want to benchmark the different techniques using your data. As you can see from my log (Windows), the second program with the IF statements took less CPU time than the first program (with the WHERE and the IF statements). (The SQL approach is shown in the SAS Log, too, for comparison purposes.)&lt;BR /&gt;
   &lt;BR /&gt;
cynthia&lt;BR /&gt;
      &lt;BR /&gt;
[pre]&lt;BR /&gt;
436  data bed1 sofa1;&lt;BR /&gt;
437  set sashelp.prdsale(where=(Product in ('BED', 'SOFA')));&lt;BR /&gt;
438  if product = 'BED' then output bed1;&lt;BR /&gt;
439  else if product = 'SOFA' then output sofa1;&lt;BR /&gt;
440  drop Product;&lt;BR /&gt;
441  run;&lt;BR /&gt;
                    &lt;BR /&gt;
NOTE: There were 576 observations read from the data set SASHELP.PRDSALE.&lt;BR /&gt;
      WHERE Product in ('BED', 'SOFA');&lt;BR /&gt;
NOTE: The data set WORK.BED1 has 288 observations and 9 variables.&lt;BR /&gt;
NOTE: The data set WORK.SOFA1 has 288 observations and 9 variables.&lt;BR /&gt;
NOTE: DATA statement used (Total process time):&lt;BR /&gt;
      real time           0.03 seconds&lt;BR /&gt;
      cpu time            0.03 seconds&lt;BR /&gt;
               &lt;BR /&gt;
                &lt;BR /&gt;
442&lt;BR /&gt;
443  data bed2 sofa2;&lt;BR /&gt;
444  set sashelp.prdsale;&lt;BR /&gt;
445  if Product='BED' then output bed2;&lt;BR /&gt;
446  else if Product='SOFA' then output sofa2;&lt;BR /&gt;
447  drop Product;&lt;BR /&gt;
448  run;&lt;BR /&gt;
                   &lt;BR /&gt;
NOTE: There were 1440 observations read from the data set SASHELP.PRDSALE.&lt;BR /&gt;
NOTE: The data set WORK.BED2 has 288 observations and 9 variables.&lt;BR /&gt;
NOTE: The data set WORK.SOFA2 has 288 observations and 9 variables.&lt;BR /&gt;
NOTE: DATA statement used (Total process time):&lt;BR /&gt;
      real time           0.01 seconds&lt;BR /&gt;
      cpu time            0.01 seconds&lt;BR /&gt;
                        &lt;BR /&gt;
449  proc sql;&lt;BR /&gt;
450    create table bed3 as&lt;BR /&gt;
451    select country, region, division, prodtype,&lt;BR /&gt;
452           quarter, year, month, actual, predict&lt;BR /&gt;
453    from sashelp.prdsale&lt;BR /&gt;
454    where Product = 'BED';&lt;BR /&gt;
NOTE: Table WORK.BED3 created, with 288 rows and 9 columns.&lt;BR /&gt;
                                           &lt;BR /&gt;
455&lt;BR /&gt;
456    create table sofa3 as&lt;BR /&gt;
457    select country, region, division, prodtype,&lt;BR /&gt;
458           quarter, year, month, actual, predict&lt;BR /&gt;
459    from sashelp.prdsale&lt;BR /&gt;
460    where Product = 'SOFA';&lt;BR /&gt;
NOTE: Table WORK.SOFA3 created, with 288 rows and 9 columns.&lt;BR /&gt;
                         &lt;BR /&gt;
461  quit;&lt;BR /&gt;
NOTE: PROCEDURE SQL used (Total process time):&lt;BR /&gt;
      real time           0.15 seconds&lt;BR /&gt;
      cpu time            0.03 seconds&lt;BR /&gt;
&lt;BR /&gt;
                 &lt;BR /&gt;
[/pre]</description>
    <pubDate>Thu, 18 Dec 2008 15:58:17 GMT</pubDate>
    <dc:creator>Cynthia_sas</dc:creator>
    <dc:date>2008-12-18T15:58:17Z</dc:date>
    <item>
      <title>Question about data step using where= and drop</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Question-about-data-step-using-where-and-drop/m-p/66494#M19000</link>
      <description>Why are the following two data steps not equivalent? The first gives an error about the Department variable not being present in sales.&lt;BR /&gt;
&lt;BR /&gt;
data sales (where=(Department='Sales')) exec (where=(Department='Executives'));&lt;BR /&gt;
	set orion.employee_organization;&lt;BR /&gt;
	drop Department;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
data sales exec;&lt;BR /&gt;
	set orion.employee_organization;&lt;BR /&gt;
	if Department='Sales' then output sales;&lt;BR /&gt;
	else if Department='Executives' then output exec;&lt;BR /&gt;
	drop Department;&lt;BR /&gt;
run;

Message was edited by: MichielBorkent</description>
      <pubDate>Thu, 18 Dec 2008 12:59:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Question-about-data-step-using-where-and-drop/m-p/66494#M19000</guid>
      <dc:creator>MichielBorkent</dc:creator>
      <dc:date>2008-12-18T12:59:31Z</dc:date>
    </item>
    <item>
      <title>Re: Question about data step using where= and drop</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Question-about-data-step-using-where-and-drop/m-p/66495#M19001</link>
      <description>Hi,&lt;BR /&gt;
&lt;BR /&gt;
In the first case the first thing SAS does after reading in data set orion.employee_organization is that he drops variable department. When you try to reference it for data sets sales and exec, this variable is no longer available. That is because WHERE option is executed after drop statement.&lt;BR /&gt;
&lt;BR /&gt;
In the second case, condition is set when SAS still knows what the department is. &lt;BR /&gt;
&lt;BR /&gt;
BR,&lt;BR /&gt;
I.</description>
      <pubDate>Thu, 18 Dec 2008 15:16:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Question-about-data-step-using-where-and-drop/m-p/66495#M19001</guid>
      <dc:creator>ieva</dc:creator>
      <dc:date>2008-12-18T15:16:47Z</dc:date>
    </item>
    <item>
      <title>Re: Question about data step using where= and drop</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Question-about-data-step-using-where-and-drop/m-p/66496#M19002</link>
      <description>Hi:&lt;BR /&gt;
  The DROP statement applies to the OUTPUT data sets. The error message says it all:&lt;BR /&gt;
ERROR: Variable Department is not on file WORK.SALES.&lt;BR /&gt;
  &lt;BR /&gt;
In the first program, the WHERE DATA set option is being applied to the OUTPUT dataset. The DROP statement impacts the OUTPUT dataset. Even though Department is in the INPUT dataset -- you ask for the WHERE to be applied to the OUTPUT dataset. And you gave explicit instructions (DROP) to keep Department out of the output dataset. Essentially, you ask for the WHERE condition to be applied using a variable that is just NOT in the OUTPUT dataset.&lt;BR /&gt;
&lt;BR /&gt;
In the second program, Department is DROPPED from the output dataset. But the IF statement is acting on Department in the Program Data Vector (PDV) &lt;B&gt;as it comes from the INPUT dataset&lt;/B&gt;. So the IF statement is working with the INPUT dataset.&lt;BR /&gt;
&lt;BR /&gt;
You can DROP Product from the OUTPUT dataset and still use it from the INPUT dataset with this code. The WHERE on the INPUT dataset limits the input rows to ONLY the BED or SOFA rows:&lt;BR /&gt;
[pre]&lt;BR /&gt;
data bed1 sofa1;&lt;BR /&gt;
set sashelp.prdsale(where=(Product in ('BED', 'SOFA')));&lt;BR /&gt;
if product = 'BED' then output bed1;&lt;BR /&gt;
else if product = 'SOFA' then output sofa1;&lt;BR /&gt;
drop Product;&lt;BR /&gt;
run;&lt;BR /&gt;
[/pre]&lt;BR /&gt;
                 &lt;BR /&gt;
However, you might want to benchmark the different techniques using your data. As you can see from my log (Windows), the second program with the IF statements took less CPU time than the first program (with the WHERE and the IF statements). (The SQL approach is shown in the SAS Log, too, for comparison purposes.)&lt;BR /&gt;
   &lt;BR /&gt;
cynthia&lt;BR /&gt;
      &lt;BR /&gt;
[pre]&lt;BR /&gt;
436  data bed1 sofa1;&lt;BR /&gt;
437  set sashelp.prdsale(where=(Product in ('BED', 'SOFA')));&lt;BR /&gt;
438  if product = 'BED' then output bed1;&lt;BR /&gt;
439  else if product = 'SOFA' then output sofa1;&lt;BR /&gt;
440  drop Product;&lt;BR /&gt;
441  run;&lt;BR /&gt;
                    &lt;BR /&gt;
NOTE: There were 576 observations read from the data set SASHELP.PRDSALE.&lt;BR /&gt;
      WHERE Product in ('BED', 'SOFA');&lt;BR /&gt;
NOTE: The data set WORK.BED1 has 288 observations and 9 variables.&lt;BR /&gt;
NOTE: The data set WORK.SOFA1 has 288 observations and 9 variables.&lt;BR /&gt;
NOTE: DATA statement used (Total process time):&lt;BR /&gt;
      real time           0.03 seconds&lt;BR /&gt;
      cpu time            0.03 seconds&lt;BR /&gt;
               &lt;BR /&gt;
                &lt;BR /&gt;
442&lt;BR /&gt;
443  data bed2 sofa2;&lt;BR /&gt;
444  set sashelp.prdsale;&lt;BR /&gt;
445  if Product='BED' then output bed2;&lt;BR /&gt;
446  else if Product='SOFA' then output sofa2;&lt;BR /&gt;
447  drop Product;&lt;BR /&gt;
448  run;&lt;BR /&gt;
                   &lt;BR /&gt;
NOTE: There were 1440 observations read from the data set SASHELP.PRDSALE.&lt;BR /&gt;
NOTE: The data set WORK.BED2 has 288 observations and 9 variables.&lt;BR /&gt;
NOTE: The data set WORK.SOFA2 has 288 observations and 9 variables.&lt;BR /&gt;
NOTE: DATA statement used (Total process time):&lt;BR /&gt;
      real time           0.01 seconds&lt;BR /&gt;
      cpu time            0.01 seconds&lt;BR /&gt;
                        &lt;BR /&gt;
449  proc sql;&lt;BR /&gt;
450    create table bed3 as&lt;BR /&gt;
451    select country, region, division, prodtype,&lt;BR /&gt;
452           quarter, year, month, actual, predict&lt;BR /&gt;
453    from sashelp.prdsale&lt;BR /&gt;
454    where Product = 'BED';&lt;BR /&gt;
NOTE: Table WORK.BED3 created, with 288 rows and 9 columns.&lt;BR /&gt;
                                           &lt;BR /&gt;
455&lt;BR /&gt;
456    create table sofa3 as&lt;BR /&gt;
457    select country, region, division, prodtype,&lt;BR /&gt;
458           quarter, year, month, actual, predict&lt;BR /&gt;
459    from sashelp.prdsale&lt;BR /&gt;
460    where Product = 'SOFA';&lt;BR /&gt;
NOTE: Table WORK.SOFA3 created, with 288 rows and 9 columns.&lt;BR /&gt;
                         &lt;BR /&gt;
461  quit;&lt;BR /&gt;
NOTE: PROCEDURE SQL used (Total process time):&lt;BR /&gt;
      real time           0.15 seconds&lt;BR /&gt;
      cpu time            0.03 seconds&lt;BR /&gt;
&lt;BR /&gt;
                 &lt;BR /&gt;
[/pre]</description>
      <pubDate>Thu, 18 Dec 2008 15:58:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Question-about-data-step-using-where-and-drop/m-p/66496#M19002</guid>
      <dc:creator>Cynthia_sas</dc:creator>
      <dc:date>2008-12-18T15:58:17Z</dc:date>
    </item>
    <item>
      <title>Re: Question about data step using where= and drop</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Question-about-data-step-using-where-and-drop/m-p/66497#M19003</link>
      <description>Hello ieva and Cynthia. Thanks for your answers. I was confused about when the where statement is about the input or the output. But hey, it's only my first week of SAS ;-). I also found this in the help which clears up things a lot. &lt;BR /&gt;
&lt;BR /&gt;
Example 1: Selecting Observations from an Input Data Set &lt;BR /&gt;
This example uses the WHERE= data set option to subset the SALES data set as it is read into another data set: &lt;BR /&gt;
&lt;BR /&gt;
data whizmo;&lt;BR /&gt;
   set sales(where=(product='whizmo'));&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
Example 2: Selecting Observations from an Output Data Set &lt;BR /&gt;
This example uses the WHERE= data set option to subset the SALES &lt;B&gt;output&lt;/B&gt; data set: &lt;BR /&gt;
&lt;BR /&gt;
data whizmo(where=(product='whizmo'));&lt;BR /&gt;
   set sales;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
I think it is safe to assume according to the help that the WHERE statement is always about the input set.&lt;BR /&gt;
&lt;BR /&gt;
Furthermore it is kind of tricky to know when drop= and where= are applied. From the help I cite:&lt;BR /&gt;
&lt;BR /&gt;
"Do not confuse the purpose of the WHERE= data set option. The DROP= and KEEP= data set options select variables for processing, while the WHERE= data set option selects observations."

Message was edited by: Michiel Borkent</description>
      <pubDate>Fri, 19 Dec 2008 09:18:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Question-about-data-step-using-where-and-drop/m-p/66497#M19003</guid>
      <dc:creator>MichielBorkent</dc:creator>
      <dc:date>2008-12-19T09:18:02Z</dc:date>
    </item>
  </channel>
</rss>

