<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Does WHERE Shuffle Observations? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610873#M177983</link>
    <description>&lt;P&gt;When reading from a normal SAS dataset the values are read in order. So the WHERE statement will NOT change the order. But your code will not store anything into the metadata field(s) used to indicate what if any variables were used to sort the data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Try it yourself.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Add a BY statement to your data step.&amp;nbsp; It will test whether the values making it past the WHERE condition present themselves to the data step in sorted order, and the step will fail if they are not ordered as indicated by the BY statement.&amp;nbsp; If that happens then the metadata that generated the report you posted the photograph of is wrong.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you just want to subset the data you could use PROC SORT and that would definitely set the order and also store what variables were used to order the data into the metadata.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=crsp.msf out=msf;
  where date&amp;lt;"1jan1930"d;
  by permno date;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 11 Dec 2019 00:05:56 GMT</pubDate>
    <dc:creator>Tom</dc:creator>
    <dc:date>2019-12-11T00:05:56Z</dc:date>
    <item>
      <title>Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610824#M177947</link>
      <description>&lt;P&gt;I thought that WHERE subsetting does not affect the order of observations. For example, the following &lt;EM&gt;my&lt;/EM&gt; consists of 500 &lt;EM&gt;i&lt;/EM&gt;s with 10 &lt;EM&gt;t&lt;/EM&gt;s each. The WHERE in the second DATA picks &lt;EM&gt;t&lt;/EM&gt;=4,5,6,7 without distorting the order.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data my;
do i=1 to 500;
do t=1 to 10;
output;
end;
end;
run;
data my;
set my;
where 3&amp;lt;t&amp;lt;8;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I realized that the following WRDS example behaves differently when WHERE subsetting. The original data &lt;EM&gt;crsp.msf&lt;/EM&gt; is sorted by &lt;EM&gt;PERMNO&lt;/EM&gt; and &lt;EM&gt;DATE&lt;/EM&gt;, but the resulting &lt;EM&gt;msf&lt;/EM&gt; does not preserve the original order.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let wrds=wrds.wharton.upenn.edu 4016;
signon wrds username=_prompt_;
rsubmit;
data msf;
set crsp.msf;
where date&amp;lt;"1jan1930"d;
run;
endrsubmit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;And I found that BY restores the original order—&lt;EM&gt;PERMNO&lt;/EM&gt; and &lt;EM&gt;DATE&lt;/EM&gt;. I wonder (1) why the resulting observations are shuffled here, and (2) whether BY after WHERE is necessary to make the observations sequential. Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2019 20:50:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610824#M177947</guid>
      <dc:creator>Junyong</dc:creator>
      <dc:date>2019-12-10T20:50:52Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610825#M177948</link>
      <description>You cannot assume order of a data set unless you've explicitly sorted it. BY checks that the data set is sorted as expected and will return an error if it's not.</description>
      <pubDate>Tue, 10 Dec 2019 20:56:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610825#M177948</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-12-10T20:56:02Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610830#M177953</link>
      <description>&lt;P&gt;If a dataset is stored in a SPDS library, the order can be affected by the way the data is stored in the "buckets" and by how the bucket metadata is used to optimize the WHERE.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2019 21:01:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610830#M177953</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-12-10T21:01:25Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610836#M177957</link>
      <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="1.png" style="width: 460px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/34604iCEEF218A5ADFAB05/image-size/large?v=v2&amp;amp;px=999" role="button" title="1.png" alt="1.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;It seems &lt;EM&gt;crsp.msf&lt;/EM&gt; is sorted by &lt;EM&gt;PERMNO&lt;/EM&gt; and &lt;EM&gt;DATE&lt;/EM&gt;. I wonder whether WHERE affects the sorting status.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2019 21:12:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610836#M177957</guid>
      <dc:creator>Junyong</dc:creator>
      <dc:date>2019-12-10T21:12:41Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610873#M177983</link>
      <description>&lt;P&gt;When reading from a normal SAS dataset the values are read in order. So the WHERE statement will NOT change the order. But your code will not store anything into the metadata field(s) used to indicate what if any variables were used to sort the data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Try it yourself.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Add a BY statement to your data step.&amp;nbsp; It will test whether the values making it past the WHERE condition present themselves to the data step in sorted order, and the step will fail if they are not ordered as indicated by the BY statement.&amp;nbsp; If that happens then the metadata that generated the report you posted the photograph of is wrong.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you just want to subset the data you could use PROC SORT and that would definitely set the order and also store what variables were used to order the data into the metadata.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=crsp.msf out=msf;
  where date&amp;lt;"1jan1930"d;
  by permno date;
run;&lt;/CODE&gt;&lt;/PRE&gt;
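&lt;P&gt;To see what ended up in that metadata, a minimal sketch (assuming the sorted copy is WORK.MSF as created above; the variable names dsid, sortedby and rc are only for illustration) reads the SORTEDBY attribute with the ATTRC function. An empty value would mean no sort variables are recorded.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  length sortedby $200;
  dsid = open('work.msf');                /* 0 means the data set could not be opened */
  if dsid then do;
    sortedby = attrc(dsid, 'SORTEDBY');   /* BY variables recorded in the metadata, if any */
    put 'Sorted by: ' sortedby;
    rc = close(dsid);
  end;
  else put 'Could not open WORK.MSF';
run;&lt;/CODE&gt;&lt;/PRE&gt;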
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Dec 2019 00:05:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610873#M177983</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2019-12-11T00:05:56Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610884#M177986</link>
      <description>&lt;P&gt;The sorting of the original data is correct by &lt;EM&gt;PERMNO&lt;/EM&gt; and &lt;EM&gt;DATE&lt;/EM&gt;. Here I attach the code and the screenshots.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;rsubmit;
data msf;
set crsp.msf;
by permno date;/*&amp;lt;-this works correctly*/
run;
data msf1;/*&amp;lt;-the resulting msf1 is not sorted*/
set crsp.msf;
where date&amp;lt;"1jan1930"d;
run;
data msf2;/*&amp;lt;-the resulting msf2 is sorted*/
set crsp.msf;
where date&amp;lt;"1jan1930"d;
by permno date;
run;
endrsubmit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The code above creates three data sets—&lt;EM&gt;MSF&lt;/EM&gt;, &lt;EM&gt;MSF1&lt;/EM&gt;, and &lt;EM&gt;MSF2&lt;/EM&gt;. The BY in the first DATA has no problem.&lt;/P&gt;&lt;P&gt;The only difference between &lt;EM&gt;MSF1&lt;/EM&gt; and &lt;EM&gt;MSF2&lt;/EM&gt; is the BY statement in the third DATA.&lt;/P&gt;&lt;PRE&gt;3    rsubmit;
NOTE: Remote submit to WRDS commencing.
1    data msf;
2    set crsp.msf;
3    by permno date;/*&amp;lt;-this works correctly*/
4    run;

NOTE: There were 4509846 observations read from the data set CRSP.MSF.
NOTE: The data set WORK.MSF has 4509846 observations and 21 variables.
NOTE: DATA statement used (Total process time):
      real time           1.61 seconds
      cpu time            1.52 seconds


5    data msf1;/*&amp;lt;-the resulting msf1 is not sorted*/
6    set crsp.msf;
7    where date&amp;lt;"1jan1930"d;
8    run;

NOTE: There were 29968 observations read from the data set CRSP.MSF.
      WHERE date&amp;lt;'01JAN1930'D;
NOTE: The data set WORK.MSF1 has 29968 observations and 21 variables.
NOTE: DATA statement used (Total process time):
      real time           0.68 seconds
      cpu time            0.43 seconds


9    data msf2;/*&amp;lt;-the resulting msf2 is sorted*/
10   set crsp.msf;
11   where date&amp;lt;"1jan1930"d;
12   by permno date;
13   run;

NOTE: There were 29968 observations read from the data set CRSP.MSF.
      WHERE date&amp;lt;'01JAN1930'D;
NOTE: The data set WORK.MSF2 has 29968 observations and 21 variables.
NOTE: DATA statement used (Total process time):
      real time           1.00 seconds
      cpu time            0.44 seconds


NOTE: Remote submit to WRDS complete.&lt;/PRE&gt;&lt;P&gt;The following is the resulting (not original) &lt;EM&gt;MSF&lt;/EM&gt;. The observations are sorted correctly by &lt;EM&gt;PERMNO&lt;/EM&gt; and &lt;EM&gt;DATE&lt;/EM&gt;.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="1.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/34606iD576787D23319A54/image-size/medium?v=v2&amp;amp;px=400" role="button" title="1.png" alt="1.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The following is &lt;EM&gt;MSF1&lt;/EM&gt; without BY. The &lt;EM&gt;PERMNO&lt;/EM&gt;-&lt;EM&gt;DATE&lt;/EM&gt; sorting disappears.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="2.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/34605i96E656F2C48C8999/image-size/medium?v=v2&amp;amp;px=400" role="button" title="2.png" alt="2.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The following is &lt;EM&gt;MSF2&lt;/EM&gt;&amp;nbsp;with BY. The &lt;EM&gt;PERMNO&lt;/EM&gt;-&lt;EM&gt;DATE&lt;/EM&gt; sorting is correct.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="3.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/34607i508A9E0D242BEABA/image-size/medium?v=v2&amp;amp;px=400" role="button" title="3.png" alt="3.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;It seems SAS does not access the original data sequentially unless BY is present.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Dec 2019 01:48:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610884#M177986</guid>
      <dc:creator>Junyong</dc:creator>
      <dc:date>2019-12-11T01:48:00Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610898#M177990</link>
      <description>&lt;P&gt;So the data is NOT sorted by those variables.&amp;nbsp; But perhaps you have an INDEX on those variables that will allow SAS to access the data in sorted order when you use the BY statement.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Dec 2019 04:29:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610898#M177990</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2019-12-11T04:29:02Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610923#M178004</link>
      <description>&lt;P&gt;See this short example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data class;
set sashelp.class;
run;

proc datasets nolist;
modify class;
index create sex;
quit;

data class1;
set class;
where age = 14;
by sex;
run;

data class2;
set class;
where age = 14;
run;&lt;/CODE&gt;&lt;/PRE&gt;
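&lt;P&gt;To check whether an index is involved, a sketch along these lines may help (assumptions: the CLASS copy and SEX index created above, and that CRSP.MSF is read inside your rsubmit block; CLASS3 and MSF_SEQ are illustrative names). MSGLEVEL=I asks SAS to write INFO lines about index usage to the log, and the IDXWHERE=NO data set option asks it to ignore indexes for the WHERE and scan the data sequentially.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options msglevel=i;   /* log INFO lines when an index is selected */

data class3;
  set class;
  where age = 14;
  by sex;             /* BY order comes from the SEX index, not the physical order */
run;

/* hypothetical test for the msf case: bypass indexes for the WHERE */
data msf_seq;
  set crsp.msf(idxwhere=no);
  where date&amp;lt;"1jan1930"d;
run;&lt;/CODE&gt;&lt;/PRE&gt;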
&lt;P&gt;Since there is an index file present for the msf dataset, you see the same kind of behaviour. The only thing that puzzles me is your output where it says sorted by PERMNO DATE. Since this is not the output of a proc contents, how did you get that?&lt;/P&gt;</description>
      <pubDate>Wed, 11 Dec 2019 08:53:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/610923#M178004</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-12-11T08:53:08Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611050#M178037</link>
      <description>&lt;P&gt;I got the information from Details in Properties in Explorer.&lt;/P&gt;&lt;PRE&gt;                                                         The SAS System

                                                     The CONTENTS Procedure

          Data Set Name        CRSP.MSF                                                 Observations          4509846
          Member Type          DATA                                                     Variables             21
          Engine               V9                                                       Indexes               5
          Created              01/31/2019 15:39:40                                      Observation Length    168
          Last Modified        01/31/2019 15:39:48                                      Deleted Observations  0
          Protection                                                                    Compressed            NO
          Data Set Type                                                                 Sorted                YES
          Label                Monthly Stock - Securities
          Data Representation  SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64
          Encoding             latin1  Western (ISO)


                                               Engine/Host Dependent Information

                              Data Set Page Size          65536
                              Number of Data Set Pages    11595
                              First Data Page             1
                              Max Obs per Page            389
                              Obs in First Data Page      361
                              Index File Page Size        8192
                              Number of Index File Pages  40018
                              Number of Data Set Repairs  0
                              Filename                    /wrds/crsp/sasdata/a_stock/msf.sas7bdat
                              Release Created             9.0401M5
                              Host Created                Linux
                              Inode Number                2465424092
                              Access Permission           rw-r-----
                              Owner Name                  wrdsadmn
                              File Size                   725MB
                              File Size (bytes)           759955456


                                          Alphabetic List of Variables and Attributes

                #    Variable    Type    Len    Format       Informat    Label

               18    ALTPRC      Num       8    12.5         12.5        Price Alternate
               20    ALTPRCDT    Num       8    YYMMDDN8.    YYMMDD6.    Alternate Price Date
               14    ASK         Num       8    11.5         11.5        Ask
                9    ASKHI       Num       8    12.5         12.5        Ask or High Price
               13    BID         Num       8    11.5         11.5        Bid
                8    BIDLO       Num       8    12.5         12.5        Bid or Low Price
               16    CFACPR      Num       8                             Cumulative Factor to Adjust Prices
               17    CFACSHR     Num       8                             Cumulative Factor to Adjust Shares/Vol
                1    CUSIP       Char      8    8.           8.          CUSIP Header
                7    DATE        Num       8    YYMMDDN8.                Date of Observation
                5    HEXCD       Num       8    2.           2.          Exchange Code Header
                6    HSICCD      Num       8    8.           8.          Standard Industrial Classification Code
                4    ISSUNO      Num       8    8.           8.          Nasdaq Issue Number
                3    PERMCO      Num       8    8.           8.          PERMCO
                2    PERMNO      Num       8    8.           8.          PERMNO
               10    PRC         Num       8    12.5         12.5        Price or Bid/Ask Average
               12    RET         Num       8    11.6         11.6        Returns
               21    RETX        Num       8    11.6         11.6        Returns without Dividends
               15    SHROUT      Num       8                             Shares Outstanding
               19    SPREAD      Num       8    11.5         11.5        Spread Between Bid and Ask
               11    VOL         Num       8    10.          10.         Volume


                                           Alphabetic List of Indexes and Attributes

                                                                      # of
                                                                    Unique
                                                     #    Index     Values

                                                     1    CUSIP      32985
                                                     2    DATE        1117
                                                     3    HSICCD      1445
                                                     4    PERMCO     29481
                                                     5    PERMNO     32985


                                                        Sort Information

                                                   Sortedby       PERMNO DATE
                                                   Validated      YES
                                                   Character Set  ASCII
                                                   Sort Option    NODUPKEY&lt;/PRE&gt;&lt;P&gt;Here I attach the results from PROC CONTENTS too.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Dec 2019 15:36:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611050#M178037</guid>
      <dc:creator>Junyong</dc:creator>
      <dc:date>2019-12-11T15:36:47Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611052#M178038</link>
      <description>&lt;P&gt;It is impossible for me to tell from your photographs of your browser window of the dataset whether the data was sorted or not.&lt;/P&gt;
&lt;P&gt;Please run the following test instead.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;* SUBSET the data ;
data msf1;
  set crsp.msf;
  where date&amp;lt;"1jan1930"d;
run;
* TEST if sorted ;
data _null_;
  set msf1 ;
  by permno date;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 11 Dec 2019 15:45:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611052#M178038</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2019-12-11T15:45:13Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611078#M178048</link>
      <description>The general rule of programming is if you didn't sort it you CANNOT assume it's going to be sorted. Especially if pulling from DB.</description>
      <pubDate>Wed, 11 Dec 2019 17:30:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611078#M178048</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-12-11T17:30:15Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611466#M178206</link>
      <description>&lt;P&gt;The PROC CONTENTS output right above says &lt;EM&gt;crsp.msf&lt;/EM&gt; is sorted (Sortedby PERMNO DATE). It seems (1) though &lt;EM&gt;crsp.msf&lt;/EM&gt;&amp;nbsp;per se is sorted, &lt;EM&gt;crsp.msf&lt;/EM&gt; with WHERE is not, and (2) though &lt;EM&gt;crsp.msf&lt;/EM&gt; with WHERE is not sorted, &lt;EM&gt;crsp.msf&lt;/EM&gt; with both WHERE and BY is sorted. Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2019 23:23:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611466#M178206</guid>
      <dc:creator>Junyong</dc:creator>
      <dc:date>2019-12-12T23:23:14Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611468#M178208</link>
      <description>&lt;P&gt;If the data is sorted when read in and you use a BY statement SAS will not generate an error.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If the data is sorted with a prior sort and sort entry in PROC CONTENTS then it will not generate an error.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, these are two different cases that may need to be considered separately.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2019 23:24:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611468#M178208</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-12-12T23:24:54Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611470#M178210</link>
      <description>&lt;P&gt;Please post the lines from the SAS log for running the two step test that I posted before.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2019 23:26:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611470#M178210</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2019-12-12T23:26:43Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611472#M178212</link>
      <description>&lt;P&gt;The PROC CONTENTS output above displays Sortedby PERMNO DATE. It seems WHERE must be accompanied by BY to preserve the sorting status. Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2019 23:30:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611472#M178212</guid>
      <dc:creator>Junyong</dc:creator>
      <dc:date>2019-12-12T23:30:14Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611474#M178214</link>
      <description>&lt;P&gt;WHERE is applied before BY, so that doesn't seem right either.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3 id="n0s8fofim62szgn1gmbeu3u12isj" class="xisDoc-title"&gt;WHERE and BY in a DATA Step&lt;/H3&gt;
&lt;P class="xisDoc-paragraph"&gt;If a DATA step contains both a WHERE statement and a BY statement,&lt;STRONG&gt; the WHERE statement executes&amp;nbsp;&lt;EM class="xisDoc-userSuppliedValue"&gt;before&lt;/EM&gt;&amp;nbsp;BY groups are created.&lt;/STRONG&gt; Therefore, BY groups reflect groups of observations in the subset of observations that are selected by the WHERE statement, not the actual BY groups of observations in the original input data set.&lt;/P&gt;
&lt;P class="xisDoc-paragraph"&gt;For a complete discussion of BY-group processing, see&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="ng-scope" tabindex="0" title="" href="https://documentation.sas.com/?docsetId=lrcon&amp;amp;docsetTarget=n138da4gme3zb7n1nifpfhqv7clq.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=en" data-docset-id="lrcon" data-docset-version="9.4" data-original-href="n138da4gme3zb7n1nifpfhqv7clq.htm"&gt;BY-Group Processing in the DATA Step&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="xisDoc-xrefBookTitle"&gt;SAS Language Reference: Concepts&lt;/SPAN&gt;.&lt;/P&gt;
&lt;P class="xisDoc-paragraph"&gt;&amp;nbsp;&lt;/P&gt;
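&lt;P&gt;A small sketch of what that means in practice, assuming a copy of SASHELP.CLASS sorted by SEX (CLASS_SORTED, SUBSET and the two flag variables are illustrative names): the FIRST. and LAST. flags below are computed over the rows that pass the WHERE, not over the full input.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

data subset;
  set class_sorted;
  where age = 14;
  by sex;
  /* flags describe BY groups within the WHERE-filtered rows only */
  first_in_group = first.sex;
  last_in_group  = last.sex;
run;&lt;/CODE&gt;&lt;/PRE&gt;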
&lt;P class="xisDoc-paragraph"&gt;&lt;STRIKE&gt;EDIT: I suspect the BY statement adds a sort flag to data set, but will check and confirm.&amp;nbsp;&lt;/STRIKE&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2019 23:56:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611474#M178214</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-12-12T23:56:06Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611477#M178215</link>
      <description>&lt;P&gt;So nope. I stand by my original answer, you cannot assume that a data set is sorted. The sort flag does not propagate to the new data set. There may be an implicit sort, but you're taking a risk there when SAS is not sorting the data. I will add this may behave differently if you're working on DBs because SQL by definition doesn't maintain row orders.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;BY does check for the implicit sort and won't error out if it's there, so that's likely what you're seeing, and if it's good enough for you on this system you're good. If you change things you may have to rethink this. I would probably add a check to ensure the data is sorted as needed, so that at least if it errors out it will explain why.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;*implicitly sorted data set;
data have;
input ID Age;
cards;
1 5
2 8 
3 13
4 14
5 3
6 34
7 6
8 3
9 3
10 2
;;;;

*check by/where without sort;
data want1 (label="Implicit Sort, BY and Where");
set have;
by id;
where id in (1, 2, 3);
run;

*explicitly sort data set now;
proc sort data=have out=have2 (label="Sorted input data");
by id;
run;

data want2 (label="explicit Sort, Only Where");
set have2;
where id in (1, 2, 3);
run;

data want3 (label="explicit sort, By and Where");
set have2;
by id;
where id in (1, 2,3 );
run;


data want4 (label="Explicit sort, copy only with BY");
set have2;
by id;
run;

data results;
set sashelp.vtable;
where libname='WORK' and (upcase(memname) like '%WANT%' or upcase(memname) like '%HAVE%');
keep libname memname memlabel sort:;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="delete_results.JPG" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/34739i58F5AA6C1F7C0442/image-size/large?v=v2&amp;amp;px=999" role="button" title="delete_results.JPG" alt="delete_results.JPG" /&gt;&lt;/span&gt;&lt;/P&gt;
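&lt;P&gt;One possible form of such a check, as a sketch rather than the only way to do it: the PRESORTED option of PROC SORT verifies the order first and only sorts when the input turns out not to be in BY order. WANT2_CHECKED is an illustrative name.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* verify the order of want2; sort only if the check fails */
proc sort data=want2 out=want2_checked presorted;
  by id;
run;&lt;/CODE&gt;&lt;/PRE&gt;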
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2019 23:57:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611477#M178215</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-12-12T23:57:23Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611494#M178218</link>
      <description>&lt;P&gt;The code generates an error.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;rsubmit;
* SUBSET the data ;
data msf1;
  set crsp.msf;
  where date&amp;lt;"1jan1930"d;
run;
* TEST if sorted ;
data _null_;
  set msf1 ;
  by permno date;
run;
endrsubmit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;And the output here.&lt;/P&gt;&lt;PRE&gt;1    rsubmit;
NOTE: Remote submit to WRDS commencing.
1    * SUBSET the data ;
2    data msf1;
3      set crsp.msf;
4      where date&amp;lt;"1jan1930"d;
5    run;

NOTE: There were 29968 observations read from the data set CRSP.MSF.
      WHERE date&amp;lt;'01JAN1930'D;
NOTE: The data set WORK.MSF1 has 29968 observations and 21 variables.
NOTE: DATA statement used (Total process time):
      real time           0.69 seconds
      cpu time            0.36 seconds


6    * TEST if sorted ;
7    data _null_;
8      set msf1 ;
9      by permno date;
10   run;

ERROR: BY variables are not properly sorted on data set WORK.MSF1.
CUSIP=69499890 PERMNO=75471 PERMCO=25928 ISSUNO=0 HEXCD=1 HSICCD=3490
DATE=19251231 BIDLO=. ASKHI=. PRC=-32.00000 VOL=. RET=C BID=31.00000
ASK=33.00000 SHROUT=70 CFACPR=4.810256 CFACSHR=4 ALTPRC=-32.00000
SPREAD=2.00000
ALTPRCDT=19251231 RETX=C FIRST.PERMNO=1 LAST.PERMNO=1 FIRST.DATE=1
LAST.DATE=1
_ERROR_=1 _N_=520
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 521 observations read from the data set WORK.MSF1.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


NOTE: Remote submit to WRDS complete.&lt;/PRE&gt;&lt;P&gt;Adding BY eliminates the error as follows.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;rsubmit;
* SUBSET the data ;
data msf1;
  set crsp.msf;
  where date&amp;lt;"1jan1930"d;
  by permno date;
run;
* TEST if sorted ;
data _null_;
  set msf1 ;
  by permno date;
run;
endrsubmit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;And the output here too.&lt;/P&gt;&lt;PRE&gt;2    rsubmit;
NOTE: Remote submit to WRDS commencing.
11   * SUBSET the data ;
12   data msf1;
13     set crsp.msf;
14     where date&amp;lt;"1jan1930"d;
15     by permno date;
16   run;

NOTE: There were 29968 observations read from the data set CRSP.MSF.
      WHERE date&amp;lt;'01JAN1930'D;
NOTE: The data set WORK.MSF1 has 29968 observations and 21 variables.
NOTE: DATA statement used (Total process time):
      real time           0.79 seconds
      cpu time            0.32 seconds


17   * TEST if sorted ;
18   data _null_;
19     set msf1 ;
20     by permno date;
21   run;

NOTE: There were 29968 observations read from the data set WORK.MSF1.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


NOTE: Remote submit to WRDS complete.&lt;/PRE&gt;&lt;P&gt;(1) &lt;EM&gt;crsp.msf&lt;/EM&gt; is sorted. (2) &lt;EM&gt;crsp.msf&lt;/EM&gt;+WHERE is not sorted. (3) &lt;EM&gt;crsp.msf&lt;/EM&gt;+WHERE+BY is sorted.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Dec 2019 00:48:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611494#M178218</guid>
      <dc:creator>Junyong</dc:creator>
      <dc:date>2019-12-13T00:48:14Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611495#M178219</link>
      <description>&lt;P&gt;So everything other than explicit sorting would be risky, then. For example, only &lt;EM&gt;have2&lt;/EM&gt; is safe. I use steps like &lt;EM&gt;want1&lt;/EM&gt; and &lt;EM&gt;want3&lt;/EM&gt;, which rely on the implicit sorting, but that seems to be risky. My example is closer to &lt;EM&gt;want2&lt;/EM&gt;: &lt;EM&gt;have2&lt;/EM&gt; is explicitly sorted, but &lt;EM&gt;want2&lt;/EM&gt; is not necessarily. Likewise, &lt;EM&gt;crsp.msf&lt;/EM&gt; is explicitly sorted, but the result is no longer guaranteed to be once it is read with WHERE.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Dec 2019 01:16:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611495#M178219</guid>
      <dc:creator>Junyong</dc:creator>
      <dc:date>2019-12-13T01:16:11Z</dc:date>
    </item>
    <item>
      <title>Re: Does WHERE Shuffle Observations?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611498#M178222</link>
      <description>&lt;P&gt;Do you know how that libname, CRSP,&amp;nbsp; is defined? Is it using the BASE (default) SAS engine? SAS/Share? Some other engine?&lt;/P&gt;
&lt;P&gt;Also what version of SAS is that remote session running?&lt;/P&gt;</description>
      <pubDate>Fri, 13 Dec 2019 02:05:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Does-WHERE-Shuffle-Observations/m-p/611498#M178222</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2019-12-13T02:05:55Z</dc:date>
    </item>
  </channel>
</rss>

