<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Proc sort or Proc sql in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13834#M1710</link>
    <description>Hi Tommy&lt;BR /&gt;
&lt;BR /&gt;
Following Scott's suggestion to reduce the variables before sorting an approach like below could work.&lt;BR /&gt;
The assumption is that after reducing the variables the sort can be done fully in memory - you will have to give it a try.&lt;BR /&gt;
&lt;BR /&gt;
data massive(drop=i);&lt;BR /&gt;
  array ManyVars {10} (10*1);&lt;BR /&gt;
  do i=1 to 1000;&lt;BR /&gt;
    sale=ceil(ranuni(1)*700);&lt;BR /&gt;
    output;&lt;BR /&gt;
  end;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
data V_PrepareSort /view=V_PrepareSort;&lt;BR /&gt;
  set massive(keep=sale);&lt;BR /&gt;
  obsID=_n_;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
proc sql;&lt;BR /&gt;
  create view V_SortedMassiveSale as&lt;BR /&gt;
    select sale, obsID&lt;BR /&gt;
    from V_PrepareSort&lt;BR /&gt;
    order by sale&lt;BR /&gt;
  ;&lt;BR /&gt;
quit;&lt;BR /&gt;
&lt;BR /&gt;
options sortsize=max;&lt;BR /&gt;
data OneObs(compress=yes) MultiObs(compress=yes);&lt;BR /&gt;
  set V_SortedMassiveSale;&lt;BR /&gt;
  by sale;&lt;BR /&gt;
  &lt;BR /&gt;
  set massive point=obsID;&lt;BR /&gt;
&lt;BR /&gt;
  if first.sale and last.sale then output OneObs;&lt;BR /&gt;
  else output MultiObs;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
HTH&lt;BR /&gt;
Patrick</description>
    <pubDate>Fri, 08 Oct 2010 10:19:56 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2010-10-08T10:19:56Z</dc:date>
    <item>
      <title>Proc sort or Proc sql</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13829#M1705</link>
      <description>Hi&lt;BR /&gt;
&lt;BR /&gt;
I have a situation where  I have to sort a 25 gb dataset and do the below .&lt;BR /&gt;
&lt;BR /&gt;
proc sort data=sale;by sale;run;&lt;BR /&gt;
&lt;BR /&gt;
data temp1 temp2;&lt;BR /&gt;
 set sale;&lt;BR /&gt;
 by sale;&lt;BR /&gt;
 if first.sale=last.sale then output temp2; else output temp1;&lt;BR /&gt;
 run;&lt;BR /&gt;
&lt;BR /&gt;
Since this is taking almost 2hrs could somebody suggest an alternative using proc sql if you think that might take even less time.</description>
      <pubDate>Fri, 08 Oct 2010 05:32:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13829#M1705</guid>
      <dc:creator>tommy81</dc:creator>
      <dc:date>2010-10-08T05:32:28Z</dc:date>
    </item>
    <item>
      <title>Re: Proc sort or Proc sql</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13830#M1706</link>
      <description>As there are operating systems that allow for quite some SORT tweaking please post what operating system you are using and also what version of SAS you are using. And also how many CPUs you have available and whether option THREADS is active or not.&lt;BR /&gt;
&lt;BR /&gt;
Please post the result of&lt;BR /&gt;
[pre]&lt;BR /&gt;
%put OS &amp;amp;sysscp &amp;amp;sysscpl ;&lt;BR /&gt;
%put SAS &amp;amp;sysver ;&lt;BR /&gt;
%put CPUs &amp;amp;sysncpu ;&lt;BR /&gt;
%put THREADS: &amp;amp;sysfunc(getoption(threads)) ;&lt;BR /&gt;
proc options group=sort; run;&lt;BR /&gt;
[/pre]</description>
      <pubDate>Fri, 08 Oct 2010 06:42:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13830#M1706</guid>
      <dc:creator>Robert_Bardos</dc:creator>
      <dc:date>2010-10-08T06:42:55Z</dc:date>
    </item>
    <item>
      <title>Re: Proc sort or Proc sql</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13831#M1707</link>
      <description>15         %put OS &amp;amp;sysscp &amp;amp;sysscpl&lt;BR /&gt;
16         ;%put SAS &amp;amp;sysver ;&lt;BR /&gt;
OS SUN 64 SunOS&lt;BR /&gt;
SAS 9.2&lt;BR /&gt;
17         %put CPUs &amp;amp;sysncpu ;&lt;BR /&gt;
CPUs 4&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
proc options group=sort;&lt;BR /&gt;
20          run;&lt;BR /&gt;
    SAS (r) Proprietary Software Release 9.2  TS2M3  SORTDUP=PHYSICAL  SORT applies NODUP option to physical or logical records?&lt;BR /&gt;
 SORTEQUALS        Maintain order for the input data set in the output data set, when processing identical BY-variable values with &lt;BR /&gt;
                   Proc Sort&lt;BR /&gt;
 SORTSEQ=          Collating sequence for sorting&lt;BR /&gt;
 SORTSIZE=83886080 Size parameter for sort&lt;BR /&gt;
 NOSORTVALIDATE    Do not use automatic sort order validation to determine assertion&lt;BR /&gt;
 SORTANOM=         Host sort option&lt;BR /&gt;
 SORTCUT=0         Specifies the number of observations above which the host sort program is used instead of the SAS sort program&lt;BR /&gt;
 SORTCUTP=0        Specifies the number of bytes above which the host sort program should be used instead of the SAS sort program&lt;BR /&gt;
 SORTDEV=          Specifies the pathname used for temporary host sort files&lt;BR /&gt;
 SORTNAME=         Specifies the name of the host sort utility&lt;BR /&gt;
 SORTPARM=         Specifies the host sort parameters&lt;BR /&gt;
 SORTPGM=BEST      Specifies the name of the sort utility</description>
      <pubDate>Fri, 08 Oct 2010 08:22:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13831#M1707</guid>
      <dc:creator>tommy81</dc:creator>
      <dc:date>2010-10-08T08:22:45Z</dc:date>
    </item>
    <item>
      <title>Re: Proc sort or Proc sql</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13832#M1708</link>
      <description>Thanks Tommy,&lt;BR /&gt;
&lt;BR /&gt;
so you have a SUN 64 system, 4 CPUs, running SAS 9.2.&lt;BR /&gt;
&lt;BR /&gt;
Coming from the mainframe side I can only say that the SORTSIZE value seems rather low to me. Do not these SUN systems have an overwhelming amount of memory available? If 83886080 actually means somewhat around 80MB then I think that value is extremely low causing SORT to do some heavy disk I/O. But then, I'm a mainframer, so we better wait for the operating system specific experts to wake up and get in front of their keyboards.</description>
      <pubDate>Fri, 08 Oct 2010 09:10:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13832#M1708</guid>
      <dc:creator>Robert_Bardos</dc:creator>
      <dc:date>2010-10-08T09:10:29Z</dc:date>
    </item>
    <item>
      <title>Re: Proc sort or Proc sql</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13833#M1709</link>
      <description>I'm not so sure PROC SQL is the answer, however you could look at building an index or using a SAS VIEW -- or at least subsetting your input file to only choose SAS variables that your program will require.&lt;BR /&gt;
&lt;BR /&gt;
Scott Barry&lt;BR /&gt;
SBBWorks, Inc.&lt;BR /&gt;
&lt;BR /&gt;
Suggested Google advanced search arguments, this topic / post:&lt;BR /&gt;
&lt;BR /&gt;
solaris sort performance site:sas.com&lt;BR /&gt;
&lt;BR /&gt;
sort performance index site:sas.com</description>
      <pubDate>Fri, 08 Oct 2010 09:21:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13833#M1709</guid>
      <dc:creator>sbb</dc:creator>
      <dc:date>2010-10-08T09:21:22Z</dc:date>
    </item>
    <item>
      <title>Re: Proc sort or Proc sql</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13834#M1710</link>
      <description>Hi Tommy&lt;BR /&gt;
&lt;BR /&gt;
Following Scott's suggestion to reduce the variables before sorting an approach like below could work.&lt;BR /&gt;
The assumption is that after reducing the variables the sort can be done fully in memory - you will have to give it a try.&lt;BR /&gt;
&lt;BR /&gt;
data massive(drop=i);&lt;BR /&gt;
  array ManyVars {10} (10*1);&lt;BR /&gt;
  do i=1 to 1000;&lt;BR /&gt;
    sale=ceil(ranuni(1)*700);&lt;BR /&gt;
    output;&lt;BR /&gt;
  end;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
data V_PrepareSort /view=V_PrepareSort;&lt;BR /&gt;
  set massive(keep=sale);&lt;BR /&gt;
  obsID=_n_;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
proc sql;&lt;BR /&gt;
  create view V_SortedMassiveSale as&lt;BR /&gt;
    select sale, obsID&lt;BR /&gt;
    from V_PrepareSort&lt;BR /&gt;
    order by sale&lt;BR /&gt;
  ;&lt;BR /&gt;
quit;&lt;BR /&gt;
&lt;BR /&gt;
options sortsize=max;&lt;BR /&gt;
data OneObs(compress=yes) MultiObs(compress=yes);&lt;BR /&gt;
  set V_SortedMassiveSale;&lt;BR /&gt;
  by sale;&lt;BR /&gt;
  &lt;BR /&gt;
  set massive point=obsID;&lt;BR /&gt;
&lt;BR /&gt;
  if first.sale and last.sale then output OneObs;&lt;BR /&gt;
  else output MultiObs;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
HTH&lt;BR /&gt;
Patrick</description>
      <pubDate>Fri, 08 Oct 2010 10:19:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13834#M1710</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2010-10-08T10:19:56Z</dc:date>
    </item>
    <item>
      <title>Re: Proc sort or Proc sql</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13835#M1711</link>
      <description>tommy81&lt;BR /&gt;
 &lt;BR /&gt;
since it is awkward to sort, I would try not to need the sort.&lt;BR /&gt;
What are you doing with the sorted data?&lt;BR /&gt;
Some things might be accomplished without the sort ..... &lt;BR /&gt;
 &lt;BR /&gt;
hth&lt;BR /&gt;
peterC</description>
      <pubDate>Fri, 08 Oct 2010 11:38:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13835#M1711</guid>
      <dc:creator>Peter_C</dc:creator>
      <dc:date>2010-10-08T11:38:56Z</dc:date>
    </item>
    <item>
      <title>Re: Proc sort or Proc sql</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13836#M1712</link>
      <description>Peter&lt;BR /&gt;
&lt;BR /&gt;
Not much .&lt;BR /&gt;
&lt;BR /&gt;
Just check if the there are more than one repeating value for sale , and if so output all those records to another dataset.&lt;BR /&gt;
&lt;BR /&gt;
Since the input dataset is not sorted , we could not go ahead with the first.sale=last.sale</description>
      <pubDate>Fri, 08 Oct 2010 12:50:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13836#M1712</guid>
      <dc:creator>tommy81</dc:creator>
      <dc:date>2010-10-08T12:50:59Z</dc:date>
    </item>
    <item>
      <title>Re: Proc sort or Proc sql</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13837#M1713</link>
      <description>Tommy&lt;BR /&gt;
If it would be sufficient to write only the 2nd to n-th record for repeated sales to a new data set then an approach using a hash table would work (creating kind of a black list on-the-fly).</description>
      <pubDate>Sat, 09 Oct 2010 00:57:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13837#M1713</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2010-10-09T00:57:14Z</dc:date>
    </item>
    <item>
      <title>Re: Proc sort or Proc sql</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13838#M1714</link>
      <description>PROC SORT option DUPOUT=  can be used, if the additional duplicates are to be split out.&lt;BR /&gt;
&lt;BR /&gt;
Scott Barry&lt;BR /&gt;
SBBWorks, Inc.&lt;BR /&gt;
&lt;BR /&gt;
Suggested Google advanced search argument, this topic / post:&lt;BR /&gt;
&lt;BR /&gt;
proc sort dupout site:sas.com</description>
      <pubDate>Sat, 09 Oct 2010 05:25:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13838#M1714</guid>
      <dc:creator>sbb</dc:creator>
      <dc:date>2010-10-09T05:25:42Z</dc:date>
    </item>
    <item>
      <title>Re: Proc sort or Proc sql</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13839#M1715</link>
      <description>Hi SBB,&lt;BR /&gt;
&lt;BR /&gt;
Exactly waht was needed..Simple but brilliant&lt;BR /&gt;
&lt;BR /&gt;
 Thank you so much.</description>
      <pubDate>Sat, 09 Oct 2010 17:55:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13839#M1715</guid>
      <dc:creator>tommy81</dc:creator>
      <dc:date>2010-10-09T17:55:53Z</dc:date>
    </item>
    <item>
      <title>Re: Proc sort or Proc sql</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13840#M1716</link>
      <description>Also, one additional feature/benefit is where you can use two back-to-back PROC SORT executions, and only the second would have the DUPOUT=, possibly using the NODUPKEY option additionally to control SAS behavior -- but also investigate the EQUALS option which is sometimes needed in the last SORT execution.  With EQUALS you can influence how your sort package (or SAS internal sort) sequences or leaves alone the incoming observation order, again, from a prior SORT.&lt;BR /&gt;
&lt;BR /&gt;
For example, say you want to eliminate all duplicates and send them to another file (using DUPOUT= with NODUPKEY specified), however you also may want to have a previous SORT with a more granular BY variable list, one that ensure some additional sort-order ahead of the DUPOUT= / NODUPKEY sort execution.&lt;BR /&gt;
&lt;BR /&gt;
And, also to remember (discussed in these forums before), it's up to the programmer to ensure that the PROC SORT BY variable list is granular enough to ensure that "duplicate observations" (achieved by either NODUPS or NODUPKEY) end up being adjacent, otherwise the sort-operation will not necessarily remove the duplicates -- they must be adjacent to achieve duplicate-deletion.&lt;BR /&gt;
&lt;BR /&gt;
Happy Sorting People!&lt;BR /&gt;
&lt;BR /&gt;
Scott Barry&lt;BR /&gt;
SBBWorks, Inc.&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
Suggested Google advanced search arguments, this topic / post discussion:&lt;BR /&gt;
&lt;BR /&gt;
+"proc sort" +equals +sortequals behavior site:sas.com&lt;BR /&gt;
&lt;BR /&gt;
+"proc sort" by variable list remove duplicates site:sas.com</description>
      <pubDate>Sat, 09 Oct 2010 19:17:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-sort-or-Proc-sql/m-p/13840#M1716</guid>
      <dc:creator>sbb</dc:creator>
      <dc:date>2010-10-09T19:17:15Z</dc:date>
    </item>
  </channel>
</rss>

