<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Proc SQL vs Proc Sort/Set statement in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-vs-Proc-Sort-Set-statement/m-p/42063#M10907</link>
    <description>Since you have only 16M rows and 2 columns to de-duplicate, I could recommend a hash table of keys holding the row number of the preferred row (for re-loading the data), but you might get the logical equivalent with the TAGSORT option of PROC SORT, where sort work areas should be needed only for the keys (and that tag).</description>
    <pubDate>Wed, 01 Dec 2010 16:17:29 GMT</pubDate>
    <dc:creator>Peter_C</dc:creator>
    <dc:date>2010-12-01T16:17:29Z</dc:date>
    <item>
      <title>Proc SQL vs Proc Sort/Set statement</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-vs-Proc-Sort-Set-statement/m-p/42062#M10906</link>
      <description>I have a dataset with 16 million records and 64 variables, 2 of which I am looking to use to subset the data. Of the two, call them x and y, x has duplicates, and I want to keep one record for each unique x: the one with the highest value of y. I know I can do this either with PROC SQL, using a "group by" approach, or by sorting first (by x and descending y) and then using a SET statement with a first.x approach. My question is which approach is generally considered more efficient? I used to run away from all sorts until I realized that a PROC SQL approach can sometimes be equally time-consuming. Any insights would be greatly appreciated.</description>
      <pubDate>Wed, 01 Dec 2010 12:40:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-vs-Proc-Sort-Set-statement/m-p/42062#M10906</guid>
      <dc:creator>Elkridge_SAS</dc:creator>
      <dc:date>2010-12-01T12:40:23Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL vs Proc Sort/Set statement</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-vs-Proc-Sort-Set-statement/m-p/42063#M10907</link>
      <description>Since you have only 16M rows and 2 columns to de-duplicate, I could recommend a hash table of keys holding the row number of the preferred row (for re-loading the data), but you might get the logical equivalent with the TAGSORT option of PROC SORT, where sort work areas should be needed only for the keys (and that tag).</description>
      <pubDate>Wed, 01 Dec 2010 16:17:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Proc-SQL-vs-Proc-Sort-Set-statement/m-p/42063#M10907</guid>
      <dc:creator>Peter_C</dc:creator>
      <dc:date>2010-12-01T16:17:29Z</dc:date>
    </item>
  </channel>
</rss>

