<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: proc sort/proc sql failing in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/404042#M98215</link>
    <description>&lt;P&gt;I take it your data are not already sorted (even with duplicates); otherwise a simple DATA step could de-dupe.&amp;nbsp; But does your data have any subset of variables (say up to 5 out of 30 vars) that are the most discriminating?&amp;nbsp; Even if you have only an ID var, for which (say) only 5% of the IDs have duplicates.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In another context, I'm working on a solution using hash objects that can severely reduce memory requirements while removing duplicates.&amp;nbsp; I'd suggest it if your situation seems analogous.&lt;/P&gt;</description>
    <pubDate>Fri, 13 Oct 2017 17:24:33 GMT</pubDate>
    <dc:creator>mkeintz</dc:creator>
    <dc:date>2017-10-13T17:24:33Z</dc:date>
    <item>
      <title>proc sort/proc sql failing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/403971#M98185</link>
      <description>&lt;P&gt;Sorting 55 million records on 18 variables with PROC SORT, or selecting DISTINCT rows with PROC SQL, is failing on a Windows PC. How can I sort this dataset?&lt;/P&gt;&lt;P&gt;Version: 9.04.01M3P062415&lt;BR /&gt;Operating System:&amp;nbsp; WX64_WKS&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Oct 2017 15:07:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/403971#M98185</guid>
      <dc:creator>SASPhile</dc:creator>
      <dc:date>2017-10-13T15:07:46Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort/proc sql failing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/403972#M98186</link>
      <description>&lt;P&gt;"Failing"?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please be specific, give us the details.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Oct 2017 15:10:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/403972#M98186</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2017-10-13T15:10:57Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort/proc sql failing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/403973#M98187</link>
      <description>&lt;P&gt;Best if you can share your code and the log.&amp;nbsp; And using "distinct"? Does that mean that you expect to have &amp;lt; 55 million records at the end?&amp;nbsp; What's your end goal -- summarization? Subset of distinct records?&amp;nbsp; Or just a no-duplicate-key situation?&amp;nbsp; There are lots of ways to get there -- we just need to know what you're after.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Oct 2017 15:13:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/403973#M98187</guid>
      <dc:creator>ChrisHemedinger</dc:creator>
      <dc:date>2017-10-13T15:13:49Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort/proc sql failing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/403974#M98188</link>
      <description>&lt;P&gt;55 million records * 18 variables is quite a large dataset; are you running out of resources or temporary (utility) space?&lt;/P&gt;
&lt;P&gt;Perhaps reconsider your problem and use techniques associated with big data.&amp;nbsp; SQL can be quite resource-hungry, writing to temporary files and doing other things behind the scenes.&amp;nbsp; PROC SORT should be better in this respect, but sorting 55 million records could still be an issue.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Oct 2017 15:16:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/403974#M98188</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2017-10-13T15:16:00Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort/proc sql failing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/403976#M98189</link>
      <description>&lt;P&gt;Hi Paige Miller,&lt;BR /&gt;ERROR: An I/O error has occurred on file WORK.'SASTMP-000000030'n.UTILITY. ERROR: File WORK.'SASTMP-000000030'n.UTILITY is damaged. I/O processing did not complete.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Oct 2017 15:22:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/403976#M98189</guid>
      <dc:creator>SASPhile</dc:creator>
      <dc:date>2017-10-13T15:22:53Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort/proc sql failing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/403978#M98190</link>
      <description>&lt;P&gt;To get to a no-duplicate situation.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Oct 2017 15:23:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/403978#M98190</guid>
      <dc:creator>SASPhile</dc:creator>
      <dc:date>2017-10-13T15:23:45Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort/proc sql failing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/404000#M98201</link>
      <description>&lt;P&gt;Show the code that you ran for Proc Sort.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Oct 2017 16:01:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/404000#M98201</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-10-13T16:01:50Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort/proc sql failing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/404009#M98205</link>
      <description>&lt;P&gt;proc sort data=mylib.dsn tagsort nodupkey out=target.dsn dupout=target.dsn_dups;&lt;BR /&gt;&amp;nbsp; by _all_;&lt;BR /&gt;run;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Oct 2017 16:21:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/404009#M98205</guid>
      <dc:creator>SASPhile</dc:creator>
      <dc:date>2017-10-13T16:21:58Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort/proc sql failing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/404042#M98215</link>
      <description>&lt;P&gt;I take it your data are not already sorted (even with duplicates); otherwise a simple DATA step could de-dupe.&amp;nbsp; But does your data have any subset of variables (say up to 5 out of 30 vars) that are the most discriminating?&amp;nbsp; Even if you have only an ID var, for which (say) only 5% of the IDs have duplicates.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In another context, I'm working on a solution using hash objects that can severely reduce memory requirements while removing duplicates.&amp;nbsp; I'd suggest it if your situation seems analogous.&lt;/P&gt;</description>
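The hash-object approach mentioned above reduces memory because only the key values, not the full-width records, need to live in memory. As a hypothetical illustration (Python rather than the SAS hash object the post refers to), the same idea looks like this:

```python
# Hypothetical sketch (Python, not SAS) of the hash-object dedup idea:
# stream records, keep only the key values in memory, and emit a record
# only the first time its key is seen. Memory grows with the number of
# distinct keys, not with the full record width.

def dedupe_by_key(records, key_vars):
    """Yield the first record for each distinct key; memory holds keys only."""
    seen = set()
    for rec in records:
        key = tuple(rec[v] for v in key_vars)  # key = small subset of variables
        if key not in seen:
            seen.add(key)
            yield rec

rows = [
    {"id": 1, "x": "a"},
    {"id": 2, "x": "b"},
    {"id": 1, "x": "c"},  # duplicate id, suppressed
]
unique = list(dedupe_by_key(rows, ["id"]))
```

This is why the question about "the most discriminating subset of variables" matters: the smaller the key relative to the record, the less memory the key store needs.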
      <pubDate>Fri, 13 Oct 2017 17:24:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/404042#M98215</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2017-10-13T17:24:33Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort/proc sql failing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/404047#M98217</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16600"&gt;@SASPhile&lt;/a&gt;&amp;nbsp;I don't think TAGSORT will be of much benefit if the sort is &lt;EM&gt;&lt;STRONG&gt;BY _ALL_&lt;/STRONG&gt;&lt;/EM&gt;.&amp;nbsp; I suspect it saves disk space only if the BY vars are a small subset of all vars.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Oct 2017 17:37:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/404047#M98217</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2017-10-13T17:37:13Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort/proc sql failing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/404126#M98237</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16600"&gt;@SASPhile&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;proc sort data=mylib.dsn tagsort nodupkey out=target.dsn dupout=target.dsn_dups;&lt;BR /&gt;&amp;nbsp; by _all_;&lt;BR /&gt;run;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;From the documentation:&lt;/P&gt;
&lt;P&gt;When the total length of BY variables is small compared with the record length, TAGSORT reduces temporary disk usage considerably. However, processing time might be much higher.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And since you are using all of the variables as your BY variables, you are adding complexity that likely isn't improving performance at all.&lt;/P&gt;</description>
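The documentation note above can be made concrete with a hypothetical sketch (Python, not SAS) of the tag-sort strategy: sort lightweight (key, record-position) "tags" instead of full records, then fetch records in tag order. Temporary space holds only the tags, so the saving disappears when the key spans every variable, as with BY _ALL_:

```python
# Hypothetical sketch (Python, not SAS) of the TAGSORT strategy: sort small
# (key, position) tags rather than full records, then read the records back
# in tag order. Temporary storage holds only the tags, which is why the
# technique pays off only when the key is much narrower than the record.

def tag_sort(records, key):
    # Build lightweight tags: (sort key, original position).
    tags = sorted((key(rec), i) for i, rec in enumerate(records))
    # Retrieve the full records in sorted order via their positions.
    return [records[i] for _, i in tags]

data = [{"id": 3, "v": "c"}, {"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
ordered = tag_sort(data, key=lambda r: r["id"])
```

With BY _ALL_, each "tag" is as wide as the record itself, so nothing is saved and the extra retrieval pass only adds time.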
      <pubDate>Fri, 13 Oct 2017 20:38:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/404126#M98237</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-10-13T20:38:13Z</dc:date>
    </item>
    <item>
      <title>Re: proc sort/proc sql failing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/404528#M98334</link>
      <description>&lt;P&gt;Deduplication of rows in a very large SAS data set turns out to be easy to do with a hash object. Difficulties arise when deciding on how to deduplicate.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The DISTINCT operator in SAS SQL eliminates rows that have exactly the same column values. PROC SORT NODUPKEY keeps only the first row among multiple rows with the same key value(s). The hash object allows one to define a key of one or more columns, look it up in a hash index as a DATA step reads each row of a SAS data set, and decide how to write or suppress key duplicates. As a safeguard, it often makes sense to write the rows with the first instance of a key value to a main data set and rows with subsequent instances of that same key value to a data set of excluded duplicates.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How do you prefer to handle the duplicates? Note that if one is eliminating key duplicates, the duplicates may contain data not found in the row being kept.&lt;/P&gt;</description>
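The safeguard described above (first instance of each key to a main output, later instances to a separate duplicates output for inspection) can be sketched as follows, again as hypothetical Python rather than the SAS DATA step with a hash object that the post describes:

```python
# Hypothetical sketch (Python, not SAS) of the safeguard described above:
# the first row for each key goes to the main output, and later rows with
# the same key go to a duplicates output, so nothing is silently discarded.

def split_duplicates(records, key_vars):
    seen, main, dups = set(), [], []
    for rec in records:
        key = tuple(rec[v] for v in key_vars)
        if key in seen:
            dups.append(rec)   # later instance of an already-seen key
        else:
            seen.add(key)
            main.append(rec)   # first instance of this key
    return main, dups

rows = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
main, dups = split_duplicates(rows, ["id"])
```

Keeping the duplicates output makes it possible to verify afterward whether dropped rows carried data absent from the rows that were kept, which is exactly the caution raised in the last paragraph.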
      <pubDate>Mon, 16 Oct 2017 17:41:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sort-proc-sql-failing/m-p/404528#M98334</guid>
      <dc:creator>_s_</dc:creator>
      <dc:date>2017-10-16T17:41:32Z</dc:date>
    </item>
  </channel>
</rss>

