<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Eliminating duplicate dates with respect to ID# using Proc SQL in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85188#M24355</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Ok, I figured out a little bit more about what's going on.&amp;nbsp; SRVC_BGNCount is counting the total number of unique service begin dates (SRVC_BGN_DT) in the file.&amp;nbsp; In other words, it's eliminating duplicates across the entire file, not just w/ respect to the ID variable.&amp;nbsp; The value is 365, which is greater than 3, so every record is included in the output file. Sorry about the confusion from the syntax I pasted... the forum software kept reformatting it.&amp;nbsp; I tried pasting it several times and even tried to manually move it around, but to no avail. &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Mon, 01 Apr 2013 16:01:01 GMT</pubDate>
    <dc:creator>Wolverine</dc:creator>
    <dc:date>2013-04-01T16:01:01Z</dc:date>
    <item>
      <title>Eliminating duplicate dates with respect to ID# using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85182#M24349</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have several very large data files with administrative medical data.&amp;nbsp; I need to identify patients who had 3 or more healthcare encounters during the year.&amp;nbsp; The problem is that the data has a separate record for each cost incurred.&amp;nbsp; For example, a 2-day inpatient hospital stay might have 20 records -- a record with a charge for medication, another record for IV bag, another record for the room charge, etc.&amp;nbsp; But all 20 of these records have the same admission date.&amp;nbsp; So those 20 records should be combined to represent 1 healthcare encounter for that patient.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here is some sample data:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;ID&amp;nbsp;&amp;nbsp;&amp;nbsp; ADMSN_DT&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; CHARGECODE&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1111&amp;nbsp;&amp;nbsp;&amp;nbsp; 03012008&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; 2345&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1111&amp;nbsp;&amp;nbsp;&amp;nbsp; 03012008&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; 4567&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1111&amp;nbsp;&amp;nbsp;&amp;nbsp; 03012008&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; 5678&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1111&amp;nbsp;&amp;nbsp;&amp;nbsp; 04012008&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; 2345&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1111&amp;nbsp;&amp;nbsp;&amp;nbsp; 04012008&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; 9876&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1111&amp;nbsp;&amp;nbsp;&amp;nbsp; 09012008&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; 2345&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp; 03012008&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; 2345&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp; 03012008&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; 4567&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp; 03012008&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; 5678&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp; 03012008&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; 6789&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp; 03012008&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; 7890&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp; 03012008&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; 0123&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp; 03012008&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; 1212&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp; 04012008&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; 2345&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp; 04012008&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;TD&gt; 9876&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The resulting data file should look like this:&lt;/P&gt;&lt;P&gt;ID&amp;nbsp;&amp;nbsp;&amp;nbsp; ADMSN_DT&lt;/P&gt;&lt;P&gt;1111&amp;nbsp;&amp;nbsp;&amp;nbsp; 03012008&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;1111&amp;nbsp;&amp;nbsp;&amp;nbsp; 04012008&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;1111&amp;nbsp;&amp;nbsp;&amp;nbsp; 09012008&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp; 03012008&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp; 04012008&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;As I stated, these are extremely large files (150GB+), So I'd really like to use Proc SQL for the sake of efficiency.&amp;nbsp; I've tried several variations on the following syntax, but no luck:&lt;/P&gt;&lt;P&gt;Proc SQL; &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Create Table mi.medrecords_deduped&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; as Select Distinct ID,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Count (ADMSN_DT) as ADMSNCount,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; From mi.medrecords&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Group by ID; &lt;/P&gt;&lt;P&gt;Quit;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 21 Mar 2013 19:16:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85182#M24349</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2013-03-21T19:16:40Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating duplicate dates with respect to ID# using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85183#M24350</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Will this do? (NB untested code).&amp;nbsp; Count the admissions in a subquery.&amp;nbsp; If you have more than one year's data you will need to extract the year in the subquery and include it in the where conditions&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;Proc SQL;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Create Table mi.medrecords_deduped as&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Select Distinct a.ID&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ,&amp;nbsp;&amp;nbsp; a.ADMSN_DT&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; From&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;mi.medrecords&lt;/SPAN&gt;&amp;nbsp;&amp;nbsp; a&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;(Select&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;ID&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Count (Distinct &lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;ADMSN_DT&lt;/SPAN&gt;) as Admissions&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; From&amp;nbsp;&amp;nbsp; &lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;mi.medrecords&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;SPAN style="line-height: 1.5em; font-size: 10pt; background-color: #ffffff; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; )&amp;nbsp;&amp;nbsp;&amp;nbsp; b&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;SPAN style="line-height: 1.5em; font-size: 10pt; background-color: #ffffff; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Where a.ID&amp;nbsp; = b.ID&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;SPAN style="line-height: 1.5em; font-size: 10pt; background-color: #ffffff; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; And Admissions &amp;gt;= 3&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Order by a.ID&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;,&amp;nbsp;&amp;nbsp; a.ADMSN_DT&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;Quit;&amp;nbsp; &lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;Richard&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 21 Mar 2013 22:02:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85183#M24350</guid>
      <dc:creator>RichardinOz</dc:creator>
      <dc:date>2013-03-21T22:02:25Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating duplicate dates with respect to ID# using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85184#M24351</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Sorry I haven't gotten back to this... unfortunately the project has become larger in scope, there are other files that have to be merged in, variable names have be changed so they match across files, etc.&amp;nbsp; So I had to rewrite what u wrote above:&amp;nbsp;&amp;nbsp;&amp;nbsp; Proc SQL;&amp;nbsp; Create Table mi.ipot2007_mi_Meas13_dedup&amp;nbsp; as Select Distinct a.MSIS_ID,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; a.SRVC_BGN_DT&amp;nbsp; From mi.ipot2007_mi_Meas13_CUT a,&amp;nbsp; (Select MSIS_ID,&amp;nbsp; Count (Distinct SRVC_BGN_DT) as SRVC_BGNCount&amp;nbsp; From mi.ipot2007_mi_Meas13_CUT) b&amp;nbsp; Where a.MSIS_ID = b.MSIS_ID and SRVC_BGNCount &amp;gt;= 3&amp;nbsp;&amp;nbsp;&amp;nbsp; Order by a.MSIS_ID, a.SRVC_BGN_DT;&amp;nbsp; Quit; title1 'ipot2007_mi_Meas13_dedup'; proc freq; tables MSIS_ID SRVC_BGN_DT /*SRVC_BGNCount*/; run; The problem is that SAS seems to ignore the subquery -- SRVC_BGNCount doesn't even exist in the resulting data file. Likewise, it also ignores the "and SRVC_BGNCount &amp;gt;= 3" part of the Where statement.&amp;nbsp; It doesn't even give me an error, which is should becuz that part of the Where statement refers to a variable that doesn't exist in the file.&amp;nbsp; I tried removing it and that didn't change the results at all.&lt;/P&gt;&lt;P&gt;Isn't there any way to get this forum to display plain text w/o reformatting it??&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 29 Mar 2013 16:35:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85184#M24351</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2013-03-29T16:35:12Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating duplicate dates with respect to ID# using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85185#M24352</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Wolverine&lt;/P&gt;&lt;P&gt;I had to reformat your SQL before I could answer your question.&amp;nbsp; I assume that the concatenation occurred when you pasted from source code - I've never had that problem.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Anyway: ? &lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;The problem is that SAS seems to ignore the subquery -- SRVC_BGNCount doesn't even exist in the resulting data file.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;@ Well, if you want the count in the output dataset you need to include it in the outer select statement.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;? &lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;Likewise, it also ignores the "and SRVC_BGNCount &amp;gt;= 3" part of the Where statement.&amp;nbsp; It doesn't even give me an error, which is should becuz that part of the Where statement refers to a variable that doesn't exist in the file. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;@ It does not exist in the output file but it is available to SQL during the query&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P class="p1"&gt;Proc SQL;&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp; Create Table mi.ipot2007_mi_Meas13_dedup&amp;nbsp; as&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Select Distinct a.MSIS_ID&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ,&amp;nbsp;&amp;nbsp;&amp;nbsp; a.SRVC_BGN_DT&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ,&amp;nbsp;&amp;nbsp; &lt;STRONG&gt; b.SRVC_BGNCount&lt;/STRONG&gt;&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; From mi.ipot2007_mi_Meas13_CUT a&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ,&amp;nbsp;&amp;nbsp; (Select MSIS_ID,&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Count (Distinct SRVC_BGN_DT) as SRVC_BGNCount&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; From mi.ipot2007_mi_Meas13_CUT&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ) b&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Where a.MSIS_ID = b.MSIS_ID&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; and SRVC_BGNCount &amp;gt;= 3&amp;nbsp; &lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Order by a.MSIS_ID&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ,&amp;nbsp;&amp;nbsp;&amp;nbsp; a.SRVC_BGN_DT&lt;/P&gt;&lt;P class="p1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ;&lt;/P&gt;&lt;P class="p1"&gt;Quit ;&lt;/P&gt;&lt;P class="p1"&gt;&lt;/P&gt;&lt;P class="p1"&gt;With the above version of your code you should be able to test that the count is indeed &amp;gt;= 3 for all records output to the table.&lt;/P&gt;&lt;P class="p1"&gt;&lt;/P&gt;&lt;P class="p1"&gt;Richard&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sun, 31 Mar 2013 00:20:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85185#M24352</guid>
      <dc:creator>RichardinOz</dc:creator>
      <dc:date>2013-03-31T00:20:44Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating duplicate dates with respect to ID# using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85186#M24353</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If you're having several source tables then you could also use a union set operator for de-dupping.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;BR /&gt;&amp;nbsp; create table want as&lt;BR /&gt;&amp;nbsp; select *, count(*) as N_Admissions&lt;BR /&gt;&amp;nbsp; from&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ( &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; select id, admsn_dt&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; from have1&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; union corr&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; select id, admsn_dt&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; from have2&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ) &lt;BR /&gt;&amp;nbsp; group by id&lt;BR /&gt;&amp;nbsp; having count(*)&amp;gt; 3&lt;BR /&gt;&amp;nbsp; ;&lt;BR /&gt;quit;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sun, 31 Mar 2013 04:29:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85186#M24353</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2013-03-31T04:29:44Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating duplicate dates with respect to ID# using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85187#M24354</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If your data sources are SAS tables then another approach for de-dupping is loading the data into a hash table. Depending on your actual data this could perform better.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* de-dup source keys by loading into hash */&lt;BR /&gt;data _null_;&lt;BR /&gt;&amp;nbsp; set&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; have1 (keep=id admsn_dt)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; have2 (keep=id admsn_dt)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end=last;&lt;BR /&gt;&amp;nbsp; ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if _n_=1 then&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hash h0(ordered:'y',multidata:'n',hashexp:8);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h0.defineKey('id','admsn_dt');&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h0.defineData('id','admsn_dt');&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h0.defineDone();&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp; _rc=h0.ref();&lt;/P&gt;&lt;P&gt;&amp;nbsp; /* write hash to target data set "want" */&lt;BR /&gt;&amp;nbsp; if last then&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; h0.output(dataset:'inter');&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;BR /&gt;&amp;nbsp; create table want as&lt;BR /&gt;&amp;nbsp; select *,count(*) as N_Admissions&lt;BR /&gt;&amp;nbsp; from inter&lt;BR /&gt;&amp;nbsp; group by id&lt;BR /&gt;&amp;nbsp; having count(*)&amp;gt;3&lt;BR /&gt;&amp;nbsp; ;&lt;BR /&gt;quit;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sun, 31 Mar 2013 04:58:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85187#M24354</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2013-03-31T04:58:39Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating duplicate dates with respect to ID# using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85188#M24355</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Ok, I figured out a little bit more about what's going on.&amp;nbsp; SRVC_BGNCount is counting the total number of unique service begin dates (SRVC_BGN_DT) in the file.&amp;nbsp; In other words, it's eliminating duplicates across the entire file, not just w/ respect to the ID variable.&amp;nbsp; The value is 365, which is greater than 3, so every record is included in the output file. Sorry about the confusion from the syntax I pasted... the forum software kept reformatting it.&amp;nbsp; I tried pasting it several times and even tried to manually move it around, but to no avail. &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 01 Apr 2013 16:01:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85188#M24355</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2013-04-01T16:01:01Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating duplicate dates with respect to ID# using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85189#M24356</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I tried the hash table approach as well, and that seems to work.&amp;nbsp; However, it doesn't count the ID's properly.&amp;nbsp; So for example, if the file "want" looks like this: &lt;/P&gt;&lt;P&gt;1111&amp;nbsp;&amp;nbsp;&amp;nbsp; 03012008&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;1111&amp;nbsp;&amp;nbsp;&amp;nbsp; 04012008&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;1111&amp;nbsp;&amp;nbsp;&amp;nbsp; 09012008&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp; 03012008&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp; 04012008&lt;/P&gt;&lt;P&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp; 06172008&lt;/P&gt;&lt;P&gt;3333&amp;nbsp;&amp;nbsp;&amp;nbsp; 03012008&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;3333&amp;nbsp;&amp;nbsp;&amp;nbsp; 04012008&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;3333&amp;nbsp;&amp;nbsp;&amp;nbsp; 09012008 &lt;/P&gt;&lt;P&gt;3333&amp;nbsp;&amp;nbsp;&amp;nbsp; 09082008 &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Then the final count should be 3, becuz there are 3 different IDs that have at least 3 healthcare encounters on separate dates.&amp;nbsp; Instead, the count will be 10, becuz there are 10 different records.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 01 Apr 2013 18:06:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85189#M24356</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2013-04-01T18:06:29Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating duplicate dates with respect to ID# using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85190#M24357</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;So how should your final output data set look like? So far I've understood you want one row per id and date.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If so then it's only a question of how you're counting. So far the hash is only used for de-dupping (and though reducing volumes as well). The SQL does all the counting and it's only a question of how you formulate it.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;&amp;nbsp; create table want as&lt;/P&gt;&lt;P&gt;&amp;nbsp; select&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; *,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; count(id) as Total_Admissions,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; count(distinct(id)) as Total_Patients&lt;/P&gt;&lt;P&gt;&amp;nbsp; from&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select *,count(*) as Admissions_Per_Patient&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from inter&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; group by id&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; having count(*)&amp;gt;3&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; )&lt;/P&gt;&lt;P&gt;&amp;nbsp; ;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Or this code version which does everything in one big SQL which would be the approach to take if your data lives in a data base:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;&amp;nbsp; create table want as&lt;/P&gt;&lt;P&gt;&amp;nbsp; select &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; *, &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; count(id) as Total_Admissions,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; count(distinct(id)) as Total_Patients&lt;/P&gt;&lt;P&gt;&amp;nbsp; from &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select *,count(*) as Admissions_Per_Patient&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select id, admsn_dt&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from have1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; union corr&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select id, admsn_dt&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from have2&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; )&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; group by id&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; having count(*)&amp;gt;3&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; )&lt;/P&gt;&lt;P&gt;&amp;nbsp; ;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 02 Apr 2013 01:45:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85190#M24357</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2013-04-02T01:45:15Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating duplicate dates with respect to ID# using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85191#M24358</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;PRE __jive_macro_name="quote" class="jive_text_macro jive_macro_quote" modifiedtitle="true"&gt;
&lt;P&gt;Patrick wrote:&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;So how should your final output data set look like? So far I've understood you want one row per id and date.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;/PRE&gt;&lt;P&gt;Yeah that wasn't very clear.&amp;nbsp; Part of the problem is that the higher-ups at my work redefined (and expanded) the task after I had already started working on it.&amp;nbsp; I had working syntax that could have done the counting, but I'm not sure how to adapt it to the expanded task.&amp;nbsp; I've also never worked with hash tables before, so I'd have to adapt it to work w/ that as well.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The final data file should have a single record for each ID.&amp;nbsp; Ideally, I like it to have the ID variable and the count variable, so I can spot-check the output.&amp;nbsp; The cumulative frequency for the ID variable in the output would provide the total number of people identified.&amp;nbsp; Based on my example above, the output data file would be&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1111&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&lt;/P&gt;&lt;P&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&lt;/P&gt;&lt;P&gt;3333&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;And the frequency table would like something like this (ID, freq, cumulative freq)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1111&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/P&gt;&lt;P&gt;2222&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;/P&gt;&lt;P&gt;3333&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Also, I don't really understand why there are 2 "have" tables.&amp;nbsp; I'm not sure which of my file names I should put there.&amp;nbsp; &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 02 Apr 2013 17:38:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85191#M24358</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2013-04-02T17:38:20Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating duplicate dates with respect to ID# using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85192#M24359</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Identifying the patients should be fairly straightforward:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;create table patients as&lt;/P&gt;&lt;P&gt;select&lt;/P&gt;&lt;P&gt;id,&lt;/P&gt;&lt;P&gt;count(distinct admsn_dt) as Admits&lt;/P&gt;&lt;P&gt;from sample&lt;/P&gt;&lt;P&gt;group by id&lt;/P&gt;&lt;P&gt;having count(distinct ADMSN_DT) &amp;gt;= 3;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 02 Apr 2013 19:06:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85192#M24359</guid>
      <dc:creator>DBailey</dc:creator>
      <dc:date>2013-04-02T19:06:15Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating duplicate dates with respect to ID# using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85193#M24360</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You wrote " there are other files that have to be merged in, variable names have be changed so they match across files, etc"&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;As this is about performance the one thing you want to minimize are passes through data. So instead of merging/appending multiple files in an extra data step you can also do it in the same data step where you use the hash table for de-dupping. That's what I've tried to express in the code I've posted by using 2 source tables "have". I don't know your real data / source tables so you might have to implement a bit different. Just try to minimize read/write operations from/to disk as this is normally the bottleneck.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;As for your final output table it appears you don't need the count of admissions per patients at all except for result checking. Again as this is about performance I wouldn't go for costly operations if not needed. Below code collects all the data you need for your frequency table in a hash table.&lt;/P&gt;&lt;P&gt;You should read in the doc how hash tables work. Especially setting a good value for "hashexp" determines how efficient the hash table can operate - and setting the value depends on the expected size of the hash.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;/* de-dup source keys by loading into hash */&lt;BR /&gt;data _null_;&lt;BR /&gt;&amp;nbsp; set&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; have1 (keep=id admsn_dt)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; have2 (keep=id admsn_dt)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end=last;&lt;BR /&gt;&amp;nbsp; ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if _n_=1 then&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hash H_ID(ordered:'y',multidata:'n',hashexp:7);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=H_ID.defineKey('id');&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=H_ID.defineData('id');&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=H_ID.defineDone();&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp; _rc=H_ID.ref();&lt;/P&gt;&lt;P&gt;&amp;nbsp; /* write hash to target data set "want" */&lt;BR /&gt;&amp;nbsp; if last then&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; H_ID.output(dataset:'inter2');&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want2;&lt;BR /&gt;&amp;nbsp; set inter2;&lt;BR /&gt;&amp;nbsp; freq=1;&lt;BR /&gt;&amp;nbsp; cum_freq=_n_;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;And here the code version which includes also the count of admissions per patients. Comparing with the version above you can see that there are many more read/write operations - also most of them happen in-memory so it should still be quite fast.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;/* de-dup source keys by loading into hash */&lt;BR /&gt;data _null_;&lt;BR /&gt;&amp;nbsp; set&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; have1 (keep=id admsn_dt)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; have2 (keep=id admsn_dt)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end=last;&lt;BR /&gt;&amp;nbsp; ;&lt;BR /&gt;&amp;nbsp; length N_Admissions 8.;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if _n_=1 then&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hash H_dedup(multidata:'n',hashexp:7);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=H_dedup.defineKey('id','admsn_dt');&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=H_dedup.defineDone();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hash H_ID(ordered:'y',multidata:'n',hashexp:7);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=H_ID.defineKey('id');&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=H_ID.defineData('id','N_Admissions');&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=H_ID.defineDone();&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if H_dedup.check() ne 0 then&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=H_dedup.add();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if H_ID.find()=0 then&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; N_Admissions=sum(N_Admissions,1);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=H_ID.replace();&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; else&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; N_Admissions=1;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=H_ID.add();&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp; /* write hash to target data set "want" */&lt;BR /&gt;&amp;nbsp; if last then&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; H_ID.output(dataset:'inter1');&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want1;&lt;BR /&gt;&amp;nbsp; set inter1;&lt;BR /&gt;&amp;nbsp; freq=1;&lt;BR /&gt;&amp;nbsp; cum_freq=_n_;&lt;BR /&gt;run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 03 Apr 2013 00:26:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85193#M24360</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2013-04-03T00:26:16Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating duplicate dates with respect to ID# using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85194#M24361</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;There are a lot of restrictions and criteria that have to be met for in this syntax, and accordingly, the file merges, record counts, deduplicating, etc have to be done in a certain order to accurately identify the correct IDs.&amp;nbsp; So I can't do the merges inside the hash table.&amp;nbsp; I've created a flowchart for the syntax to help me visualize all the steps, and I've been careful about making it as efficient as possible while still finding the correct IDs.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Anyway, I'm using the syntax below and it works great -- it finds the correct IDs (which I verified by spot-checking) and is relatively fast.&amp;nbsp; There is just 1 more issue.&amp;nbsp; Given the size of the files I'm dealing w/, I'd prefer to write the temp files to a specific directory, rather than the default work directory.&amp;nbsp; The C:\ drive on this computer is a 256GB SSD, while the D:\ drive is a 6TB RAID 10 array.&amp;nbsp; Is there any way to assign the dataset "inter" to the "mi." data library?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* de-dup source keys by loading into hash */&lt;/P&gt;&lt;P&gt;data mi.ipot2007_mi_meas13_dedup;&lt;/P&gt;&lt;P&gt;&amp;nbsp; set&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; mi.ipot2007_mi_meas13_cut (keep=MSIS_ID SRVC_BGN_DT)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end=last;&lt;/P&gt;&lt;P&gt;&amp;nbsp; ;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if _n_=1 then&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hash h0(ordered:'y',multidata:'n',hashexp:8);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h0.defineKey('MSIS_ID','SRVC_BGN_DT');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h0.defineData('MSIS_ID','SRVC_BGN_DT');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _rc=h0.defineDone();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; _rc=h0.ref();&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; /* write hash to target data set "want" */&lt;/P&gt;&lt;P&gt;&amp;nbsp; if last then&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; h0.output(dataset:'inter');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; /*IS IT POSSIBLE TO ASSIGN A DATA LIBRARY TO 'INTER'?*/&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; create table mi.ipot2007_mi_Meas13Had3 as&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; select MSIS_ID,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; count(distinct SRVC_BGN_DT) as numsepvisits&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; from work.inter&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; group by MSIS_ID&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; having count(distinct SRVC_BGN_DT) &amp;gt;= 3;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 05 Apr 2013 14:59:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85194#M24361</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2013-04-05T14:59:25Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating duplicate dates with respect to ID# using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85195#M24362</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;libname mi 'd:\whateverdirectory';&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 05 Apr 2013 19:39:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85195#M24362</guid>
      <dc:creator>DBailey</dc:creator>
      <dc:date>2013-04-05T19:39:22Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminating duplicate dates with respect to ID# using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85196#M24363</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You can use the normal 2 level syntax to write the hash from memory to disk as you would use for any other SAS table:&lt;/P&gt;&lt;P&gt;h0.output(dataset:'&amp;lt;libref&amp;gt;.inter');&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I hope your RAID10 is fiber attached so that you don't loose too much performance by writing "inter" to a permanent location.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 05 Apr 2013 23:26:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Eliminating-duplicate-dates-with-respect-to-ID-using-Proc-SQL/m-p/85196#M24363</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2013-04-05T23:26:50Z</dc:date>
    </item>
  </channel>
</rss>

