<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: how to remove duplicates in SAS 4.3 EG in SAS Enterprise Guide</title>
    <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109606#M9382</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I work with pharmacy claims, in which we often want to have a final report with only the most current fill reported.&amp;nbsp; We remove the duplicate claims, that have some different data and some matching, through using group filters in EG.&amp;nbsp;&amp;nbsp; We also are non-coders.&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The process is to pull the full data set first.&amp;nbsp; Then query off it, for our example, we first select to group by the Member ID, and GPI (unique drug identifier) - therefore keeping the combination of member + drug, regardless of the other information on the claim )such as costs, pharmacy, etc) and then we use the MAX summary function on the date field.&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Once you have selected to do the MAX of date, on the filter tab, the "filter the summarized data" option will show up at the bottom half of the screen.&amp;nbsp;&amp;nbsp; We then enter in the filter to this section where (CALCULATED MAX DATE) = DATE.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This is probably a bit confusing to read through, but I think it's a really good way to get to the most recent data out of millions of rows of multiple transactions.&amp;nbsp; I would assume this would be helpful further than the Pharmacy industry.&lt;/P&gt;&lt;P&gt;&lt;IMG alt="9-19-2012 2-59-41 PM.png" class="jive-image-thumbnail jive-image" src="https://communities.sas.com/legacyfs/online/2554_9-19-2012 2-59-41 PM.png" width="450" /&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 19 Sep 2012 20:00:38 GMT</pubDate>
    <dc:creator>EmilyC</dc:creator>
    <dc:date>2012-09-19T20:00:38Z</dc:date>
    <item>
      <title>how to remove duplicates in SAS 4.3 EG</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109599#M9375</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello all&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Got a query on how to remove dulicates while using SAS 4.3 EG&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any pointers would be of help.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Sundari&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 31 Aug 2012 01:36:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109599#M9375</guid>
      <dc:creator>sundari</dc:creator>
      <dc:date>2012-08-31T01:36:28Z</dc:date>
    </item>
    <item>
      <title>Re: how to remove duplicates in SAS 4.3 EG</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109600#M9376</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Assuming you want to remove duplicate observations with the same values?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data have;&lt;/P&gt;&lt;P&gt;input x y;&lt;/P&gt;&lt;P&gt;cards;&lt;/P&gt;&lt;P&gt;1 2&lt;/P&gt;&lt;P&gt;1 4&lt;/P&gt;&lt;P&gt;1 8&lt;/P&gt;&lt;P&gt;2 3&lt;/P&gt;&lt;P&gt;2 4&lt;/P&gt;&lt;P&gt;2 4&lt;/P&gt;&lt;P&gt;2 5&lt;/P&gt;&lt;P&gt;1 7&lt;/P&gt;&lt;P&gt;1 8&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* PROC SORT method */&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sort data=have nodup out=want1;&lt;/P&gt;&lt;P&gt;by x y;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* PROC SQL method */&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;create table want2 as select distinct * from have;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you want to delete duplicates that are not 100% identical (e.g. all units with the same ID variable regardless of whether they differ in other values) look at the NODUPKEY option in PROC SORT.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 31 Aug 2012 02:20:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109600#M9376</guid>
      <dc:creator>GeoffreyBrent</dc:creator>
      <dc:date>2012-08-31T02:20:51Z</dc:date>
    </item>
    <item>
      <title>Re: how to remove duplicates in SAS 4.3 EG</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109601#M9377</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Use the query builder, select all columns, tick box "distinct rows"&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 31 Aug 2012 09:35:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109601#M9377</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2012-08-31T09:35:04Z</dc:date>
    </item>
    <item>
      <title>Re: how to remove duplicates in SAS 4.3 EG</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109602#M9378</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You can also your the sorttask and select no dublicated keys or obs in the option panel.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 31 Aug 2012 10:48:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109602#M9378</guid>
      <dc:creator>AndersBergquist</dc:creator>
      <dc:date>2012-08-31T10:48:47Z</dc:date>
    </item>
    <item>
      <title>Re: how to remove duplicates in SAS 4.3 EG</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109603#M9379</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Don't use the sort task to get rid of duplicate records (=all variable values equal). This will only give you the expected result if you also sort by ALL variables in the data set (and then you could also choose "Keep only the first record per by group").&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Selecting option "Keep only one of each record..." in the sort task creates a Proc Sort statement using NODUP (which is NODUPRECS). This option is imho one of the less lucky implementations SAS did. It's quite deceptive and I've seen already a lot of people stumbling over it.&lt;/P&gt;&lt;P&gt;What NODUP does is removing identical rows (all variable values the same) only if they follow right one after the other - and you have no control if this happens this way if you don't sort by all variables (and then NODUPKEY gives you the same result).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here an example to showcase the issue&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data sample;&lt;/P&gt;&lt;P&gt;&amp;nbsp; input key var $;&lt;/P&gt;&lt;P&gt;&amp;nbsp; datalines;&lt;/P&gt;&lt;P&gt;1 a&lt;/P&gt;&lt;P&gt;1 b&lt;/P&gt;&lt;P&gt;1 b&lt;/P&gt;&lt;P&gt;1 a&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;IMG alt="Capture1.PNG" class="jive-image-thumbnail jive-image" src="https://communities.sas.com/legacyfs/online/2448_Capture1.PNG" width="450" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;IMG alt="Capture.PNG" class="jive-image" src="https://communities.sas.com/legacyfs/online/2449_Capture.PNG" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This generates the following Proc Sort code (in EG4.3):&lt;/P&gt;&lt;P&gt;PROC SORT DATA=WORK.SAMPLE&lt;BR /&gt;&amp;nbsp;&amp;nbsp; OUT=WORK.SORTSortedSAMPLE(LABEL="Sorted WORK.SAMPLE")&lt;BR /&gt;&amp;nbsp;&amp;nbsp; NODUP&lt;BR /&gt;&amp;nbsp;&amp;nbsp; ;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; BY key;&lt;/P&gt;&lt;P&gt;RUN;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;One would now expect to get 2 rows as result&lt;/P&gt;&lt;P&gt;1 a&lt;/P&gt;&lt;P&gt;1 b&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;BUT that's what you will get:&lt;/P&gt;&lt;P&gt;&lt;IMG alt="Capture.PNG" class="jive-image" src="https://communities.sas.com/legacyfs/online/2450_Capture.PNG" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Selection of "Maintaining original data order..." will also influence the result - but with the simple sample data provided the result will be wrong as well.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Conclusion&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Use the query builder with "select distinct rows only" for de-dupping.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sat, 01 Sep 2012 02:54:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109603#M9379</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2012-09-01T02:54:23Z</dc:date>
    </item>
    <item>
      <title>Re: how to remove duplicates in SAS 4.3 EG</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109604#M9380</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Patrick is correct, I was aware of this rather important detail but completely neglected to mention it in my response &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt; In hindsight, my PROC SORT example should have used "by _all_".&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sun, 02 Sep 2012 23:59:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109604#M9380</guid>
      <dc:creator>GeoffreyBrent</dc:creator>
      <dc:date>2012-09-02T23:59:02Z</dc:date>
    </item>
    <item>
      <title>Re: how to remove duplicates in SAS 4.3 EG</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109605#M9381</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you guys...&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 13 Sep 2012 20:52:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109605#M9381</guid>
      <dc:creator>sundari</dc:creator>
      <dc:date>2012-09-13T20:52:48Z</dc:date>
    </item>
    <item>
      <title>Re: how to remove duplicates in SAS 4.3 EG</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109606#M9382</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I work with pharmacy claims, in which we often want to have a final report with only the most current fill reported.&amp;nbsp; We remove the duplicate claims, that have some different data and some matching, through using group filters in EG.&amp;nbsp;&amp;nbsp; We also are non-coders.&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The process is to pull the full data set first.&amp;nbsp; Then query off it, for our example, we first select to group by the Member ID, and GPI (unique drug identifier) - therefore keeping the combination of member + drug, regardless of the other information on the claim )such as costs, pharmacy, etc) and then we use the MAX summary function on the date field.&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Once you have selected to do the MAX of date, on the filter tab, the "filter the summarized data" option will show up at the bottom half of the screen.&amp;nbsp;&amp;nbsp; We then enter in the filter to this section where (CALCULATED MAX DATE) = DATE.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This is probably a bit confusing to read through, but I think it's a really good way to get to the most recent data out of millions of rows of multiple transactions.&amp;nbsp; I would assume this would be helpful further than the Pharmacy industry.&lt;/P&gt;&lt;P&gt;&lt;IMG alt="9-19-2012 2-59-41 PM.png" class="jive-image-thumbnail jive-image" src="https://communities.sas.com/legacyfs/online/2554_9-19-2012 2-59-41 PM.png" width="450" /&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Sep 2012 20:00:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109606#M9382</guid>
      <dc:creator>EmilyC</dc:creator>
      <dc:date>2012-09-19T20:00:38Z</dc:date>
    </item>
    <item>
      <title>Re: how to remove duplicates in SAS 4.3 EG</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109607#M9383</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;By chance, I just added a blog post with a simplified example of what EmilyC describes. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://blogs.sas.com/content/sasdummy/2012/09/18/having-clause-fun-with-sas-enterprise-guide/" title="http://blogs.sas.com/content/sasdummy/2012/09/18/having-clause-fun-with-sas-enterprise-guide/"&gt; HAVING (clause) fun with SAS Enterprise Guide - The SAS Dummy&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Chris&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Sep 2012 20:31:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/how-to-remove-duplicates-in-SAS-4-3-EG/m-p/109607#M9383</guid>
      <dc:creator>ChrisHemedinger</dc:creator>
      <dc:date>2012-09-19T20:31:03Z</dc:date>
    </item>
  </channel>
</rss>

