<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Help dedup dataset based on date condition in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Help-dedup-dataset-based-on-date-condition/m-p/815916#M322031</link>
    <description>&lt;P&gt;Hi AMSAS,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your reply.&lt;/P&gt;&lt;P&gt;The table below shows what I get when I run the code.&lt;/P&gt;&lt;P&gt;What I want is to keep the first observation for each person (same dob, first_name, and last_name), and then:&lt;/P&gt;&lt;P&gt;- remove every other obs whose specimen_date is less than 12 months (1 year) after the first (previous) specimen_date;&lt;/P&gt;&lt;P&gt;- keep every obs whose specimen_date is more than 12 months (1 year) after the previous specimen_date.&lt;/P&gt;&lt;P&gt;So 323 Donald Duck with 01/06/2020 should stay as the first specimen date, and the other two obs (03/06/2020 and 03/08/2020) should be removed because both are less than 12 months after the first. I noticed that the problem occurs when there are one or more obs with the same dob (as for Donald Duck) and missing names, such as:&lt;/P&gt;&lt;P&gt;230 M . . &lt;STRONG&gt;02/12/1956&lt;/STRONG&gt; 03/09/2020&lt;BR /&gt;344 M . . &lt;STRONG&gt;02/12/1956&lt;/STRONG&gt; 04/09/2020&lt;/P&gt;&lt;P&gt;For Betty, you're right: it should not show, since it is less than one year after the previous obs. My mistake, sorry about that.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;DIV align="center"&gt;&lt;TABLE cellspacing="0" cellpadding="5"&gt;&lt;THEAD&gt;&lt;TR&gt;&lt;TH&gt;Obs&lt;/TH&gt;&lt;TH&gt;Id&lt;/TH&gt;&lt;TH&gt;sex&lt;/TH&gt;&lt;TH&gt;first_name&lt;/TH&gt;&lt;TH&gt;last_name&lt;/TH&gt;&lt;TH&gt;DOB&lt;/TH&gt;&lt;TH&gt;specimen_Date&lt;/TH&gt;&lt;/TR&gt;&lt;/THEAD&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;322&lt;/TD&gt;&lt;TD&gt;F&lt;/TD&gt;&lt;TD&gt;Minnie&lt;/TD&gt;&lt;TD&gt;Mouse&lt;/TD&gt;&lt;TD&gt;02/12/1956&lt;/TD&gt;&lt;TD&gt;08/06/2019&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;167&lt;/TD&gt;&lt;TD&gt;F&lt;/TD&gt;&lt;TD&gt;Betty&lt;/TD&gt;&lt;TD&gt;Boop&lt;/TD&gt;&lt;TD&gt;09/01/1993&lt;/TD&gt;&lt;TD&gt;03/06/2019&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;333&lt;/TD&gt;&lt;TD&gt;F&lt;/TD&gt;&lt;TD&gt;Betty&lt;/TD&gt;&lt;TD&gt;Boop&lt;/TD&gt;&lt;TD&gt;09/01/1993&lt;/TD&gt;&lt;TD&gt;03/30/2020&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;230&lt;/TD&gt;&lt;TD&gt;M&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;02/12/1956&lt;/TD&gt;&lt;TD&gt;03/09/2020&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;123&lt;/TD&gt;&lt;TD&gt;M&lt;/TD&gt;&lt;TD&gt;Mickey&lt;/TD&gt;&lt;TD&gt;Mouse&lt;/TD&gt;&lt;TD&gt;01/08/1961&lt;/TD&gt;&lt;TD&gt;01/01/2018&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;123&lt;/TD&gt;&lt;TD&gt;M&lt;/TD&gt;&lt;TD&gt;Mickey&lt;/TD&gt;&lt;TD&gt;Mouse&lt;/TD&gt;&lt;TD&gt;01/08/1961&lt;/TD&gt;&lt;TD&gt;09/02/2020&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;325&lt;/TD&gt;&lt;TD&gt;M&lt;/TD&gt;&lt;TD&gt;Daffy&lt;/TD&gt;&lt;TD&gt;Duck&lt;/TD&gt;&lt;TD&gt;07/01/1993&lt;/TD&gt;&lt;TD&gt;05/06/2020&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Tue, 31 May 2022 20:30:06 GMT</pubDate>
    <dc:creator>mayasak</dc:creator>
    <dc:date>2022-05-31T20:30:06Z</dc:date>
    <item>
      <title>Help dedup dataset based on date condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Help-dedup-dataset-based-on-date-condition/m-p/815820#M321992</link>
      <description>&lt;P&gt;I have a data set (sample below). I need to remove duplicates based on a condition on specimen_date: if the difference in specimen_date between observations for the same person is less than 12 months, remove the observation; otherwise keep it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data have;&lt;BR /&gt;input Id sex $ first_name $ last_name $ DOB :mmddyy10. specimen_Date :mmddyy10.;&lt;BR /&gt;format DOB mmddyy10. specimen_date mmddyy10.;&lt;BR /&gt;datalines;&lt;BR /&gt;123 M Mickey Mouse 01/08/1961 01/01/2018&lt;BR /&gt;123 M Mickey Mouse 01/08/1961 09/02/2020&lt;BR /&gt;322 F Minnie Mouse 02/12/1956 08/06/2019&lt;BR /&gt;344 M Donald Duck 02/12/1956 03/06/2020&lt;BR /&gt;344 M Donald Duck 02/12/1956 03/08/2020&lt;BR /&gt;323 M Donald Duck 02/12/1956 01/06/2020&lt;BR /&gt;323 M Daffy Duck 07/01/1993 09/06/2020&lt;BR /&gt;325 M Daffy Duck 07/01/1993 05/06/2020&lt;BR /&gt;333 F Betty Boop 09/01/1993 03/30/2020&lt;BR /&gt;167 F Betty Boop 09/01/1993 03/06/2019&lt;BR /&gt;245 F Betty Boop 09/01/1993 04/30/2020&lt;BR /&gt;167 F Betty Boop 09/01/1993 11/03/2021&lt;BR /&gt;344 M . . 02/12/1956 03/09/2020&lt;BR /&gt;344 M . . 02/12/1956 04/09/2020&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I used this code, but I did not get the intended result:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc sort data=have out=have_sorted;&lt;BR /&gt;by sex dob first_name last_name specimen_date;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;data want (drop=_:);&lt;BR /&gt;set have_sorted;&lt;BR /&gt;by sex dob first_name last_name;&lt;BR /&gt;retain _maxdate;&lt;BR /&gt;if first.dob then call missing(_maxdate);&lt;BR /&gt;if specimen_date-360&amp;lt;=_maxdate then delete;&lt;BR /&gt;_maxdate=specimen_date;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is what I'm supposed to get, but I'm not getting the bolded ones:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;123 M Mickey Mouse 01/08/1961 01/01/2018&lt;BR /&gt;123 M Mickey Mouse 01/08/1961 09/02/2020&lt;BR /&gt;322 F Minnie Mouse 02/12/1956 08/06/2019&lt;BR /&gt;&lt;STRONG&gt;344 M Donald Duck 02/12/1956 01/06/2020&lt;/STRONG&gt;&lt;BR /&gt;325 M Daffy Duck 07/01/1993 05/06/2020&lt;BR /&gt;333 F Betty Boop 09/01/1993 03/30/2020&lt;BR /&gt;167 F Betty Boop 09/01/1993 03/06/2019&lt;BR /&gt;&lt;STRONG&gt;167 F Betty Boop 09/01/1993 11/03/2021&lt;/STRONG&gt;&lt;BR /&gt;344 M . . 02/12/1956 03/09/2020&lt;/P&gt;&lt;P&gt;Any help is much appreciated.&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Tue, 31 May 2022 12:49:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Help-dedup-dataset-based-on-date-condition/m-p/815820#M321992</guid>
      <dc:creator>mayasak</dc:creator>
      <dc:date>2022-05-31T12:49:24Z</dc:date>
    </item>
    <item>
      <title>Re: Help dedup dataset based on date condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Help-dedup-dataset-based-on-date-condition/m-p/815836#M321996</link>
      <description>&lt;P&gt;I would suggest adding PUT statements to the code and reviewing the SAS log to follow the data flow and logic.&lt;/P&gt;
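&lt;P&gt;A minimal sketch of that kind of tracing, assuming the have_sorted data set from your PROC SORT step. This is your own DATA step with two PUT statements added; the message text is only illustrative:&lt;/P&gt;
&lt;PRE&gt;
data want (drop=_:);
  set have_sorted;
  by sex dob first_name last_name;
  retain _maxdate;
  if first.dob then call missing(_maxdate);
  /* Write the values that the next IF statement compares to the log */
  put _n_= id= first_name= last_name= specimen_date= _maxdate= mmddyy10.;
  if specimen_date-360 &amp;lt;= _maxdate then do;
    put '   this observation is deleted';
    delete;
  end;
  _maxdate = specimen_date;
run;
&lt;/PRE&gt;
&lt;P&gt;Each observation then produces one line in the log, followed by a second line whenever the delete condition is true.&lt;/P&gt;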
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;344 M Donald Duck 02/12/1956 01/06/2020&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The &lt;SPAN&gt;&lt;STRONG&gt;if specimen_date-360&amp;lt;=_maxdate then delete;&lt;/STRONG&gt;&amp;nbsp;condition is true for this record so it is deleted&lt;BR /&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;first_name=Donald last_name=Duck Id=323 DOB=02/12/1956 specimen_Date=01/06/2020&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;specimen_Date=21920 specimen_Date=01/06/2020&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;_maxdate&amp;nbsp; &amp;nbsp; &amp;nbsp;=21983 _maxdate&amp;nbsp; &amp;nbsp; &amp;nbsp;=09/03/2020&lt;/FONT&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;STRONG&gt;167 F Betty Boop 09/01/1993 11/03/2021&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;This observation is in the &lt;STRONG&gt;want&lt;/STRONG&gt; dataset when I run your code.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Can you describe the logic for selecting/deleting the observations? I'm not sure I really understand what you are attempting to achieve.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 31 May 2022 14:23:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Help-dedup-dataset-based-on-date-condition/m-p/815836#M321996</guid>
      <dc:creator>AMSAS</dc:creator>
      <dc:date>2022-05-31T14:23:54Z</dc:date>
    </item>
    <item>
      <title>Re: Help dedup dataset based on date condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Help-dedup-dataset-based-on-date-condition/m-p/815871#M322011</link>
      <description>&lt;P&gt;First a generic comment about this code:&lt;/P&gt;
&lt;P&gt;if specimen_date-360&amp;lt;=_maxdate then delete;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;360 days is not 12 months, so you need to decide whether you want to compare against 12 months or 360 days. The INTNX function increments dates by intervals such as month and year; with the 'S' (same) alignment argument it returns the matching calendar date.&lt;/P&gt;
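&lt;P&gt;A minimal sketch of the 12-month version, assuming the have_sorted data set from the original post. The variable name _keptdate and the reset on first.last_name (one group per person rather than per dob) are my assumptions, not part of the original code; the comparison is against the last observation that was kept:&lt;/P&gt;
&lt;PRE&gt;
data want (drop=_:);
  set have_sorted;
  by sex dob first_name last_name;
  retain _keptdate;
  /* Assumption: start a new person whenever any BY variable changes */
  if first.last_name then call missing(_keptdate);
  if not missing(_keptdate) then do;
    /* INTNX with 'S' gives the same calendar day 12 months after the last kept date */
    if specimen_date &amp;lt;= intnx('month', _keptdate, 12, 'S') then delete;
  end;
  /* Only reached for kept observations, because DELETE ends the iteration */
  _keptdate = specimen_date;
run;
&lt;/PRE&gt;
&lt;P&gt;Whether an observation exactly 12 months later should be kept or deleted is a choice; the sketch deletes it.&lt;/P&gt;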
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you want 360 days, you might look at the YRDIF function, as it has two different 360-day "year" options:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;if not missing(_maxdate) and yrdif(_maxdate, specimen_date, 'ACT/360') &amp;lt;= 1 then delete;&lt;/P&gt;
&lt;P&gt;or&lt;/P&gt;
&lt;P&gt;if not missing(_maxdate) and yrdif(_maxdate, specimen_date, '30/360') &amp;lt;= 1 then delete;&lt;/P&gt;</description>
      <pubDate>Tue, 31 May 2022 15:25:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Help-dedup-dataset-based-on-date-condition/m-p/815871#M322011</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2022-05-31T15:25:16Z</dc:date>
    </item>
    <item>
      <title>Re: Help dedup dataset based on date condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Help-dedup-dataset-based-on-date-condition/m-p/815916#M322031</link>
      <description>&lt;P&gt;Hi AMSAS,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your reply.&lt;/P&gt;&lt;P&gt;The table below shows what I get when I run the code.&lt;/P&gt;&lt;P&gt;What I want is to keep the first observation for each person (same dob, first_name, and last_name), and then:&lt;/P&gt;&lt;P&gt;- remove every other obs whose specimen_date is less than 12 months (1 year) after the first (previous) specimen_date;&lt;/P&gt;&lt;P&gt;- keep every obs whose specimen_date is more than 12 months (1 year) after the previous specimen_date.&lt;/P&gt;&lt;P&gt;So 323 Donald Duck with 01/06/2020 should stay as the first specimen date, and the other two obs (03/06/2020 and 03/08/2020) should be removed because both are less than 12 months after the first. I noticed that the problem occurs when there are one or more obs with the same dob (as for Donald Duck) and missing names, such as:&lt;/P&gt;&lt;P&gt;230 M . . &lt;STRONG&gt;02/12/1956&lt;/STRONG&gt; 03/09/2020&lt;BR /&gt;344 M . . &lt;STRONG&gt;02/12/1956&lt;/STRONG&gt; 04/09/2020&lt;/P&gt;&lt;P&gt;For Betty, you're right: it should not show, since it is less than one year after the previous obs. My mistake, sorry about that.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;DIV align="center"&gt;&lt;TABLE cellspacing="0" cellpadding="5"&gt;&lt;THEAD&gt;&lt;TR&gt;&lt;TH&gt;Obs&lt;/TH&gt;&lt;TH&gt;Id&lt;/TH&gt;&lt;TH&gt;sex&lt;/TH&gt;&lt;TH&gt;first_name&lt;/TH&gt;&lt;TH&gt;last_name&lt;/TH&gt;&lt;TH&gt;DOB&lt;/TH&gt;&lt;TH&gt;specimen_Date&lt;/TH&gt;&lt;/TR&gt;&lt;/THEAD&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;322&lt;/TD&gt;&lt;TD&gt;F&lt;/TD&gt;&lt;TD&gt;Minnie&lt;/TD&gt;&lt;TD&gt;Mouse&lt;/TD&gt;&lt;TD&gt;02/12/1956&lt;/TD&gt;&lt;TD&gt;08/06/2019&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;167&lt;/TD&gt;&lt;TD&gt;F&lt;/TD&gt;&lt;TD&gt;Betty&lt;/TD&gt;&lt;TD&gt;Boop&lt;/TD&gt;&lt;TD&gt;09/01/1993&lt;/TD&gt;&lt;TD&gt;03/06/2019&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;333&lt;/TD&gt;&lt;TD&gt;F&lt;/TD&gt;&lt;TD&gt;Betty&lt;/TD&gt;&lt;TD&gt;Boop&lt;/TD&gt;&lt;TD&gt;09/01/1993&lt;/TD&gt;&lt;TD&gt;03/30/2020&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;230&lt;/TD&gt;&lt;TD&gt;M&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;02/12/1956&lt;/TD&gt;&lt;TD&gt;03/09/2020&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;123&lt;/TD&gt;&lt;TD&gt;M&lt;/TD&gt;&lt;TD&gt;Mickey&lt;/TD&gt;&lt;TD&gt;Mouse&lt;/TD&gt;&lt;TD&gt;01/08/1961&lt;/TD&gt;&lt;TD&gt;01/01/2018&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;123&lt;/TD&gt;&lt;TD&gt;M&lt;/TD&gt;&lt;TD&gt;Mickey&lt;/TD&gt;&lt;TD&gt;Mouse&lt;/TD&gt;&lt;TD&gt;01/08/1961&lt;/TD&gt;&lt;TD&gt;09/02/2020&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;325&lt;/TD&gt;&lt;TD&gt;M&lt;/TD&gt;&lt;TD&gt;Daffy&lt;/TD&gt;&lt;TD&gt;Duck&lt;/TD&gt;&lt;TD&gt;07/01/1993&lt;/TD&gt;&lt;TD&gt;05/06/2020&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 31 May 2022 20:30:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Help-dedup-dataset-based-on-date-condition/m-p/815916#M322031</guid>
      <dc:creator>mayasak</dc:creator>
      <dc:date>2022-05-31T20:30:06Z</dc:date>
    </item>
  </channel>
</rss>

