<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: de-duplicating in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/de-duplicating/m-p/256667#M57166</link>
    <description>&lt;P&gt;The simplest way is using proc means + idgroup option.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input (ID                             LOINCS_1               RESULT_DATE) (: $40.);
cards;
123                           987                         01-SEPT-XXXX
123                           945                         07-DEC-XXXX
134                           945                         21-OCT-XXXX
186                           756                         23-JAN-XXXX
186                           654                         12-JAN-XXXX
234                           654                         14-FEB-XXXX
234                           987                         27-MAR-XXXX
;
run;

proc sql;
 select max(n) into : n
  from (select count(*) as n from have group by id)
quit;
proc summary data=have;
by id;
output out=want idgroup(out[&amp;amp;n] (LOINCS_1   RESULT_DATE)=);
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If you have big data, you could check MERGE skill:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf" target="_blank"&gt;http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 15 Mar 2016 01:11:33 GMT</pubDate>
    <dc:creator>Ksharp</dc:creator>
    <dc:date>2016-03-15T01:11:33Z</dc:date>
    <item>
      <title>de-duplicating</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/de-duplicating/m-p/256620#M57159</link>
      <description>&lt;P&gt;&lt;BR /&gt;I am working on a long form dataset with multiple lab tests (labeled LOINC1, LOINC 2…), result dates, etc. as shown in the snapshot.&lt;BR /&gt;LOINCS can be repeated for an observation, meaning, same tests might have been performed multiple times. A particular observation might have 5 repeated tests or all unique tests. Another one might have two different tests repeated 3 times each, etc.&lt;BR /&gt;If same test was performed within 3 months interval (for example), I want to de-duplicate the tests FOR THAT OBSERVATION. But if the result dates are 4 months apart for same tests, I want both to stay. Any suggestions to tackle this would be much appreciated!&lt;/P&gt;</description>
      <pubDate>Mon, 14 Mar 2016 18:36:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/de-duplicating/m-p/256620#M57159</guid>
      <dc:creator>TEJ</dc:creator>
      <dc:date>2016-03-14T18:36:48Z</dc:date>
    </item>
    <item>
      <title>Re: de-duplicating</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/de-duplicating/m-p/256621#M57160</link>
      <description>So the question areises: what is an observation? Can't see any clear primary key there?&lt;BR /&gt;Given that the data was stored normalized (transposed from wide to long) an with a key, the operation would be simpler.</description>
      <pubDate>Mon, 14 Mar 2016 18:57:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/de-duplicating/m-p/256621#M57160</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-03-14T18:57:14Z</dc:date>
    </item>
    <item>
      <title>Re: de-duplicating</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/de-duplicating/m-p/256622#M57161</link>
      <description>&lt;P&gt;Thank you LinusH! an "Observation" is a patient's unique ID. the data was transposed from long to wide, not wide to long. so, in it's orginal form, the data looked like..&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ID&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; LOINCS_1 &amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RESULT_DATE&lt;/P&gt;&lt;P&gt;123&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 987&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 01-SEPT-XXXX&lt;/P&gt;&lt;P&gt;123&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 945&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 07-DEC-XXXX&lt;/P&gt;&lt;P&gt;134&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 945&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 21-OCT-XXXX&lt;/P&gt;&lt;P&gt;186&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 756&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 23-JAN-XXXX&lt;/P&gt;&lt;P&gt;186&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 654&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 12-JAN-XXXX&lt;/P&gt;&lt;P&gt;234&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 654&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 14-FEB-XXXX&lt;/P&gt;&lt;P&gt;234&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 987&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 27-MAR-XXXX&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data in this form was transposed to a wider format as in the earlier attachment.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope that clarifies..?&lt;/P&gt;</description>
      <pubDate>Mon, 14 Mar 2016 19:09:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/de-duplicating/m-p/256622#M57161</guid>
      <dc:creator>TEJ</dc:creator>
      <dc:date>2016-03-14T19:09:20Z</dc:date>
    </item>
    <item>
      <title>Re: de-duplicating</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/de-duplicating/m-p/256630#M57162</link>
      <description>&lt;P&gt;What was the rational for the original transformation?&lt;/P&gt;
&lt;P&gt;I think processing your rules in a long format would be easier?&lt;/P&gt;
&lt;P&gt;Sort by ID, test and date, thenu se the dif or lag function to get the time from last test to determine if it's a duplicate record.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 14 Mar 2016 19:49:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/de-duplicating/m-p/256630#M57162</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-03-14T19:49:16Z</dc:date>
    </item>
    <item>
      <title>Re: de-duplicating</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/de-duplicating/m-p/256643#M57164</link>
      <description>&lt;P&gt;For the analysis that I am doing, it was important for me to see records per ID (per patient), and I thought wider format would be useful. Specifically, I had to look at statistics like # of cases- "with ONLY test A and B", "Only test B", Test B and C", "Only B and C" etc., and it sounded easier for me to pull from wider table than long one.&lt;/P&gt;&lt;P&gt;If it works, I don't have problems going back to the long format (again.&amp;nbsp; But something i probably should have clarified before- the labs are NOT true duplicates. I want to treat like duplicates if it is the same test performed within certain duration (like 3 months).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;how would you duplicate using the long format? any quick hints/sample codes? dif/lag fucntions?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;thank you&lt;/P&gt;</description>
      <pubDate>Mon, 14 Mar 2016 20:46:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/de-duplicating/m-p/256643#M57164</guid>
      <dc:creator>TEJ</dc:creator>
      <dc:date>2016-03-14T20:46:38Z</dc:date>
    </item>
    <item>
      <title>Re: de-duplicating</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/de-duplicating/m-p/256645#M57165</link>
      <description>&lt;OL&gt;
&lt;LI&gt;Sort Long data by ID/Test/Date&lt;/LI&gt;
&lt;LI&gt;Within each by group test for duplicates.&lt;/LI&gt;
&lt;/OL&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/*1*/
proc sort data=have;
by ID test date;
run;

/*2*/
data want;
set have;
by id test;
retain count 0;

prev_test=lag(date);
if first.test then do;
prev_test=.;
count=0;
end;

if date-prev_date &amp;gt; 90 then count+1;

run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Duplicates will be grouped and you can take the first per group. It's untested but should get you started.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Mar 2016 20:53:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/de-duplicating/m-p/256645#M57165</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-03-14T20:53:33Z</dc:date>
    </item>
    <item>
      <title>Re: de-duplicating</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/de-duplicating/m-p/256667#M57166</link>
      <description>&lt;P&gt;The simplest way is using proc means + idgroup option.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input (ID                             LOINCS_1               RESULT_DATE) (: $40.);
cards;
123                           987                         01-SEPT-XXXX
123                           945                         07-DEC-XXXX
134                           945                         21-OCT-XXXX
186                           756                         23-JAN-XXXX
186                           654                         12-JAN-XXXX
234                           654                         14-FEB-XXXX
234                           987                         27-MAR-XXXX
;
run;

proc sql;
 select max(n) into : n
  from (select count(*) as n from have group by id)
quit;
proc summary data=have;
by id;
output out=want idgroup(out[&amp;amp;n] (LOINCS_1   RESULT_DATE)=);
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If you have big data, you could check MERGE skill:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf" target="_blank"&gt;http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 15 Mar 2016 01:11:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/de-duplicating/m-p/256667#M57166</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-03-15T01:11:33Z</dc:date>
    </item>
  </channel>
</rss>

