<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Detect Duplicate Observations by Different Grouping in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508357#M136527</link>
    <description>Completely unsorted.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
    <pubDate>Mon, 29 Oct 2018 14:44:04 GMT</pubDate>
    <dc:creator>altijani</dc:creator>
    <dc:date>2018-10-29T14:44:04Z</dc:date>
    <item>
      <title>Detect Duplicate Observations by Different Grouping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508309#M136509</link>
      <description>&lt;P&gt;Good Monday Community,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I HAVE the following customers data, which is not sorted (I am sorting it here just to get the concept across):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE border="0" cellspacing="0" cellpadding="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;ID_1&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;ID_2&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;State&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;A123456&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;1ABCDE&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;FL&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;A123456&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;1ABCDE&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;GA&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;B984536&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;2HGJHW&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;NC&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;C567342&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;3KJEWG&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;IL&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;C567342&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;4UYMVS&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;IL&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;D723410&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;5PPOAD&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;WY&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;D723410&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;6NNTGC&lt;/FONT&gt;&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#000000" face="Calibri" size="3"&gt;AL&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I need to create a variable that identify the TYPE of duplicates in my data, so that I&amp;nbsp;can deal with them accordingly. As you can tell from the above, that there are four types of duplicates as in the WANT data below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;ID_1&lt;/TD&gt;&lt;TD&gt;ID_2&lt;/TD&gt;&lt;TD&gt;State&lt;/TD&gt;&lt;TD&gt;Type&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;A123456&lt;/TD&gt;&lt;TD&gt;1ABCDE&lt;/TD&gt;&lt;TD&gt;FL&lt;/TD&gt;&lt;TD&gt;Same IDs, Different States&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;A123456&lt;/TD&gt;&lt;TD&gt;1ABCDE&lt;/TD&gt;&lt;TD&gt;GA&lt;/TD&gt;&lt;TD&gt;Same IDs, Different States&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;B984536&lt;/TD&gt;&lt;TD&gt;2HGJHW&lt;/TD&gt;&lt;TD&gt;NC&lt;/TD&gt;&lt;TD&gt;Unique&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;C567342&lt;/TD&gt;&lt;TD&gt;3KJEWG&lt;/TD&gt;&lt;TD&gt;IL&lt;/TD&gt;&lt;TD&gt;Same ID_1 and State, Different ID_2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;C567342&lt;/TD&gt;&lt;TD&gt;4UYMVS&lt;/TD&gt;&lt;TD&gt;IL&lt;/TD&gt;&lt;TD&gt;Same ID_1 and State, Different ID_2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;D723410&lt;/TD&gt;&lt;TD&gt;5PPOAD&lt;/TD&gt;&lt;TD&gt;WY&lt;/TD&gt;&lt;TD&gt;Same ID_1, Different ID_2 and State&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;D723410&lt;/TD&gt;&lt;TD&gt;6NNTGC&lt;/TD&gt;&lt;TD&gt;AL&lt;/TD&gt;&lt;TD&gt;Same ID_1, Different ID_2 and State&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I run my code in SAS EG using Proc SQL, and I appreciate any help&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Many thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 29 Oct 2018 12:03:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508309#M136509</guid>
      <dc:creator>altijani</dc:creator>
      <dc:date>2018-10-29T12:03:23Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Duplicate Observations by Different Grouping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508310#M136510</link>
      <description>&lt;P&gt;This code is untested, please in future&amp;nbsp;&lt;U&gt;&lt;STRONG&gt;post test data in the form of a datastep!&lt;/STRONG&gt;&lt;/U&gt;&lt;/P&gt;
&lt;PRE&gt;proc sql;
  create table want as
  select a.*,
  case   when exists(select id_1 from have where id_1=a.id_1 and state ne a.state) &lt;BR /&gt;           then "Same IDs, Different States"&lt;BR /&gt;         when exists(select id_1 from have where id_1=a.id_1 and state=a._state and id_2 ne a.id_2) &lt;BR /&gt;           then "Same ID_1 and state, different ID_2"&lt;BR /&gt;         /* Other checks here */&lt;BR /&gt;         else "Unique" end as type&lt;BR /&gt;  from   have a;&lt;BR /&gt;quit;&lt;/PRE&gt;</description>
      <pubDate>Mon, 29 Oct 2018 12:04:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508310#M136510</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2018-10-29T12:04:57Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Duplicate Observations by Different Grouping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508348#M136523</link>
      <description>&lt;P&gt;Note that you can get a detailed compilation of the categories you want in this way:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc freq data=have;&lt;/P&gt;
&lt;P&gt;tables ID_1 * ID_2 * state / noprint out=want (where=(count &amp;gt; 1) drop=percent);&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc print data=want;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also note that your original data may contain differences that belong in more than one category.&amp;nbsp; Here is an example of what you could encounter:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE border="0" cellspacing="0" cellpadding="0"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;ID_1&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;ID_2&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;State&lt;/FONT&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;A123456&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;1ABCDE&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;FL&lt;/FONT&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;A123456&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;1ABCDE&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;GA&lt;/FONT&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;B984536&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;1ABCDE&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;GA&lt;/FONT&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So how do you categorize the second observation?&amp;nbsp; It resembles both the first and the third, but with different variables changing.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Oct 2018 14:44:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508348#M136523</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2018-10-29T14:44:06Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Duplicate Observations by Different Grouping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508356#M136526</link>
      <description>&lt;P&gt;Is the data completely unsorted?&amp;nbsp;&amp;nbsp; Or is it sorted by ID1, but not sorted within each ID1 group?&lt;/P&gt;</description>
      <pubDate>Mon, 29 Oct 2018 14:39:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508356#M136526</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-10-29T14:39:07Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Duplicate Observations by Different Grouping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508357#M136527</link>
      <description>Completely unsorted.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 29 Oct 2018 14:44:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508357#M136527</guid>
      <dc:creator>altijani</dc:creator>
      <dc:date>2018-10-29T14:44:04Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Duplicate Observations by Different Grouping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508365#M136531</link>
      <description>&lt;P&gt;Thanks for your proc freq and proc print.&lt;/P&gt;&lt;P&gt;Eventually, this is what I am after: to know the breakdown of each group. My data contains multiple million records, and I am running into:&lt;/P&gt;&lt;P&gt;ERROR: The SAS System stopped processing this step because of insufficient memory.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As for your other question/suggestion, it is not possible to have the same ID_2 for two different ID_1. The data is cleaner in that regard.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Oct 2018 15:01:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508365#M136531</guid>
      <dc:creator>altijani</dc:creator>
      <dc:date>2018-10-29T15:01:11Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Duplicate Observations by Different Grouping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508372#M136535</link>
      <description>&lt;P&gt;is your sample a good representative of your real meaning you have exactly the same number of columns i.e id1, id2 and state? Can you confirm&lt;/P&gt;</description>
      <pubDate>Mon, 29 Oct 2018 15:08:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508372#M136535</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2018-10-29T15:08:09Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Duplicate Observations by Different Grouping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508376#M136537</link>
      <description>&lt;P&gt;OK, so the same ID_2 cannot appear under different ID_1 values.&amp;nbsp; That still doesn't eliminate the problem:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE border="0" cellspacing="0" cellpadding="0"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;ID_1&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;ID_2&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;State&lt;/FONT&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;A123456&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;1ABCDE&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;FL&lt;/FONT&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;A123456&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;1ABCDE&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;GA&lt;/FONT&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;A123456&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;2ABCDE&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;FONT face="Calibri" size="3" color="#000000"&gt;GA&lt;/FONT&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now the second observation belongs in two different categories, depending on whether you compare to observation 1 or observation 3.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Oct 2018 15:14:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508376#M136537</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2018-10-29T15:14:32Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Duplicate Observations by Different Grouping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508377#M136538</link>
      <description>&lt;P&gt;I have multiple columns in the HAVE data (let us say Col1-Col10), but in the WANT data I only need these columns in order to run a breakdown of the groups: ID_1, ID_2, State, and the new "TYPE" column.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Oct 2018 15:15:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508377#M136538</guid>
      <dc:creator>altijani</dc:creator>
      <dc:date>2018-10-29T15:15:30Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Duplicate Observations by Different Grouping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508378#M136539</link>
      <description>&lt;P&gt;You are right: theoretically, that is a possibility that can exist in the data.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Oct 2018 15:19:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508378#M136539</guid>
      <dc:creator>altijani</dc:creator>
      <dc:date>2018-10-29T15:19:00Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Duplicate Observations by Different Grouping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508396#M136543</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input (ID_1	ID_2	State) (:$20.);
cards;
A123456	1ABCDE	FL
A123456	1ABCDE	GA
B984536	2HGJHW	NC
C567342	3KJEWG	IL
C567342	4UYMVS	IL
D723410	5PPOAD	WY
D723410	6NNTGC	AL
;
proc sql;
create table want as
select id_1, id_2,state ,ifc(count( distinct state)&amp;gt;1, 'Same IDs, Different States',' ' ) as Type
from have
group by id_1,id_2
having type is not missing
union 
select id_1,id_2, state ,ifc(count( distinct id_2)&amp;gt;1,'Same ID_1 and State, Different ID_2',' ') as Type
from have
group by id_1,state
having type is not missing
union 
select id_1,id_2,state,ifc(count( distinct id_2)&amp;gt;1 and count( distinct state)&amp;gt;1,'Same ID_1, Different ID_2 and State',' ') as Type
from have
group by id_1
having type is not missing
union
select *, ifc(count( id_1)=1,'Unique',' ') as Type
from have
group by id_1
having type is not missing;
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 29 Oct 2018 16:20:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508396#M136543</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2018-10-29T16:20:03Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Duplicate Observations by Different Grouping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508463#M136560</link>
      <description>&lt;P&gt;This solution requires 1 sort and 1 data step:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input (ID_1	ID_2	State) (:$20.);
cards;
A123456	1ABCDE	FL
A123456	1ABCDE	GA
B984536	2HGJHW	NC
C567342	3KJEWG	IL
C567342	4UYMVS	IL
D723410	5PPOAD	WY
D723410	6NNTGC	AL
;

proc sort data=have out=need;
  by ID_1 ID_2 state;
run;

data want (drop=_:);
  retain _ONE 1;
  do until (last.id_1);
    set need;
    by id_1 id_2;
    if _n_=1 and first.id_1 then do;
      declare hash h (suminc:'_one');
        h.definekey('state');
        h.definedone();
    end;
    if first.id_1=1 and last.id_1=1 then continue;  /*treated in later do loop*/
    else if first.id_2=0 or last.id_2=0 then continue; /*treated in later do loop*/
    else if h.find()^=0 then h.add();  /* Add the _ONE value for each instance */
  end;
  length matchtype $20;
  do until (last.id_1);
    set need;
    by id_1 id_2;
    if first.id_1=1 and last.id_1=1 then matchtype='Unique ID_1';
    else if first.id_2=0 or last.id_2=0 then matchtype='Different State';
    else do;
      h.sum(sum:_freq);
      if _freq&amp;gt;1 then matchtype='Different ID_2';
      else matchtype='Diff ID_2 and State';
    end;
    output;
  end;
  h.clear();
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;The sort by ID_1 ID_2 allows for straightforward identification of the completely "unique ID_1" and the "different state" matchtypes&amp;nbsp;in the data step.&amp;nbsp; Hence the "continue" for these conditions in the first DO loop.&lt;/LI&gt;
&lt;LI&gt;All other observations are recorded in the hash object H, keyed only on STATE.&amp;nbsp; &lt;BR /&gt;This hash is declared with the SUMINC:'_ONE' argument, which maintains a sum for each key.&amp;nbsp; Specifically&amp;nbsp;it adds the value of variable _ONE (which need not always=1)&amp;nbsp; for the first ADD method, and all subsequent FIND methods for each key.&amp;nbsp; I.e. it adds the value for every successful ADD and FIND method.&lt;BR /&gt;&lt;BR /&gt;This summary is retrieved into variable _FREQ in the later use of the .SUM method.&lt;/LI&gt;
&lt;LI&gt;In the second loop
&lt;OL&gt;
&lt;LI&gt;The easy-to-determine "unique ID_1" and "different state" records are identified&lt;/LI&gt;
&lt;LI&gt;For other records the hash object for that value of STATE is examined to get the frequency of access to each STATE.&amp;nbsp; If _freq&amp;gt;2 then match type is "different ID_2", otherwise "different ID_2 and state".&lt;/LI&gt;
&lt;/OL&gt;
&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Mon, 29 Oct 2018 18:58:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508463#M136560</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-10-29T18:58:15Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Duplicate Observations by Different Grouping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508868#M136704</link>
      <description>&lt;P&gt;Thank you for the suggested code. It works perfectly in the mock data. When I use my data, which I can not share of course, the number of observations increased. Not sure why.&lt;/P&gt;&lt;P&gt;Any suggestions?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 30 Oct 2018 18:34:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508868#M136704</guid>
      <dc:creator>altijani</dc:creator>
      <dc:date>2018-10-30T18:34:01Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Duplicate Observations by Different Grouping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508870#M136705</link>
      <description>&lt;P&gt;Can you make a better sample of mock data that's a better representative so I can test again. Also, you could use&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;'s code who has also contributed in the thread.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 30 Oct 2018 18:35:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508870#M136705</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2018-10-30T18:35:39Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Duplicate Observations by Different Grouping</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508876#M136710</link>
      <description>&lt;P&gt;Thank you for your code. It works perfectly using the mock data. When I ran in my data, which I can not share of course, there are only two TYPES: 'Unique' and &lt;SPAN class="token string"&gt;'Different State'. The second TYPE is an error in almost all of the places picked. In these places, it shouldn't be 'Unique", but not 'Different State' either.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class="token string"&gt;Any suggestions?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class="token string"&gt;Many thanks&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 30 Oct 2018 18:47:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Detect-Duplicate-Observations-by-Different-Grouping/m-p/508876#M136710</guid>
      <dc:creator>altijani</dc:creator>
      <dc:date>2018-10-30T18:47:49Z</dc:date>
    </item>
  </channel>
</rss>

