<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Dedupe a dataset without sorting codes in SAS Enterprise Guide</title>
    <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867468#M42669</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/438962"&gt;@shubham_d&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think this gets you most of the way there. Could you expand on the requirements for the custom sort a bit more? You're only applying a sort on part of the data&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
*Read data in;

data have;
input id name $ score code $;
datalines;
1 John 23 A05
1 John 26 A02
1 John 26 A03
1 John 26 A01
2 Mark 21 A05
2 Mark 22 A05
2 Mark 23 A07

;
run;

*Create key variable;
data want;
set have;
key = name || code;
run;

*Sort by key;
proc sort data=want ; by key;run;

*Only retain first key (i.e. highest score);
data want;
set want;
by key;
if first.key then output;
drop key;
run;

*Sort output data;
proc sort data=want ; by name descending  score;run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Thanks&lt;/P&gt;
&lt;P&gt;Harry&lt;/P&gt;</description>
    <pubDate>Fri, 31 Mar 2023 14:46:16 GMT</pubDate>
    <dc:creator>HarrySnart</dc:creator>
    <dc:date>2023-03-31T14:46:16Z</dc:date>
    <item>
      <title>Dedupe a dataset without sorting codes</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867444#M42667</link>
      <description>&lt;P&gt;I want to dedupe the duplicate codes based on their Score, the highest score stays followed by sorting high to low of score. But, if the score is the same &amp;amp; reason codes are different, I want to maintain the order of reason codes. The dedupe only happens across one id, ID - 1 would *not* look for duplicates in ID - 2&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Example -&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;HAVE -&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;data have;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;input id name $ score code $;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;datalines;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;1 John 23 A05&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;1 John 26 A02&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;1 John 26 A03&lt;BR /&gt;&lt;SPAN&gt;1 John 26 A01&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;2 Mark 21 A05&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;2 Mark 22 A05&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;2 Mark 23 A07&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;run;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE border="1"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;ID&lt;/TD&gt;&lt;TD&gt;NAME&lt;/TD&gt;&lt;TD&gt;SCORE&lt;/TD&gt;&lt;TD&gt;CODE&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;&lt;SPAN&gt;John&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD&gt;23&lt;/TD&gt;&lt;TD&gt;A05&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;&lt;SPAN&gt;John&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD&gt;26&lt;/TD&gt;&lt;TD&gt;A02&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;&lt;SPAN&gt;John&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD&gt;26&lt;/TD&gt;&lt;TD&gt;A03&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;&lt;SPAN&gt;John&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD&gt;26&lt;/TD&gt;&lt;TD&gt;A01&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;&lt;SPAN&gt;Mark&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD&gt;21&lt;/TD&gt;&lt;TD&gt;A05&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Mark&lt;/TD&gt;&lt;TD&gt;22&lt;/TD&gt;&lt;TD&gt;A05&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Mark&lt;/TD&gt;&lt;TD&gt;23&lt;/TD&gt;&lt;TD&gt;A07&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;WANT -&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE border="1"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;ID&lt;/TD&gt;&lt;TD&gt;NAME&lt;/TD&gt;&lt;TD&gt;SCORE&lt;/TD&gt;&lt;TD&gt;CODE&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;John&lt;/TD&gt;&lt;TD&gt;26&lt;/TD&gt;&lt;TD&gt;A02 [ Same Order as HAVE ]&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;John&lt;/TD&gt;&lt;TD&gt;26&lt;/TD&gt;&lt;TD&gt;A03&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;John&lt;/TD&gt;&lt;TD&gt;26&lt;/TD&gt;&lt;TD&gt;A01&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;John&lt;/TD&gt;&lt;TD&gt;23&lt;/TD&gt;&lt;TD&gt;A05&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Mark&lt;/TD&gt;&lt;TD&gt;23&lt;/TD&gt;&lt;TD&gt;A07&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Mark&lt;/TD&gt;&lt;TD&gt;22&lt;/TD&gt;&lt;TD&gt;A05&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2023 13:23:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867444#M42667</guid>
      <dc:creator>shubham_d</dc:creator>
      <dc:date>2023-03-31T13:23:02Z</dc:date>
    </item>
    <item>
      <title>Re: Dedupe a dataset without sorting codes</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867464#M42668</link>
      <description>&lt;P&gt;Not sure I get exactly what you want to do.&lt;/P&gt;
&lt;P&gt;But if sort order is crucial for you application, you should have a unique (row?) id column in your data set.&lt;/P&gt;
&lt;P&gt;Then do de-duplication, and then to be sure sort it by the earlier assigned id.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2023 14:38:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867464#M42668</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2023-03-31T14:38:59Z</dc:date>
    </item>
    <item>
      <title>Re: Dedupe a dataset without sorting codes</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867468#M42669</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/438962"&gt;@shubham_d&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think this gets you most of the way there. Could you expand on the requirements for the custom sort a bit more? You're only applying a sort on part of the data&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
*Read data in;

data have;
input id name $ score code $;
datalines;
1 John 23 A05
1 John 26 A02
1 John 26 A03
1 John 26 A01
2 Mark 21 A05
2 Mark 22 A05
2 Mark 23 A07

;
run;

*Create key variable;
data want;
set have;
key = name || code;
run;

*Sort by key;
proc sort data=want ; by key;run;

*Only retain first key (i.e. highest score);
data want;
set want;
by key;
if first.key then output;
drop key;
run;

*Sort output data;
proc sort data=want ; by name descending  score;run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Thanks&lt;/P&gt;
&lt;P&gt;Harry&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2023 14:46:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867468#M42669</guid>
      <dc:creator>HarrySnart</dc:creator>
      <dc:date>2023-03-31T14:46:16Z</dc:date>
    </item>
    <item>
      <title>Re: Dedupe a dataset without sorting codes</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867477#M42670</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/438962"&gt;@shubham_d&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's another suggestion: First, sort by ID and descending score, then remove duplicates.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have out=temp;
by id descending score;
run;

data want(drop=_:);
array _c[999] $; /* increase dimension if &amp;gt;999 obs. per ID */
do _i=1 by 1 until(last.id);
  set temp;
  by id;
  if code ~in _c then do;
    output;
    _c[_i]=code;  
  end;
end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If performance is an issue, you may want to use a &lt;FONT face="courier new,courier"&gt;_temporary_&lt;/FONT&gt; array or a hash object or at least adapt dimension and variable length of array &lt;FONT face="courier new,courier"&gt;_c&lt;/FONT&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2023 15:08:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867477#M42670</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2023-03-31T15:08:35Z</dc:date>
    </item>
    <item>
      <title>Re: Dedupe a dataset without sorting codes</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867483#M42671</link>
      <description>&lt;P&gt;I think I'm understanding your logic.&amp;nbsp; You're deduplicating by ID - Code, right?&amp;nbsp; And then you want to just sort the data?&amp;nbsp; Agree with&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13674"&gt;@LinusH&lt;/a&gt;&amp;nbsp;about adding a temporary row variable to help with restoring the sort order.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input id name $ score code $;
datalines;
1 John 23 A05
1 John 26 A02
1 John 26 A03
1 John 26 A01
2 Mark 21 A05
2 Mark 22 A05
2 Mark 23 A07
;
run;

*add a row counter, so can restore sort order at the end ;
data want1 ;
  set have ;
  row++1 ;
run ;

*Deduplicate by id-code.  ;
*Select the code with the maximum score ;

proc sort data=want1 ;
  by id code descending score ;
run ;

data want ;
  set want1 ;
  by id code descending score ;
  if first.code ;
run ;
&lt;BR /&gt;*sort to desired order;&lt;BR /&gt;
proc sort data=want ;
  by id descending score row ;
run ;

proc print data=want;
run ;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 31 Mar 2023 15:32:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867483#M42671</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2023-03-31T15:32:21Z</dc:date>
    </item>
    <item>
      <title>Re: Dedupe a dataset without sorting codes</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867485#M42673</link>
      <description>&lt;P&gt;If the original order is important then add a variables to indicate the original order.&lt;/P&gt;
&lt;P&gt;You could do it when you create the dataset.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  input id name $ score code $;
  row+1;
datalines;
1 John 23 A05
1 John 26 A02
1 John 26 A03
1 John 26 A01
2 Mark 21 A05
2 Mark 22 A05
2 Mark 23 A07
;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Or add it later.&amp;nbsp; Perhaps as a VIEW?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have_order / view=have_order ;
  set have;
  row+1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Now sort by ID NAME CODE and descending SCORE.&amp;nbsp; Then you can de-dupe per ID NAME CODE.&amp;nbsp; Then you can re-sort by the ID NAME descending SCORE ROW to get the codes in the original order.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have out=step1 ;
  by id name code descending score ;
run;
proc sort data=step1 out=step2 nodupkey;
  by id name code ;
run;
proc sort data=step2 out=want;
  by id name descending score row ;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;Obs    id    name    score    code    row

 1      1    John      26     A02      2
 2      1    John      26     A03      3
 3      1    John      26     A01      4
 4      1    John      23     A05      1
 5      2    Mark      23     A07      7
 6      2    Mark      22     A05      6
&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2023 15:37:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867485#M42673</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2023-03-31T15:37:36Z</dc:date>
    </item>
    <item>
      <title>Re: Dedupe a dataset without sorting codes</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867531#M42681</link>
      <description>Hey Harry! Thanks for taking out the time to resolve this.&lt;BR /&gt;&lt;BR /&gt;The original order is important here when it comes to sorting an id that has the same score but a different code&lt;BR /&gt;&lt;BR /&gt;Yes, your code almost got me there, but for better readability instead of using key, I am now using Obs/row number to maintain the original order</description>
      <pubDate>Fri, 31 Mar 2023 18:02:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867531#M42681</guid>
      <dc:creator>shubham_d</dc:creator>
      <dc:date>2023-03-31T18:02:49Z</dc:date>
    </item>
    <item>
      <title>Re: Dedupe a dataset without sorting codes</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867532#M42682</link>
      <description>Hey Quentin! Thanks for taking time out to resolve this. Yes, I want to deduplicate the redundant data which have same codes but a less score.&lt;BR /&gt;&lt;BR /&gt;Maintaining a row number helped! Thanks a lot &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;</description>
      <pubDate>Fri, 31 Mar 2023 18:06:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867532#M42682</guid>
      <dc:creator>shubham_d</dc:creator>
      <dc:date>2023-03-31T18:06:51Z</dc:date>
    </item>
    <item>
      <title>Re: Dedupe a dataset without sorting codes</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867533#M42683</link>
      <description>Hey Tom! Thanks for taking time out to resolve this. It worked!!&lt;BR /&gt;&lt;BR /&gt;Loved how you &amp;amp; others used the row number to maintain the original order.&lt;BR /&gt;&lt;BR /&gt;Very readable &amp;amp; easy to understand solution. Thanks!</description>
      <pubDate>Fri, 31 Mar 2023 18:08:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867533#M42683</guid>
      <dc:creator>shubham_d</dc:creator>
      <dc:date>2023-03-31T18:08:09Z</dc:date>
    </item>
    <item>
      <title>Re: Dedupe a dataset without sorting codes</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867534#M42684</link>
      <description>Hey! Thanks for taking the time to resolve this. Your solution worked.&lt;BR /&gt;&lt;BR /&gt;I am a bit new to SAS &amp;amp; still learning how to work with arrays. Using a row/obs number to maintain the original order worked very well as an alternate solution. Will learn about arrays &amp;amp; try to understand your solution. Thanks &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;</description>
      <pubDate>Fri, 31 Mar 2023 18:11:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867534#M42684</guid>
      <dc:creator>shubham_d</dc:creator>
      <dc:date>2023-03-31T18:11:04Z</dc:date>
    </item>
    <item>
      <title>Re: Dedupe a dataset without sorting codes</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867536#M42685</link>
      <description>Hey Tom! Thanks for taking the time to resolve this. Yes, creating a column with an obs/row number worked here.</description>
      <pubDate>Fri, 31 Mar 2023 18:12:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/Dedupe-a-dataset-without-sorting-codes/m-p/867536#M42685</guid>
      <dc:creator>shubham_d</dc:creator>
      <dc:date>2023-03-31T18:12:24Z</dc:date>
    </item>
  </channel>
</rss>

