<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Assign values to a duplicate ID in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Assign-values-to-a-duplicate-ID/m-p/693677#M211560</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/78398"&gt;@stancemcgraw&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As you can see from&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13884"&gt;@ballardw&lt;/a&gt;'s suggestion,&amp;nbsp;this is no problem technically. But before taking a single step towards changing the IDs, I would rather take ten steps to investigate &lt;EM&gt;why&lt;/EM&gt; there are duplicates (check source data, read documentation, ask co-workers, etc.).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;IDs, in particular patient IDs, are crucial and must not be changed in an ad-hoc manner. They are likely to occur in several datasets and are typically used as key variables to join tables. (That is, a change in one dataset would require &lt;EM&gt;consistent&lt;/EM&gt; changes in other datasets.) There is also a risk of incorrectly splitting the observations from a single patient by assigning different IDs. Age can change over time, errors in the data are possible and the same combination of Age and Sex may or may not belong to different patients.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;One possible reason for duplicate IDs is that only the combination of two (or more) key variables is unique. For example, in multi-center clinical trials it is common to use the combination of center ID and patient number as a unique key on patient level. Duplicates in character variables can also result from truncation in an earlier step: What if you notice that all duplicates start with "ID-10," whereas the "ID-9..." cases are unique in a &lt;FONT face="courier new,courier"&gt;Studyid&lt;/FONT&gt; variable with length 7?&lt;/P&gt;
&lt;PRE&gt;&lt;STRONG&gt;ID-9998&lt;/STRONG&gt;
&lt;STRONG&gt;ID-9999&lt;/STRONG&gt;
...
&lt;STRONG&gt;ID-1036&lt;/STRONG&gt;&lt;FONT color="#999999"&gt;1&lt;/FONT&gt;
&lt;STRONG&gt;ID-1037&lt;/STRONG&gt;&lt;FONT color="#999999"&gt;0&lt;/FONT&gt;
&lt;STRONG&gt;ID-1036&lt;/STRONG&gt;&lt;FONT color="#999999"&gt;4&lt;/FONT&gt;
&lt;STRONG&gt;ID-1037&lt;/STRONG&gt;&lt;FONT color="#999999"&gt;2&lt;/FONT&gt;
&lt;STRONG&gt;ID-1036&lt;/STRONG&gt;&lt;FONT color="#999999"&gt;9&lt;/FONT&gt;
&lt;STRONG&gt;ID-1037&lt;/STRONG&gt;&lt;FONT color="#999999"&gt;5&lt;/FONT&gt;&lt;/PRE&gt;</description>
    <pubDate>Fri, 23 Oct 2020 08:51:47 GMT</pubDate>
    <dc:creator>FreelanceReinh</dc:creator>
    <dc:date>2020-10-23T08:51:47Z</dc:date>
    <item>
      <title>Assign values to a duplicate ID</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Assign-values-to-a-duplicate-ID/m-p/693614#M211528</link>
      <description>&lt;P&gt;hi all,&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;it has come to my attention that there is a dataset I have that has duplicate IDs, but each duplicate is actually a different patient. They are character IDs and I'd like to assign letters (a's, b's, &amp;amp; c's) to them to differentiate them so that I can eventually match merge different files into this main dataset. Is there a code that can assign some sort of value to duplicate IDs?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Data have:&lt;/P&gt;&lt;P&gt;Studyid &amp;nbsp; &amp;nbsp; &amp;nbsp;Age &amp;nbsp; &amp;nbsp; &amp;nbsp;Sex &amp;nbsp; &amp;nbsp; &amp;nbsp;HLOS&lt;/P&gt;&lt;P&gt;ID-1036 &amp;nbsp; &amp;nbsp; &amp;nbsp; 56 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;F &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 12&lt;/P&gt;&lt;P&gt;ID-1037 &amp;nbsp; &amp;nbsp; &amp;nbsp; 60 &amp;nbsp; &amp;nbsp; &amp;nbsp; M &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 4&lt;/P&gt;&lt;P&gt;ID-1036 &amp;nbsp; &amp;nbsp; &amp;nbsp; 80 &amp;nbsp; &amp;nbsp; &amp;nbsp; M &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;10&lt;/P&gt;&lt;P&gt;ID-1037 &amp;nbsp; &amp;nbsp; &amp;nbsp; 90 &amp;nbsp; &amp;nbsp; &amp;nbsp;F &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 3&lt;/P&gt;&lt;P&gt;ID-1036 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;30 &amp;nbsp; &amp;nbsp; F &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&lt;/P&gt;&lt;P&gt;ID-1037 &amp;nbsp; &amp;nbsp; &amp;nbsp; 40 &amp;nbsp; &amp;nbsp; &amp;nbsp;M &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Data want:&lt;/P&gt;&lt;P&gt;Studyid &amp;nbsp; &amp;nbsp; &amp;nbsp;Age &amp;nbsp; &amp;nbsp; &amp;nbsp;Sex &amp;nbsp; &amp;nbsp; &amp;nbsp;HLOS&lt;/P&gt;&lt;P&gt;ID-1036a &amp;nbsp; &amp;nbsp; &amp;nbsp; 56 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;F &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 12&lt;/P&gt;&lt;P&gt;ID-1037a &amp;nbsp; &amp;nbsp; &amp;nbsp; 60 &amp;nbsp; &amp;nbsp; &amp;nbsp; M &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 4&lt;/P&gt;&lt;P&gt;ID-1036b &amp;nbsp; &amp;nbsp; &amp;nbsp; 80 &amp;nbsp; &amp;nbsp; &amp;nbsp; M &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;10&lt;/P&gt;&lt;P&gt;ID-1037b &amp;nbsp; &amp;nbsp; &amp;nbsp; 90 &amp;nbsp; &amp;nbsp; &amp;nbsp;F &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 3&lt;/P&gt;&lt;P&gt;ID-1036c &amp;nbsp; &amp;nbsp; &amp;nbsp; 30 &amp;nbsp; &amp;nbsp; F &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2&lt;/P&gt;&lt;P&gt;ID-1037b &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;40 &amp;nbsp; &amp;nbsp; &amp;nbsp;M &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2020 21:21:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Assign-values-to-a-duplicate-ID/m-p/693614#M211528</guid>
      <dc:creator>stancemcgraw</dc:creator>
      <dc:date>2020-10-22T21:21:45Z</dc:date>
    </item>
    <item>
      <title>Re: Assign values to a duplicate ID</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Assign-values-to-a-duplicate-ID/m-p/693625#M211535</link>
      <description>&lt;P&gt;What is the assigned length of the variable in question? If the length is 7 then you can't append any suffix to a value like ID-1036 without creating a new variable as it just would not keep the suffix.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The typical approach would be to sort by the variable and then increment a counter.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;Data have;
input Studyid $     Age      Sex   $   HLOS;
datalines;
ID-1036       56        F           12
ID-1037       60       M             4
ID-1036       80       M            10
ID-1037       90      F             3
ID-1036        30     F             2
ID-1037       40      M             5
;

proc sort data=have;
   by studyid;
run;

data want;
  set have (rename=(studyid=oldid));
  length studyid $ 10; /*&amp;lt; this should probably be at least 1 if not 2 characters longer*/
  retain counter 0;
  by oldid;
  if first.oldid then counter=0;
  else counter+1;
  studyid = cats(oldid,byte(counter+rank('a')));
  drop counter oldid;
run;
&lt;/PRE&gt;
&lt;P&gt;The BYTE function returns a single character from the character set, the RANK function returns the position in a character set of a specific character. So adding 0 to the position of 'a' allows returning the character 'a', adding 1 yields 'b', and so forth. IF you have more than 26 duplicates you are going to get some odd suffixes. The Cats function does the appending.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you want to keep the order then you need to add an order variable before the sort and then sort by that order variable after the want data step. I'll leave that as an exercise for the interested reader. [It's on this forum in dozens of places]&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2020 22:48:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Assign-values-to-a-duplicate-ID/m-p/693625#M211535</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-10-22T22:48:34Z</dc:date>
    </item>
    <item>
      <title>Re: Assign values to a duplicate ID</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Assign-values-to-a-duplicate-ID/m-p/693677#M211560</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/78398"&gt;@stancemcgraw&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As you can see from&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13884"&gt;@ballardw&lt;/a&gt;'s suggestion,&amp;nbsp;this is no problem technically. But before taking a single step towards changing the IDs, I would rather take ten steps to investigate &lt;EM&gt;why&lt;/EM&gt; there are duplicates (check source data, read documentation, ask co-workers, etc.).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;IDs, in particular patient IDs, are crucial and must not be changed in an ad-hoc manner. They are likely to occur in several datasets and are typically used as key variables to join tables. (That is, a change in one dataset would require &lt;EM&gt;consistent&lt;/EM&gt; changes in other datasets.) There is also a risk of incorrectly splitting the observations from a single patient by assigning different IDs. Age can change over time, errors in the data are possible and the same combination of Age and Sex may or may not belong to different patients.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;One possible reason for duplicate IDs is that only the combination of two (or more) key variables is unique. For example, in multi-center clinical trials it is common to use the combination of center ID and patient number as a unique key on patient level. Duplicates in character variables can also result from truncation in an earlier step: What if you notice that all duplicates start with "ID-10," whereas the "ID-9..." cases are unique in a &lt;FONT face="courier new,courier"&gt;Studyid&lt;/FONT&gt; variable with length 7?&lt;/P&gt;
&lt;PRE&gt;&lt;STRONG&gt;ID-9998&lt;/STRONG&gt;
&lt;STRONG&gt;ID-9999&lt;/STRONG&gt;
...
&lt;STRONG&gt;ID-1036&lt;/STRONG&gt;&lt;FONT color="#999999"&gt;1&lt;/FONT&gt;
&lt;STRONG&gt;ID-1037&lt;/STRONG&gt;&lt;FONT color="#999999"&gt;0&lt;/FONT&gt;
&lt;STRONG&gt;ID-1036&lt;/STRONG&gt;&lt;FONT color="#999999"&gt;4&lt;/FONT&gt;
&lt;STRONG&gt;ID-1037&lt;/STRONG&gt;&lt;FONT color="#999999"&gt;2&lt;/FONT&gt;
&lt;STRONG&gt;ID-1036&lt;/STRONG&gt;&lt;FONT color="#999999"&gt;9&lt;/FONT&gt;
&lt;STRONG&gt;ID-1037&lt;/STRONG&gt;&lt;FONT color="#999999"&gt;5&lt;/FONT&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 23 Oct 2020 08:51:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Assign-values-to-a-duplicate-ID/m-p/693677#M211560</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2020-10-23T08:51:47Z</dc:date>
    </item>
  </channel>
</rss>

