<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Assign unique identifier by multiple variables in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633458#M187913</link>
    <description>I ended up using Option #2.</description>
    <pubDate>Thu, 19 Mar 2020 23:13:16 GMT</pubDate>
    <dc:creator>Cruise</dc:creator>
    <dc:date>2020-03-19T23:13:16Z</dc:date>
    <item>
      <title>Assign unique identifier by multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633047#M187771</link>
      <description>&lt;P&gt;Hi Folks:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have data without patient identifiers, so there is no information on how many times each patient was hospitalized. I'd therefore like to find out how many data rows share the same birth_y, birth_m, sex, zip, discharge_year and discharge_month. My research question concerns a rapidly fatal (maximum survival time ~6 months), rare disease (~8 per 100,000 people), so one could fairly safely assume that two individuals diagnosed with this condition are unlikely to share the same values of all six variables. This gives me hope that I could create a synthetic unique individual identifier based on these variables. I'm aware that PROC SORT with NODUPKEY, listing these variables in the BY statement, would de-duplicate the data. But I have to assess the reliability of this assumption before I get to the point of de-duplication.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Do you know how to create a unique identifier based on multiple variables?&lt;/P&gt;
&lt;P&gt;A patient hospitalized twice in a year could have different discharge_year and discharge_month values, but this can be resolved later, after this initial screening.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for your time in advance.&lt;/P&gt;
&lt;P&gt;See mock data below, if that helps.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have; 
input birth_y birth_m sex zip discharge_year discharge_month; 
cards;
1980 2 1 12202 1991 3
1982 2 1 12202 1991 3
1970 6 2 12307 1971 8
1965 7 2 12907 1968 9
;&lt;/CODE&gt;&lt;/PRE&gt;
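&lt;P&gt;For reference, the de-duplication step I mentioned might look roughly like the sketch below, run on the mock data above. The output dataset names dedup and dups are placeholders made up for illustration. NODUPKEY keeps the first row of each BY group and DUPOUT= collects the removed rows, so dups would show which records share all six values.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* sketch only: dedup and dups are placeholder dataset names */
proc sort data=have out=dedup nodupkey dupout=dups;
  by birth_y birth_m sex zip discharge_year discharge_month;
run;&lt;/CODE&gt;&lt;/PRE&gt;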
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Mar 2020 19:23:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633047#M187771</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-03-18T19:23:15Z</dc:date>
    </item>
    <item>
      <title>Re: Assign unique identifier by multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633057#M187775</link>
      <description>&lt;P&gt;Would something like this work?&lt;/P&gt;</description>
      <pubDate>Wed, 18 Mar 2020 19:26:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633057#M187775</guid>
      <dc:creator>JerryV</dc:creator>
      <dc:date>2020-03-18T19:26:11Z</dc:date>
    </item>
    <item>
      <title>Re: Synthetic unique identifier in an un-identified longitudinal data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633069#M187781</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/132289"&gt;@Cruise&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As a start, I would count combinations like this:&lt;/P&gt;
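&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* step 1: one row per combination; the OUT= dataset CNT holds its frequency in COUNT */
proc freq data=have noprint;
tables birth_y*birth_m*sex*zip / out=cnt;
run;

/* step 2: distribution of COUNT, i.e. how many combinations occur once, twice, ... */
proc freq data=cnt;
tables count;
run;&lt;/CODE&gt;&lt;/PRE&gt;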
&lt;P&gt;This will tell you how many combinations occur once, twice, three times, ... for&amp;nbsp;&lt;SPAN&gt;birth_y, birth_m, sex and zip. Ideally, 100% of these combinations turn out to be unique. Otherwise, add the discharge year and month to the first TABLES specification (...&lt;FONT face="courier new,courier"&gt;*discharge_year*discharge_month&lt;/FONT&gt;) and rerun the two PROC FREQ steps. Then you can decide whether discharge year and month need to be included in concatenations (as has been suggested) or in other methods to possibly obtain a unique identifier.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Mar 2020 19:58:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633069#M187781</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2020-03-18T19:58:40Z</dc:date>
    </item>
    <item>
      <title>Re: Assign unique identifier by multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633070#M187782</link>
      <description>&lt;P&gt;You can create a unique ID either by concatenating all selected variables, each in a fixed character format and length, or by assigning a sequential number to each distinct combination:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* option 1 */
data want;
 set have;
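       /* give variable-width values a fixed width (e.g. Z2. for months) */
       /* so that the concatenated string is unambiguous */
       ID = cats(birth_y, put(birth_m, z2.), sex, ..... );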
run;
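
/* option 1 spelled out for the six variables of the mock data HAVE.     */
/* A sketch only: WANT1, the hyphen delimiter and the Z widths are        */
/* illustrative choices, and it assumes all six variables are numeric     */
/* with no missing values.                                                */
data want1;
 set have;
       length ID $24;
       /* Zw. formats pad with leading zeros, so every ID has the same layout */
       ID = catx('-', put(birth_y, z4.), put(birth_m, z2.), sex, put(zip, z5.),
                 put(discharge_year, z4.), put(discharge_month, z2.));
       /* first mock row becomes 1980-02-1-12202-1991-03 */
run;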

/*option 2 */
data want;
 set have;
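     by birth_y birth_m sex .....;   /* HAVE must already be sorted by these variables (PROC SORT) */
          retain ID 0;               /* optional: the sum statement below already retains ID and starts it at 0 */
          if first.&amp;lt;last variable in BY list&amp;gt; then ID+1;   /* new combination: next ID */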
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 18 Mar 2020 20:00:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633070#M187782</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2020-03-18T20:00:43Z</dc:date>
    </item>
    <item>
      <title>Re: Synthetic unique identifier in an un-identified longitudinal data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633126#M187797</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/32733"&gt;@FreelanceReinh&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks a lot Reinhard!&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I tried multiple combinations of variables. It appears that de-duplicating my hospital discharge data by either of two combinations, one of 6 and one of 5 variables, brings the number of unique records close to the Reference N (incidence data from the registry, a.k.a. the gold standard). PFI = permanent facility ID; dis_y = discharge year. I'm inclined to use the 5-variable combination because patients could use multiple facilities. Please let me know if you have any comments.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="proc freq tables.png" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/37014i2B02D506F0499BC8/image-size/large?v=v2&amp;amp;px=999" role="button" title="proc freq tables.png" alt="proc freq tables.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/*6 variables: birth_y birth_m sex zip dis_y pfi*/

proc freq data=have noprint;
tables birth_y*birth_m*sex*zip*dis_y*pfi/ out=cnt6var;
where DX1_ID=1; 
run;
proc freq data=cnt6var;
tables count;
run;
proc sort data=have nodupkey out=dedup6var; /*34,320 vs 31,541*/
by birth_y birth_m sex zip dis_y pfi;
where DX1_ID=1; 
run; 

/*5 variables: birth_y birth_m sex zip dis_y*/ 

proc freq data=have noprint;
tables birth_y*birth_m*sex*zip*dis_y/ out=cnt5var;
run;
proc freq data=cnt5var;
tables count;
run;
proc sort data=have nodupkey out=dedup5var; /*31,448 vs 31,541*/ 
by birth_y birth_m sex zip dis_y;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 19 Mar 2020 00:51:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633126#M187797</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-03-19T00:51:00Z</dc:date>
    </item>
    <item>
      <title>Re: Assign unique identifier by multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633127#M187798</link>
      <description>It worked without any problem. I'll use the approach you shared often. Thanks a lot.</description>
      <pubDate>Thu, 19 Mar 2020 00:53:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633127#M187798</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-03-19T00:53:32Z</dc:date>
    </item>
    <item>
      <title>Re: Assign unique identifier by multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633128#M187799</link>
      <description>Thanks a lot for this intuitive approach! It helps.</description>
      <pubDate>Thu, 19 Mar 2020 00:54:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633128#M187799</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-03-19T00:54:43Z</dc:date>
    </item>
    <item>
      <title>Re: Synthetic unique identifier in an un-identified longitudinal data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633201#M187824</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/132289"&gt;@Cruise&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I tried multiple combinations of variables. It appears that de-duplicating my hospital discharge data by either of two combinations, one of 6 and one of 5 variables, brings the number of unique records close to the Reference N (incidence data from the registry, a.k.a. the gold standard). PFI = permanent facility ID; dis_y = discharge year. I'm inclined to use the 5-variable combination because patients could use multiple facilities. Please let me know if you have any comments.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Sounds reasonable to me (without background knowledge). Also given that the 31,448 combinations of &lt;FONT face="courier new,courier"&gt;birth_y&lt;/FONT&gt;, &lt;FONT face="courier new,courier"&gt;birth_m&lt;/FONT&gt;, &lt;FONT face="courier new,courier"&gt;sex&lt;/FONT&gt;, &lt;FONT face="courier new,courier"&gt;zip&lt;/FONT&gt; and &lt;FONT face="courier new,courier"&gt;dis_y&lt;/FONT&gt; are within 0.5% of the "Reference N," I would assume that these five variables form a suitable identifier.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Mar 2020 10:10:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633201#M187824</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2020-03-19T10:10:58Z</dc:date>
    </item>
    <item>
      <title>Re: Synthetic unique identifier in an un-identified longitudinal data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633273#M187860</link>
      <description>Thank you. I have to work out the cases with the same 4 variables (birth_y, birth_m, sex and ZIP) but different discharge years.</description>
      <pubDate>Thu, 19 Mar 2020 13:27:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633273#M187860</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-03-19T13:27:34Z</dc:date>
    </item>
    <item>
      <title>Re: Assign unique identifier by multiple variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633458#M187913</link>
      <description>I ended up using Option #2.</description>
      <pubDate>Thu, 19 Mar 2020 23:13:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Assign-unique-identifier-by-multiple-variables/m-p/633458#M187913</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-03-19T23:13:16Z</dc:date>
    </item>
  </channel>
</rss>

