<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How do I identify Subtypes based on specific algorithm in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-identify-Subtypes-based-on-specific-algorithm/m-p/961176#M374724</link>
    <description>&lt;P&gt;Hello SAS experts,&lt;/P&gt;&lt;P&gt;I have a dataset and would like to identify an individual's true disease type based on the following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;If an individual has Type A or B, select the record closest to the reference date based on service start or end date.&lt;/LI&gt;&lt;LI&gt;If an individual only has "Unspecified" and no valid A or B, retain the "Unspecified" record.&lt;/LI&gt;&lt;LI&gt;The reference date is the same for each individual.&lt;/LI&gt;&lt;LI&gt;Each individual should appear only once in the final output dataset (ie one line per individual ID)&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below is the dataset I have and the expected output.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data WORK.SUBTYPE_SAMPLE;&lt;BR /&gt;infile datalines dsd truncover;&lt;BR /&gt;input ID:BEST12. Type:$12. Reference_date:DATE9. service_start:DATE9. service_end:DATE9.;&lt;BR /&gt;format ID BEST12. Reference_date DATE9. service_start DATE9. 
service_end DATE9.;&lt;BR /&gt;datalines;&lt;BR /&gt;1 A 04JAN2016 10JAN2016 21JAN2016&lt;BR /&gt;1 B 04JAN2016 09JUL2018 09NOV2019&lt;BR /&gt;1 Unspecified 04JAN2016 06JAN2016 10FEB2016&lt;BR /&gt;2 B 08JUN2019 08DEC2019 19DEC2019&lt;BR /&gt;2 Unspecified 08JUN2019 22OCT2019 09AUG2019&lt;BR /&gt;3 Unspecified 02FEB2017 02APR2017 15APR2017&lt;BR /&gt;4 A 01JAN2020 03MAR2020 24MAR2020&lt;BR /&gt;4 A 01JAN2020 05MAY2018 10MAY2018&lt;BR /&gt;4 Unnspecified 01JAN2020 02JAN2020 03JAN2020&lt;BR /&gt;5 A 09SEP2016 11NOV2016 15NOV2016&lt;BR /&gt;5 B 09SEP2016 09SEP2016 10NOV2016&lt;BR /&gt;6 A 03MAR2016 30AUG2016 02NOV2016&lt;BR /&gt;6 A 03MAR2016 14OCT2016 19OCT2016&lt;BR /&gt;6 A 03MAR2016 26MAR2016 19DEC2016&lt;BR /&gt;6 Unspecified 03MAR2016 20OCT2016 21OCT2016&lt;BR /&gt;6 Unspecified 03MAR2016 12DEC2016 28DEC2016&lt;BR /&gt;6 B 03MAR2016 28JUN2016 15AUG2016&lt;BR /&gt;7 B 10OCT2022 11OCT2022 14NOV2022&lt;BR /&gt;8 Unspecified 01JAN2019 05MAY2019 06MAY2019&lt;BR /&gt;8 Unspecified 01JAN2019 07MAY2019 08MAY2019&lt;BR /&gt;;;;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Want:&lt;/P&gt;&lt;TABLE border="0" cellspacing="0" cellpadding="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Id&lt;/TD&gt;&lt;TD&gt;true_type&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;Unspecified&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;8&lt;/TD&gt;&lt;TD&gt;Unspecified&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
    <pubDate>Thu, 06 Mar 2025 23:46:04 GMT</pubDate>
    <dc:creator>cheroij</dc:creator>
    <dc:date>2025-03-06T23:46:04Z</dc:date>
    <item>
      <title>How do I identify Subtypes based on specific algorithm</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-identify-Subtypes-based-on-specific-algorithm/m-p/961176#M374724</link>
      <description>&lt;P&gt;Hello SAS experts,&lt;/P&gt;&lt;P&gt;I have a dataset and would like to identify an individual's true disease type based on the following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;If an individual has Type A or B, select the record closest to the reference date based on service start or end date.&lt;/LI&gt;&lt;LI&gt;If an individual only has "Unspecified" and no valid A or B, retain the "Unspecified" record.&lt;/LI&gt;&lt;LI&gt;The reference date is the same for each individual.&lt;/LI&gt;&lt;LI&gt;Each individual should appear only once in the final output dataset (ie one line per individual ID)&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below is the dataset I have and the expected output.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data WORK.SUBTYPE_SAMPLE;&lt;BR /&gt;infile datalines dsd truncover;&lt;BR /&gt;input ID:BEST12. Type:$12. Reference_date:DATE9. service_start:DATE9. service_end:DATE9.;&lt;BR /&gt;format ID BEST12. Reference_date DATE9. service_start DATE9. 
service_end DATE9.;&lt;BR /&gt;datalines;&lt;BR /&gt;1 A 04JAN2016 10JAN2016 21JAN2016&lt;BR /&gt;1 B 04JAN2016 09JUL2018 09NOV2019&lt;BR /&gt;1 Unspecified 04JAN2016 06JAN2016 10FEB2016&lt;BR /&gt;2 B 08JUN2019 08DEC2019 19DEC2019&lt;BR /&gt;2 Unspecified 08JUN2019 22OCT2019 09AUG2019&lt;BR /&gt;3 Unspecified 02FEB2017 02APR2017 15APR2017&lt;BR /&gt;4 A 01JAN2020 03MAR2020 24MAR2020&lt;BR /&gt;4 A 01JAN2020 05MAY2018 10MAY2018&lt;BR /&gt;4 Unnspecified 01JAN2020 02JAN2020 03JAN2020&lt;BR /&gt;5 A 09SEP2016 11NOV2016 15NOV2016&lt;BR /&gt;5 B 09SEP2016 09SEP2016 10NOV2016&lt;BR /&gt;6 A 03MAR2016 30AUG2016 02NOV2016&lt;BR /&gt;6 A 03MAR2016 14OCT2016 19OCT2016&lt;BR /&gt;6 A 03MAR2016 26MAR2016 19DEC2016&lt;BR /&gt;6 Unspecified 03MAR2016 20OCT2016 21OCT2016&lt;BR /&gt;6 Unspecified 03MAR2016 12DEC2016 28DEC2016&lt;BR /&gt;6 B 03MAR2016 28JUN2016 15AUG2016&lt;BR /&gt;7 B 10OCT2022 11OCT2022 14NOV2022&lt;BR /&gt;8 Unspecified 01JAN2019 05MAY2019 06MAY2019&lt;BR /&gt;8 Unspecified 01JAN2019 07MAY2019 08MAY2019&lt;BR /&gt;;;;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Want:&lt;/P&gt;&lt;TABLE border="0" cellspacing="0" cellpadding="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Id&lt;/TD&gt;&lt;TD&gt;true_type&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;Unspecified&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;8&lt;/TD&gt;&lt;TD&gt;Unspecified&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Thu, 06 Mar 2025 23:46:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-identify-Subtypes-based-on-specific-algorithm/m-p/961176#M374724</guid>
      <dc:creator>cheroij</dc:creator>
      <dc:date>2025-03-06T23:46:04Z</dc:date>
    </item>
    <item>
      <title>Re: How do I identify Subtypes based on specific algorithm</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-identify-Subtypes-based-on-specific-algorithm/m-p/961178#M374725</link>
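      <description>&lt;P&gt;The first part is just a slight rework of your input data step, which produced errors when I tried to run it as posted. The most likely cause is the DSD option, which makes a comma the default delimiter, while the posted data lines were space-delimited; the version below uses comma-delimited data lines instead:&lt;/P&gt;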
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data WORK.SUBTYPE_SAMPLE;
infile cards dsd truncover firstobs=1 dlm=',';
length ID $12 type $12 reference_date service_start service_end 4;
informat reference_date service_start service_end date9.;
format Reference_date DATE9. service_start DATE9. service_end DATE9.;
input ID Type Reference_date service_start service_end;
cards;
1,A,04JAN2016,10JAN2016,21JAN2016
1,B,04JAN2016,09JUL2018,09NOV2019
1,Unspecified,04JAN2016,06JAN2016,10FEB2016
2,B,08JUN2019,08DEC2019,19DEC2019
2,Unspecified,08JUN2019,22OCT2019,09AUG2019
3,Unspecified,02FEB2017,02APR2017,15APR2017
4,A,01JAN2020,03MAR2020,24MAR2020
4,A,01JAN2020,05MAY2018,10MAY2018
4,Unnspecified,01JAN2020,02JAN2020,03JAN2020
5,A,09SEP2016,11NOV2016,15NOV2016
5,B,09SEP2016,09SEP2016,10NOV2016
6,A,03MAR2016,30AUG2016,02NOV2016
6,A,03MAR2016,14OCT2016,19OCT2016
6,A,03MAR2016,26MAR2016,19DEC2016
6,Unspecified,03MAR2016,20OCT2016,21OCT2016
6,Unspecified,03MAR2016,12DEC2016,28DEC2016
6,B,03MAR2016,28JUN2016,15AUG2016
7,B,10OCT2022,11OCT2022,14NOV2022
8,Unspecified,01JAN2019,05MAY2019,06MAY2019
8,Unspecified,01JAN2019,07MAY2019,08MAY2019
;
run;

proc sort data=subtype_sample; by id; run;

data want;
set subtype_sample;
by ID;
length true_type $12 closest 4 anyAB 3;
retain true_type closest anyAB;
if first.ID then do;
	closest=10000;
	true_type='';
	anyAB=0;
end;
/* distance in days from the reference date to the nearer of service start/end */
dist=min(
	abs(service_start-reference_date), 
	abs(service_end-reference_date)
	);
if type in ('A', 'B') then do;
	anyAB=1;
	if dist&amp;lt;closest then do;
		true_type=type;
		closest=dist;
	end;
end;
else if anyAB=0 then true_type='Unspecified';
if last.ID then output;
keep ID true_type closest;
run;

proc print data=want; run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="quickbluefish_0-1741309032284.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/105262i1E072FF129BC06F8/image-size/medium?v=v2&amp;amp;px=400" role="button" title="quickbluefish_0-1741309032284.png" alt="quickbluefish_0-1741309032284.png" /&gt;&lt;/span&gt;&lt;/P&gt;
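&lt;P&gt;To check the result against the expected output from the question, a sketch along these lines could be used. The dataset name EXPECTED is made up for this example, and its rows are copied from the "Want" table in the original post. It assumes WANT was built by the step above, so ID is a character variable and both datasets are in ID order.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical check: EXPECTED is a name introduced for this example; */
/* its rows are copied from the "Want" table in the question.          */
data expected;
infile cards dsd truncover dlm=',';
length ID $12 true_type $12;
input ID true_type;
cards;
1,A
2,B
3,Unspecified
4,A
5,B
6,A
7,B
8,Unspecified
;
run;

/* Match rows on ID and compare true_type only (CLOSEST is dropped from WANT). */
proc compare base=expected compare=want(keep=ID true_type) listall;
id ID;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If the logic matches the requested output, PROC COMPARE should report no unequal values for true_type.&lt;/P&gt;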
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 07 Mar 2025 01:09:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-identify-Subtypes-based-on-specific-algorithm/m-p/961178#M374725</guid>
      <dc:creator>quickbluefish</dc:creator>
      <dc:date>2025-03-07T01:09:17Z</dc:date>
    </item>
    <item>
      <title>Re: How do I identify Subtypes based on specific algorithm</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-identify-Subtypes-based-on-specific-algorithm/m-p/961180#M374727</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data WORK.SUBTYPE_SAMPLE;
infile cards dsd truncover firstobs=1 dlm=',';
length ID $12 type $12 reference_date service_start service_end 4;
informat reference_date service_start service_end date9.;
format Reference_date DATE9. service_start DATE9. service_end DATE9.;
input ID Type Reference_date service_start service_end;
cards;
1,A,04JAN2016,10JAN2016,21JAN2016,
1,B,04JAN2016,09JUL2018,09NOV2019,
1,Unspecified,04JAN2016,06JAN2016,10FEB2016
2,B,08JUN2019,08DEC2019,19DEC2019,
2,Unspecified,08JUN2019,22OCT2019,09AUG2019
3,Unspecified,02FEB2017,02APR2017,15APR2017
4,A,01JAN2020,03MAR2020,24MAR2020,
4,A,01JAN2020,05MAY2018,10MAY2018,
4,Unnspecified,01JAN2020,02JAN2020,03JAN2020
5,A,09SEP2016,11NOV2016,15NOV2016,
5,B,09SEP2016,09SEP2016,10NOV2016,
6,A,03MAR2016,30AUG2016,02NOV2016,
6,A,03MAR2016,14OCT2016,19OCT2016,
6,A,03MAR2016,26MAR2016,19DEC2016,
6,Unspecified,03MAR2016,20OCT2016,21OCT2016
6,Unspecified,03MAR2016,12DEC2016,28DEC2016
6,B,03MAR2016,28JUN2016,15AUG2016,
7,B,10OCT2022,11OCT2022,14NOV2022,
8,Unspecified,01JAN2019,05MAY2019,06MAY2019
8,Unspecified,01JAN2019,07MAY2019,08MAY2019
;
run;
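proc sql;
create table want as
/* IDs with at least one A/B record: keep the A/B type whose service_start */
/* is nearest the reference date (service_end is not used here)            */
select distinct ID,type
 from SUBTYPE_SAMPLE
  where type in ('A' 'B')
   group by ID
    having abs(reference_date-service_start)=min(abs(reference_date-service_start))
union
/* IDs whose records are all 'Unspecified': keep a single 'Unspecified' row */
select distinct ID,type
from (select * from SUBTYPE_SAMPLE group by ID having sum(type='Unspecified')=count(*))
group by ID
having abs(reference_date-service_start)=min(abs(reference_date-service_start))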

order by ID
;

quit;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 07 Mar 2025 02:12:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-identify-Subtypes-based-on-specific-algorithm/m-p/961180#M374727</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2025-03-07T02:12:33Z</dc:date>
    </item>
    <item>
      <title>Re: How do I identify Subtypes based on specific algorithm</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-identify-Subtypes-based-on-specific-algorithm/m-p/961184#M374730</link>
      <description>&lt;P&gt;Since the data are already sorted by ID, you can read each ID group with all the 'A' and 'B' records preceding all the 'Unspecified' records by naming the dataset twice in the SET statement, each with its own WHERE= filter, and interleaving by ID:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  set have (where=(type in ('A','B')))
      have (where=(type ='Unspecified')) ;
  ...
;&lt;/CODE&gt;&lt;/PRE&gt;
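&lt;P&gt;To see the read order this produces, a minimal sketch can write each record to the log. It assumes HAVE is the sample dataset, sorted by ID, and it adds the BY statement that the interleave relies on:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: HAVE is assumed to be the sample data, sorted by ID. */
/* No output dataset is created; each record is written to the log  */
/* in the order the interleaved SET statement delivers it.          */
data _null_;
  set have (where=(type in ('A','B')))
      have (where=(type ='Unspecified')) ;
  by id;
  put id= type= service_start=;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Within each ID, the log should list the 'A' and 'B' records first, followed by the 'Unspecified' ones.&lt;/P&gt;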
&lt;P&gt;The complete step then looks like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  set have (where=(type in ('A','B')))
      have (where=(type ='Unspecified')) ;
  by id;
  retain min_dist .  true_type '           ' ;

  _dist=min(abs(service_start-reference_date)
           ,abs(service_end-reference_date));

  if first.id=1 then do;
    min_dist=_dist;
    true_type=type;
  end;
  else if type in ('A','B') and _dist&amp;lt;min_dist then do;
    true_type=type;
    min_dist=_dist;
  end;

  if last.id;
  keep id true_type ;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 07 Mar 2025 04:52:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-identify-Subtypes-based-on-specific-algorithm/m-p/961184#M374730</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2025-03-07T04:52:22Z</dc:date>
    </item>
    <item>
      <title>Re: How do I identify Subtypes based on specific algorithm</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-do-I-identify-Subtypes-based-on-specific-algorithm/m-p/961608#M374874</link>
      <description>&lt;P&gt;Thank you!!&lt;/P&gt;</description>
      <pubDate>Tue, 11 Mar 2025 22:07:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-do-I-identify-Subtypes-based-on-specific-algorithm/m-p/961608#M374874</guid>
      <dc:creator>cheroij</dc:creator>
      <dc:date>2025-03-11T22:07:11Z</dc:date>
    </item>
  </channel>
</rss>

