<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Claims level data-working with duplicates to classify disease in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/Claims-level-data-working-with-duplicates-to-classify-disease/m-p/918694#M41125</link>
    <description>&lt;P&gt;Thank you. The latter code was helpful.&lt;/P&gt;</description>
    <pubDate>Sat, 02 Mar 2024 19:02:24 GMT</pubDate>
    <dc:creator>Dissertator</dc:creator>
    <dc:date>2024-03-02T19:02:24Z</dc:date>
    <item>
      <title>Claims level data-working with duplicates to classify disease</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Claims-level-data-working-with-duplicates-to-classify-disease/m-p/918629#M41121</link>
      <description>&lt;P&gt;Hello,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am stuck with categorizing patients based on their first diagnosis. Here is some background information about my project.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am interested in long-term risk of cardiovascular diseases (CVDs) among women.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I will utilize 4 data sources to identify women with a CVD diagnosis: Emergency department, Hospital discharge, Death certificate, and Medical claims. I pulled ICD codes and service dates from each data source and flagged them. For example, flag_isch_dc indicates an ischemic heart disease diagnosis from the death certificate while&amp;nbsp;flag_isch_hds indicates an ischemic heart disease diagnosis from the hospital discharge data.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The data is at the claims level, and I eventually want to bring it to the patient level. Before doing so, I will need to create a single CVD category. Given that I am using 4 different data sources and 4 different CVD diagnosis flags and dates, I am stuck with correctly ordering them because some women have diagnoses captured in one data source but not in another.&amp;nbsp;&lt;/P&gt;&lt;P&gt;The patient level id is Row_id2. However, if I remove duplicates, I am losing a lot of women with their diagnosis.&lt;/P&gt;&lt;P&gt;So, I want to correctly categorize them without losing any participants.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I initially wanted to categorize women by whichever diagnosis was earliest. And I want to keep that date for the first diagnosis as I will use Cox proportional hazards regression.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To simplify further, I have 4 different flags with 4 different dates for ischemic heart disease. 
Example:&lt;/P&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;TABLE border="1"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;Date of Diagnosis&lt;/TD&gt;&lt;TD&gt;Ischemic&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;Cerebrovascular&lt;/TD&gt;&lt;TD&gt;Hypertension&lt;/TD&gt;&lt;TD&gt;Other heart&lt;/TD&gt;&lt;TD&gt;Other CVD&lt;/TD&gt;&lt;TD&gt;Final CVD grouping&lt;/TD&gt;&lt;TD&gt;Date of diagnosis from final CVD grouping&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;Medical Claims&lt;/TD&gt;&lt;TD&gt;MC059_MC&lt;/TD&gt;&lt;TD&gt;flag_isch_mc&lt;/TD&gt;&lt;TD&gt;flag_cero_mc&lt;/TD&gt;&lt;TD&gt;flag_hypt_mc&lt;/TD&gt;&lt;TD&gt;flag_ohrt_mc&lt;/TD&gt;&lt;TD&gt;flag_ocvd_mc&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;Emergency Department&lt;/TD&gt;&lt;TD&gt;Service_from_Dt_ER&lt;/TD&gt;&lt;TD&gt;flag_isch_er&lt;/TD&gt;&lt;TD&gt;flag_cero_er&lt;/TD&gt;&lt;TD&gt;flag_hypt_er&lt;/TD&gt;&lt;TD&gt;flag_ohrt_er&lt;/TD&gt;&lt;TD&gt;flag_ocvd_er&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;HDS&lt;/TD&gt;&lt;TD&gt;Service_from_Dt_HDS&lt;/TD&gt;&lt;TD&gt;flag_isch_hds&lt;/TD&gt;&lt;TD&gt;flag_cero_hds&lt;/TD&gt;&lt;TD&gt;flag_hypt_hds&lt;/TD&gt;&lt;TD&gt;flag_ohrt_hds&lt;/TD&gt;&lt;TD&gt;flag_ocvd_hds&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;Death Certificate&lt;/TD&gt;&lt;TD&gt;death_date_DC&lt;/TD&gt;&lt;TD&gt;flag_isch_dc&lt;/TD&gt;&lt;TD&gt;flag_cero_dc&lt;/TD&gt;&lt;TD&gt;flag_hypt_dc&lt;/TD&gt;&lt;TD&gt;flag_ohrt_dc&lt;/TD&gt;&lt;TD&gt;flag_ocvd_dc&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to group them (Final CVD grouping) in such a way that each woman is classified based on her earliest diagnosis in these 4 files. 
Then, I should be able to bring it to patient-level data.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I attached a SAS file with 60 observations with duplicates.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I greatly appreciate it if you have any suggestions on how to achieve the correct classification and then bring it back to patient-level data.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your time and help.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 01 Mar 2024 17:29:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Claims-level-data-working-with-duplicates-to-classify-disease/m-p/918629#M41121</guid>
      <dc:creator>Dissertator</dc:creator>
      <dc:date>2024-03-01T17:29:43Z</dc:date>
    </item>
    <item>
      <title>Re: Claims level data-working with duplicates to classify disease</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Claims-level-data-working-with-duplicates-to-classify-disease/m-p/918635#M41122</link>
      <description>&lt;P&gt;First thing I would do is make sure the DATE variables are actual SAS dates. While character values in the form yyyy-mm-dd will sort in order, it is very hard to actually compare such values.&lt;/P&gt;
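&lt;P&gt;A minimal sketch of that check and conversion, assuming the dates are stored as text such as 2019-04-17. The names work.have, work.want, dx_date_char and dx_date are hypothetical stand-ins for your own data set and variables:&lt;/P&gt;
&lt;PRE&gt;/* Check whether the date variables are character or numeric */
proc contents data=work.have;
run;

/* Convert a character yyyy-mm-dd value to a numeric SAS date */
data work.want;
   set work.have;
   dx_date = input(dx_date_char, yymmdd10.);
   format dx_date yymmdd10.;
run;
&lt;/PRE&gt;
&lt;P&gt;Once the values are numeric SAS dates, comparisons such as finding the earliest date behave as expected.&lt;/P&gt;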
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In your attached example data, each flag variable holds a single value throughout, either 0 or missing. So it is going to be very hard to do anything with that example.&lt;/P&gt;
&lt;P&gt;You also do not describe in any way how you expect to assign a single CVD category.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are you wanting the first date of diagnosis by data source or by diagnosis?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is how I would reshape the data to get all the flags by data source:&lt;/P&gt;
&lt;PRE&gt;data reshape;
   set dl.sas_try;
   array d (*)mc059_mc death_date_dc service_from_date_er service_from_date_hds;
   array fi (*) flag_isch_mc flag_isch_dc flag_isch_er flag_isch_hds; 
   array fc (*) flag_cero_mc flag_cero_dc flag_cero_er flag_cero_hds; 
   array fh (*) flag_hypt_mc flag_hypt_dc flag_hypt_er flag_hypt_hds; 
   array fo (*) flag_ohrt_mc flag_ohrt_dc flag_ohrt_er flag_ohrt_hds; 
   array fv (*) flag_ocvd_mc flag_ocvd_dc flag_ocvd_er flag_ocvd_hds; 
   array s  (4) $3 _temporary_ ('MC','DC','ER','HDS');
   do i=1 to dim(d);
     source = s[i];
     date = input(d[i],yymmdd10.);
     flag_isch =fi[i];
     flag_cero =fc[i];
     flag_hypt =fh[i];
     flag_ohrt =fo[i];
     flag_ocvd =fv[i];
     if not missing (date) then output;
   end;
   format date yymmdd10.;
   keep Row_id2 source date flag_isch flag_cero flag_hypt
        flag_ohrt flag_ocvd;
run;
&lt;/PRE&gt;
&lt;P&gt;and to get the first by the source:&lt;/P&gt;
&lt;PRE&gt;Proc sort data=reshape;
   by Row_id2 source date;
run;

data maybe;
   set reshape;
   by Row_id2 source date;
   if first.source;
run;


&lt;/PRE&gt;
&lt;P&gt;I suspect that you may actually need a different reshaping, one that only outputs records where one or more of those flags indicate a diagnosis. But since none of the flag values vary in your example data, it doesn't have anything that seems likely to qualify.&lt;/P&gt;
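&lt;P&gt;One quick way to confirm what values the flags actually take in the full data, sketched on the assumption that every flag variable name starts with flag_:&lt;/P&gt;
&lt;PRE&gt;/* One-way frequency table for every variable whose name begins with flag_ */
/* The MISSING option counts missing values as their own level */
proc freq data=dl.sas_try;
   tables flag_: / missing;
run;
&lt;/PRE&gt;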
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 01 Mar 2024 18:27:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Claims-level-data-working-with-duplicates-to-classify-disease/m-p/918635#M41122</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-03-01T18:27:55Z</dc:date>
    </item>
    <item>
      <title>Re: Claims level data-working with duplicates to classify disease</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Claims-level-data-working-with-duplicates-to-classify-disease/m-p/918640#M41123</link>
      <description>&lt;P&gt;Thank you for your prompt response&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13884"&gt;@ballardw&lt;/a&gt;&amp;nbsp;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Unfortunately, CVD (both overall and subgroups) is rare in my study population, which is why I have so many missing and 0's.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to categorize them by their earliest diagnosis. Let's say a woman has a flag for ischemic heart disease (flag_isch_ER) in emergency department data (ER) and also has a flag for cerebrovascular disease (flag_cero_hds) in hospital discharge data (HDS). I want this woman to be categorized based on "the first date of diagnosis by diagnosis". So, if the first date of diagnosis for "flag_isch_ER" is earlier than that for "flag_cero_hds", then I want her to be categorized under "ischemic heart disease group."&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am trying the code right now and will update you shortly.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 01 Mar 2024 19:17:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Claims-level-data-working-with-duplicates-to-classify-disease/m-p/918640#M41123</guid>
      <dc:creator>Dissertator</dc:creator>
      <dc:date>2024-03-01T19:17:52Z</dc:date>
    </item>
    <item>
      <title>Re: Claims level data-working with duplicates to classify disease</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Claims-level-data-working-with-duplicates-to-classify-disease/m-p/918649#M41124</link>
      <description>&lt;P&gt;That means the reshaping should only include output when your diagnosis occurs. Again, as I said, your example data doesn't show any, so it is hard to deal with.&lt;/P&gt;
&lt;P&gt;That being the case I would probably reshape to include a separate field for diagnosis and only keep those. The dates and type of diagnosis would be relatively easy to deal with. But the rules would need to be expanded upon.&lt;/P&gt;
&lt;P&gt;Something like this would keep ONLY the diagnoses, with the TYPE variable holding the information as to which one.&lt;/P&gt;
&lt;P&gt;Then a sort by Id, date, and type would put the positive diagnoses in order.&lt;/P&gt;
&lt;PRE&gt;data reshape;
   set dl.sas_try;
   array d (*)mc059_mc death_date_dc service_from_date_er service_from_date_hds;
   array fi (*) flag_isch_mc flag_isch_dc flag_isch_er flag_isch_hds; 
   array fc (*) flag_cero_mc flag_cero_dc flag_cero_er flag_cero_hds; 
   array fh (*) flag_hypt_mc flag_hypt_dc flag_hypt_er flag_hypt_hds; 
   array fo (*) flag_ohrt_mc flag_ohrt_dc flag_ohrt_er flag_ohrt_hds; 
   array fv (*) flag_ocvd_mc flag_ocvd_dc flag_ocvd_er flag_ocvd_hds; 
   array s  (4) $3 _temporary_ ('MC','DC','ER','HDS');
   do i=1 to dim(d);
     source = s[i];
     date = input(d[i],yymmdd10.);
     flag_isch =fi[i];
     flag_cero =fc[i];
     flag_hypt =fh[i];
     flag_ohrt =fo[i];
     flag_ocvd =fv[i];
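     if not missing (date) then do;
        /* Placeholder logic, as it is not clear from your data how a diagnosis is coded. */
        /* ASSUMPTION: a flag value of 1 indicates a diagnosis; adjust to your coding. */
        if fi[i] = 1 then do;
           type = 'ISCH';
           output;
        end;
        /* Repeat for each of the other flag arrays (fc, fh, fo, fv) */
        /* with its own four-character TYPE value.                   */
     end;
   end;
   format date yymmdd10.;
   keep Row_id2 source date type;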
run;&lt;/PRE&gt;
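&lt;P&gt;The sort mentioned above, followed by one possible way to get down to a single record per patient, might look like the sketch below. It assumes RESHAPE contains only records with a positive diagnosis and that DATE is a numeric SAS date. The data set name first_dx is hypothetical.&lt;/P&gt;
&lt;PRE&gt;/* Put each patient's diagnoses in date order */
proc sort data=reshape;
   by Row_id2 date type;
run;

/* Keep the earliest dated diagnosis for each patient */
data first_dx;
   set reshape;
   by Row_id2 date type;
   if first.Row_id2;
run;
&lt;/PRE&gt;
&lt;P&gt;If a patient has two different diagnoses on the same earliest date, this keeps whichever TYPE sorts first alphabetically, so a tie-breaking rule would need to be decided separately.&lt;/P&gt;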
&lt;P&gt;Depending on your complete set of rules the reshaped data may need more manipulation.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Part of the moral of this story is that example data should actually be representative and contain the information needed to apply any algorithm for creating other values.&lt;/P&gt;</description>
      <pubDate>Fri, 01 Mar 2024 20:18:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Claims-level-data-working-with-duplicates-to-classify-disease/m-p/918649#M41124</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-03-01T20:18:56Z</dc:date>
    </item>
    <item>
      <title>Re: Claims level data-working with duplicates to classify disease</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Claims-level-data-working-with-duplicates-to-classify-disease/m-p/918694#M41125</link>
      <description>&lt;P&gt;Thank you. The latter code was helpful.&lt;/P&gt;</description>
      <pubDate>Sat, 02 Mar 2024 19:02:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Claims-level-data-working-with-duplicates-to-classify-disease/m-p/918694#M41125</guid>
      <dc:creator>Dissertator</dc:creator>
      <dc:date>2024-03-02T19:02:24Z</dc:date>
    </item>
  </channel>
</rss>

