Solved: Need to create a variable based on meeting conditions of several varia...

gabe434555 · Posted 10-06-2022 12:46 PM

I have a dataset similar to the one below. I have shortened the example, but I have a column for every month and year from 2006-2013. Each column represents whether a person has health insurance (1=Yes; 0=No) or not during that specific month and year. I have another date variable indicating when a person was diagnosed with condition X. I need to create a new indicator variable to identify people into 1 of 3 categories: 1) those with health insurance within the 12 months prior to their diagnosis; 2) those who only got health insurance in the month of or 12 months after their diagnosis; 3) those without health insurance at all. I know there is likely a way to more efficiently complete this rather than a ton of IF/THEN statements, but I am unsure of how to best proceed.

Any help would be greatly appreciated

ID	Jan_2006	Jan_2007	Jan_2008	Feb_2006	Feb_2007	Feb_2008	Mar_2006	Mar_2007	Mar_2008	Apr_2006	Apr_2007	Apr_2008	Diagnosed
589765	1	1	1	1	1	0	1	0	0	1	0	0	02/01/2007
235698	0	0	1	0	0	1	0	0	1	0	0	1	03/01/2006
784956	0	0	0	0	0	0	0	0	0	0	0	0	01/01/2008

Reeza · Posted 10-06-2022 03:15 PM

Probably a simpler way than this but this can get you started at least.

Your variables are not in order in the data set as shown, so if you want to use this format, that would have to change. Rather than figure that out, a long format is simpler IMO.

data have;
infile cards dlm='09'x truncover;

input ID $	Jan_2006	Jan_2007	Jan_2008	Feb_2006	Feb_2007	Feb_2008	Mar_2006	Mar_2007	Mar_2008 Apr_2006 Apr_2007 Apr_2008 Diagnosed :mmddyy10.;
format diagnosed date9.;
cards;
589765	1	1	1	1	1	0	1	0	0	1	0	0	02/01/2007
235698	0	0	1	0	0	1	0	0	1	0	0	1	03/01/2006
784956	0	0	0	0	0	0	0	0	0	0	0	0	01/01/2008
;;;;
run;

*sort for transpose;
proc sort data=have;
by id diagnosed;
run;

*transpose to long format;
proc transpose data=have out=long;
by id diagnosed;
var jan_: feb_: mar_: apr_:;
run;

*convert variable names to dates;
data long2;
set long;
insured_month = input(compress(_name_, "_"), monyy7.);
format insured_month monyy7.;
rename col1=insured_status; 
drop _name_;
run;

proc sort data=long2;
by id insured_month;
run;

data status;

set long2;
by id;
*create temporary array to store last 12 month;
array prev_status(0:11) _temporary_ ;
retain never_insured insured_12m_prior_diag insured_12m_after_diag counter;

*set all values to missing at first of each id;
if first.id then call missing(of prev_status(*), never_insured, insured_12m_prior_diag, insured_12m_after_diag, counter);

*check if month is same as diagnosed month for start of comparing;
if intnx('month', diagnosed, 0, 'b') = insured_month then do;
     *check previous 12 month values to set previous 12 month flag;
	if sum(of prev_status(*))>0 then insured_12m_prior_diag=1;
	*set counter to check next 12 months;
	else counter=0;
end;

*increment counter for each month;
counter = counter+1;

*sets flag if insured after 12 month and no insurance ever historically;
if  1<= counter <= 12 and insured_status=1 and never_insured=. then insured_12m_after_diag=1;

*if insured, set never_insured to 0 so that history of insurance can be checked in case of someone having insurance, cancelling
and then restarting insurance before 12 month period;
if insured_status=1 then never_insured=0;

*assigns status to moving array;
prev_status{mod(_n_,12)} = insured_status;

*sets flags correct;
if last.id then do; 
never_insured = (never_insured=.);
insured_12m_after_diag = 1-(insured_12m_after_diag=.);
insured_12m_prior_diag = 1- (insured_12m_prior_diag=.);
output;
end;
*remove extra variable;
drop counter;
run;

Spoiler

@gabe434555 wrote:

I have a dataset similar to the one below. I have shortened the example, but I have a column for every month and year from 2006-2013. Each column represents whether a person has health insurance (1=Yes; 0=No) or not during that specific month and year. I have another date variable indicating when a person was diagnosed with condition X. I need to create a new indicator variable to identify people into 1 of 3 categories: 1) those with health insurance within the 12 months prior to their diagnosis; 2) those who only got health insurance in the month of or 12 months after their diagnosis; 3) those without health insurance at all. I know there is likely a way to more efficiently complete this rather than a ton of IF/THEN statements, but I am unsure of how to best proceed.

Any help would be greatly appreciated

ID Jan_2006 Jan_2007 Jan_2008 Feb_2006 Feb_2007 Feb_2008 Mar_2006 Mar_2007
Mar_2008
Apr_2006
Apr_2007

Apr_2008

Diagnosed

589765 1 1 1 1 1 0 1 0 0 1 0 0
02/01/2007

235698 0 0 1 0 0 1 0 0 1 0 0 1 03/01/2006

784956 0 0 0 0 0 0 0 0 0 0 0 0 01/01/2008

wrote: I have a dataset similar to the one below. I have shortened the example, but I have a column for every month and year from 2006-2013. Each column represents whether a person has health insurance (1=Yes; 0=No) or not during that specific month and year. I have another date variable indicating when a person was diagnosed with condition X. I need to create a new indicator variable to identify people into 1 of 3 categories: 1) those with health insurance within the 12 months prior to their diagnosis; 2) those who only got health insurance in the month of or 12 months after their diagnosis; 3) those without health insurance at all. I know there is likely a way to more efficiently complete this rather than a ton of IF/THEN statements, but I am unsure of how to best proceed. Any help would be greatly appreciated ID Jan_2006 Jan_2007 Jan_2008 Feb_2006 Feb_2007 Feb_2008 Mar_2006 Mar_2007 Mar_2008 Apr_2006 Apr_2007 Apr_2008 Diagnosed 589765 1 1 1 1 1 0 1 0 0 1 0 0 02/01/2007 235698 0 0 1 0 0 1 0 0 1 0 0 1 03/01/2006 784956 0 0 0 0 0 0 0 0 0 0 0 0 01/01/2008

View solution in original post

Reeza · Posted 10-06-2022 03:15 PM

Probably a simpler way than this but this can get you started at least.

Your variables are not in order in the data set as shown, so if you want to use this format, that would have to change. Rather than figure that out, a long format is simpler IMO.

data have;
infile cards dlm='09'x truncover;

input ID $	Jan_2006	Jan_2007	Jan_2008	Feb_2006	Feb_2007	Feb_2008	Mar_2006	Mar_2007	Mar_2008 Apr_2006 Apr_2007 Apr_2008 Diagnosed :mmddyy10.;
format diagnosed date9.;
cards;
589765	1	1	1	1	1	0	1	0	0	1	0	0	02/01/2007
235698	0	0	1	0	0	1	0	0	1	0	0	1	03/01/2006
784956	0	0	0	0	0	0	0	0	0	0	0	0	01/01/2008
;;;;
run;

*sort for transpose;
proc sort data=have;
by id diagnosed;
run;

*transpose to long format;
proc transpose data=have out=long;
by id diagnosed;
var jan_: feb_: mar_: apr_:;
run;

*convert variable names to dates;
data long2;
set long;
insured_month = input(compress(_name_, "_"), monyy7.);
format insured_month monyy7.;
rename col1=insured_status; 
drop _name_;
run;

proc sort data=long2;
by id insured_month;
run;

data status;

set long2;
by id;
*create temporary array to store last 12 month;
array prev_status(0:11) _temporary_ ;
retain never_insured insured_12m_prior_diag insured_12m_after_diag counter;

*set all values to missing at first of each id;
if first.id then call missing(of prev_status(*), never_insured, insured_12m_prior_diag, insured_12m_after_diag, counter);

*check if month is same as diagnosed month for start of comparing;
if intnx('month', diagnosed, 0, 'b') = insured_month then do;
     *check previous 12 month values to set previous 12 month flag;
	if sum(of prev_status(*))>0 then insured_12m_prior_diag=1;
	*set counter to check next 12 months;
	else counter=0;
end;

*increment counter for each month;
counter = counter+1;

*sets flag if insured after 12 month and no insurance ever historically;
if  1<= counter <= 12 and insured_status=1 and never_insured=. then insured_12m_after_diag=1;

*if insured, set never_insured to 0 so that history of insurance can be checked in case of someone having insurance, cancelling
and then restarting insurance before 12 month period;
if insured_status=1 then never_insured=0;

*assigns status to moving array;
prev_status{mod(_n_,12)} = insured_status;

*sets flags correct;
if last.id then do; 
never_insured = (never_insured=.);
insured_12m_after_diag = 1-(insured_12m_after_diag=.);
insured_12m_prior_diag = 1- (insured_12m_prior_diag=.);
output;
end;
*remove extra variable;
drop counter;
run;

Spoiler

@gabe434555 wrote:

I have a dataset similar to the one below. I have shortened the example, but I have a column for every month and year from 2006-2013. Each column represents whether a person has health insurance (1=Yes; 0=No) or not during that specific month and year. I have another date variable indicating when a person was diagnosed with condition X. I need to create a new indicator variable to identify people into 1 of 3 categories: 1) those with health insurance within the 12 months prior to their diagnosis; 2) those who only got health insurance in the month of or 12 months after their diagnosis; 3) those without health insurance at all. I know there is likely a way to more efficiently complete this rather than a ton of IF/THEN statements, but I am unsure of how to best proceed.

Any help would be greatly appreciated

ID Jan_2006 Jan_2007 Jan_2008 Feb_2006 Feb_2007 Feb_2008 Mar_2006 Mar_2007
Mar_2008
Apr_2006
Apr_2007

Apr_2008

Diagnosed

589765 1 1 1 1 1 0 1 0 0 1 0 0
02/01/2007

235698 0 0 1 0 0 1 0 0 1 0 0 1 03/01/2006

784956 0 0 0 0 0 0 0 0 0 0 0 0 01/01/2008

wrote: I have a dataset similar to the one below. I have shortened the example, but I have a column for every month and year from 2006-2013. Each column represents whether a person has health insurance (1=Yes; 0=No) or not during that specific month and year. I have another date variable indicating when a person was diagnosed with condition X. I need to create a new indicator variable to identify people into 1 of 3 categories: 1) those with health insurance within the 12 months prior to their diagnosis; 2) those who only got health insurance in the month of or 12 months after their diagnosis; 3) those without health insurance at all. I know there is likely a way to more efficiently complete this rather than a ton of IF/THEN statements, but I am unsure of how to best proceed. Any help would be greatly appreciated ID Jan_2006 Jan_2007 Jan_2008 Feb_2006 Feb_2007 Feb_2008 Mar_2006 Mar_2007 Mar_2008 Apr_2006 Apr_2007 Apr_2008 Diagnosed 589765 1 1 1 1 1 0 1 0 0 1 0 0 02/01/2007 235698 0 0 1 0 0 1 0 0 1 0 0 1 03/01/2006 784956 0 0 0 0 0 0 0 0 0 0 0 0 01/01/2008

Quentin · Posted 10-06-2022 04:14 PM

Stealing @Reeza 's data creation code, I agree long format makes more sense to me.

Instead of using arrays for detecting insurance status after transposing to long format, I think you could just used some retained boolean flags, e.g.:

data status (keep=ID Diagnosed InsuredPrior InsuredAfter Category);

  set long2;
  by id;

  retain InsuredPrior InsuredAfter ;
  if first.id then do ;
    call missing (InsuredPrior,InsuredAfter);
  end ;

  if insured_status=1 and -12 <= intck('month',insured_month,diagnosed) <= -1 then InsuredPrior=1 ;
  if insured_status=1 and  0  <= intck('month',insured_month,diagnosed) <= 12 then InsuredAfter=1 ;

  if last.id ;

  if      InsuredPrior then Category=1 ;
  else if InsuredAfter then Category=2 ; 
  else                      Category=3 ; 

run;

Note that category=3 in this case is not 'never insured', it's really 'not insured within 12 months of diagnosis'.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

Reeza · Posted 10-06-2022 04:34 PM

@Quentin Using the dates instead of tracking/positional is definitely a better approach 🙂

gabe434555 · Posted 10-06-2022 09:22 PM

Thank you so much! This worked great! I double-checked a sample of them manually and the indicators are correct!

Reeza · Posted 10-06-2022 03:39 PM

If you wanted to keep your data in the format, this is one approach but you need a full data set for testing and you've only provided a partial data set.

Tweaks may be required.

proc contents data=have out=name_list;
run;

data name_list_clean;
set name_list;
where anydigit(name) >0;
month = input(compress(name, "_"), monyy7.);
format month monyy7.;
keep name month;
run;

proc sort data=name_list_clean;
by month;
run;

proc sql noprint;
select catt(put(month, monname3.), "_", year(month)) into :month_list separated by " "
from name_list_clean
order by month;
quit;

%put &month_list;

data want;
set have;

array _insured_status(*) &month_list;

*initialize values to 0;
never_insured = 0; insured_12m_prior_diag=0; insured_12m_after_diag =0; prev_insured=0;

*check if ever insured;
if sum(of _insured_status(*)) =0 then never_insured=0;
*if insured at some point further checks;
else do;
    index_month = intck('month', '01Jan2006'd, diagnosed);
    start_index = max(0, index_month - 13);
    end_index = min(hbound(_insured_status), index_month + 11);
    
    *check 12 months (or available) for prior insurance;
    *what do you want to do if not 12 months available;
    do i=start_index to index_month - 1;
      if _insured_status(i) = 1 then insured_12m_prior_diag =1;
    end;
    
    *check no previous insurance ever;
    do i=1 to index_month while(prev_insured=0);
        if _insured_status(i) = 1 then prev_insured=1;
    end;
    
    *if no insurance ever, check 12 months after for first insurance;
    if prev_insured = 0 then do i=index_month to end_index;
        if _insured_status(i) = 1 then insured_12m_after_diag = 1;
    end;
    
end;

run;

Need to create a variable based on meeting conditions of several variables

Re: Need to create a variable based on meeting conditions of several variables

Re: Need to create a variable based on meeting conditions of several variables

Re: Need to create a variable based on meeting conditions of several variables

Re: Need to create a variable based on meeting conditions of several variables

Re: Need to create a variable based on meeting conditions of several variables

Re: Need to create a variable based on meeting conditions of several variables

Classroom Training Available!