I have a dataset similar to the one below. I have shortened the example, but I have a column for every month and year from 2006-2013. Each column represents whether a person has health insurance (1=Yes; 0=No) or not during that specific month and year. I have another date variable indicating when a person was diagnosed with condition X. I need to create a new indicator variable to identify people into 1 of 3 categories: 1) those with health insurance within the 12 months prior to their diagnosis; 2) those who only got health insurance in the month of or 12 months after their diagnosis; 3) those without health insurance at all. I know there is likely a way to more efficiently complete this rather than a ton of IF/THEN statements, but I am unsure of how to best proceed.
Any help would be greatly appreciated
ID | Jan_2006 | Jan_2007 | Jan_2008 | Feb_2006 | Feb_2007 | Feb_2008 | Mar_2006 | Mar_2007 | Mar_2008 | Apr_2006 | Apr_2007 | Apr_2008 | Diagnosed |
589765 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 02/01/2007 |
235698 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 03/01/2006 |
784956 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 01/01/2008 |
Probably a simpler way than this but this can get you started at least.
Your variables are not in order in the data set as shown, so if you want to use this format, that would have to change. Rather than figure that out, a long format is simpler IMO.
data have;
infile cards dlm='09'x truncover;
input ID $ Jan_2006 Jan_2007 Jan_2008 Feb_2006 Feb_2007 Feb_2008 Mar_2006 Mar_2007 Mar_2008 Apr_2006 Apr_2007 Apr_2008 Diagnosed :mmddyy10.;
format diagnosed date9.;
cards;
589765 1 1 1 1 1 0 1 0 0 1 0 0 02/01/2007
235698 0 0 1 0 0 1 0 0 1 0 0 1 03/01/2006
784956 0 0 0 0 0 0 0 0 0 0 0 0 01/01/2008
;;;;
run;
*sort for transpose;
proc sort data=have;
by id diagnosed;
run;
*transpose to long format;
proc transpose data=have out=long;
by id diagnosed;
var jan_: feb_: mar_: apr_:;
run;
*convert variable names to dates;
data long2;
set long;
insured_month = input(compress(_name_, "_"), monyy7.);
format insured_month monyy7.;
rename col1=insured_status;
drop _name_;
run;
proc sort data=long2;
by id insured_month;
run;
data status;
set long2;
by id;
*create temporary array to store last 12 month;
array prev_status(0:11) _temporary_ ;
retain never_insured insured_12m_prior_diag insured_12m_after_diag counter;
*set all values to missing at first of each id;
if first.id then call missing(of prev_status(*), never_insured, insured_12m_prior_diag, insured_12m_after_diag, counter);
*check if month is same as diagnosed month for start of comparing;
if intnx('month', diagnosed, 0, 'b') = insured_month then do;
*check previous 12 month values to set previous 12 month flag;
if sum(of prev_status(*))>0 then insured_12m_prior_diag=1;
*set counter to check next 12 months;
else counter=0;
end;
*increment counter for each month;
counter = counter+1;
*sets flag if insured after 12 month and no insurance ever historically;
if 1<= counter <= 12 and insured_status=1 and never_insured=. then insured_12m_after_diag=1;
*if insured, set never_insured to 0 so that history of insurance can be checked in case of someone having insurance, cancelling
and then restarting insurance before 12 month period;
if insured_status=1 then never_insured=0;
*assigns status to moving array;
prev_status{mod(_n_,12)} = insured_status;
*sets flags correct;
if last.id then do;
never_insured = (never_insured=.);
insured_12m_after_diag = 1-(insured_12m_after_diag=.);
insured_12m_prior_diag = 1- (insured_12m_prior_diag=.);
output;
end;
*remove extra variable;
drop counter;
run;
@gabe434555 wrote:
I have a dataset similar to the one below. I have shortened the example, but I have a column for every month and year from 2006-2013. Each column represents whether a person has health insurance (1=Yes; 0=No) or not during that specific month and year. I have another date variable indicating when a person was diagnosed with condition X. I need to create a new indicator variable to identify people into 1 of 3 categories: 1) those with health insurance within the 12 months prior to their diagnosis; 2) those who only got health insurance in the month of or 12 months after their diagnosis; 3) those without health insurance at all. I know there is likely a way to more efficiently complete this rather than a ton of IF/THEN statements, but I am unsure of how to best proceed.
Any help would be greatly appreciated
ID Jan_2006 Jan_2007 Jan_2008 Feb_2006 Feb_2007 Feb_2008 Mar_2006 Mar_2007 Mar_2008
Apr_2006 Apr_2007
Apr_2008
Diagnosed
589765 1 1 1 1 1 0 1 0 0 1 0 0 02/01/2007
235698 0 0 1 0 0 1 0 0 1 0 0 1 03/01/2006 784956 0 0 0 0 0 0 0 0 0 0 0 0 01/01/2008
Probably a simpler way than this but this can get you started at least.
Your variables are not in order in the data set as shown, so if you want to use this format, that would have to change. Rather than figure that out, a long format is simpler IMO.
data have;
infile cards dlm='09'x truncover;
input ID $ Jan_2006 Jan_2007 Jan_2008 Feb_2006 Feb_2007 Feb_2008 Mar_2006 Mar_2007 Mar_2008 Apr_2006 Apr_2007 Apr_2008 Diagnosed :mmddyy10.;
format diagnosed date9.;
cards;
589765 1 1 1 1 1 0 1 0 0 1 0 0 02/01/2007
235698 0 0 1 0 0 1 0 0 1 0 0 1 03/01/2006
784956 0 0 0 0 0 0 0 0 0 0 0 0 01/01/2008
;;;;
run;
*sort for transpose;
proc sort data=have;
by id diagnosed;
run;
*transpose to long format;
proc transpose data=have out=long;
by id diagnosed;
var jan_: feb_: mar_: apr_:;
run;
*convert variable names to dates;
data long2;
set long;
insured_month = input(compress(_name_, "_"), monyy7.);
format insured_month monyy7.;
rename col1=insured_status;
drop _name_;
run;
proc sort data=long2;
by id insured_month;
run;
data status;
set long2;
by id;
*create temporary array to store last 12 month;
array prev_status(0:11) _temporary_ ;
retain never_insured insured_12m_prior_diag insured_12m_after_diag counter;
*set all values to missing at first of each id;
if first.id then call missing(of prev_status(*), never_insured, insured_12m_prior_diag, insured_12m_after_diag, counter);
*check if month is same as diagnosed month for start of comparing;
if intnx('month', diagnosed, 0, 'b') = insured_month then do;
*check previous 12 month values to set previous 12 month flag;
if sum(of prev_status(*))>0 then insured_12m_prior_diag=1;
*set counter to check next 12 months;
else counter=0;
end;
*increment counter for each month;
counter = counter+1;
*sets flag if insured after 12 month and no insurance ever historically;
if 1<= counter <= 12 and insured_status=1 and never_insured=. then insured_12m_after_diag=1;
*if insured, set never_insured to 0 so that history of insurance can be checked in case of someone having insurance, cancelling
and then restarting insurance before 12 month period;
if insured_status=1 then never_insured=0;
*assigns status to moving array;
prev_status{mod(_n_,12)} = insured_status;
*sets flags correct;
if last.id then do;
never_insured = (never_insured=.);
insured_12m_after_diag = 1-(insured_12m_after_diag=.);
insured_12m_prior_diag = 1- (insured_12m_prior_diag=.);
output;
end;
*remove extra variable;
drop counter;
run;
@gabe434555 wrote:
I have a dataset similar to the one below. I have shortened the example, but I have a column for every month and year from 2006-2013. Each column represents whether a person has health insurance (1=Yes; 0=No) or not during that specific month and year. I have another date variable indicating when a person was diagnosed with condition X. I need to create a new indicator variable to identify people into 1 of 3 categories: 1) those with health insurance within the 12 months prior to their diagnosis; 2) those who only got health insurance in the month of or 12 months after their diagnosis; 3) those without health insurance at all. I know there is likely a way to more efficiently complete this rather than a ton of IF/THEN statements, but I am unsure of how to best proceed.
Any help would be greatly appreciated
ID Jan_2006 Jan_2007 Jan_2008 Feb_2006 Feb_2007 Feb_2008 Mar_2006 Mar_2007 Mar_2008
Apr_2006 Apr_2007
Apr_2008
Diagnosed
589765 1 1 1 1 1 0 1 0 0 1 0 0 02/01/2007
235698 0 0 1 0 0 1 0 0 1 0 0 1 03/01/2006 784956 0 0 0 0 0 0 0 0 0 0 0 0 01/01/2008
Stealing @Reeza 's data creation code, I agree long format makes more sense to me.
Instead of using arrays for detecting insurance status after transposing to long format, I think you could just used some retained boolean flags, e.g.:
data status (keep=ID Diagnosed InsuredPrior InsuredAfter Category);
set long2;
by id;
retain InsuredPrior InsuredAfter ;
if first.id then do ;
call missing (InsuredPrior,InsuredAfter);
end ;
if insured_status=1 and -12 <= intck('month',insured_month,diagnosed) <= -1 then InsuredPrior=1 ;
if insured_status=1 and 0 <= intck('month',insured_month,diagnosed) <= 12 then InsuredAfter=1 ;
if last.id ;
if InsuredPrior then Category=1 ;
else if InsuredAfter then Category=2 ;
else Category=3 ;
run;
Note that category=3 in this case is not 'never insured', it's really 'not insured within 12 months of diagnosis'.
@Quentin Using the dates instead of tracking/positional is definitely a better approach 🙂
If you wanted to keep your data in the format, this is one approach but you need a full data set for testing and you've only provided a partial data set.
Tweaks may be required.
proc contents data=have out=name_list;
run;
data name_list_clean;
set name_list;
where anydigit(name) >0;
month = input(compress(name, "_"), monyy7.);
format month monyy7.;
keep name month;
run;
proc sort data=name_list_clean;
by month;
run;
proc sql noprint;
select catt(put(month, monname3.), "_", year(month)) into :month_list separated by " "
from name_list_clean
order by month;
quit;
%put &month_list;
data want;
set have;
array _insured_status(*) &month_list;
*initialize values to 0;
never_insured = 0; insured_12m_prior_diag=0; insured_12m_after_diag =0; prev_insured=0;
*check if ever insured;
if sum(of _insured_status(*)) =0 then never_insured=0;
*if insured at some point further checks;
else do;
index_month = intck('month', '01Jan2006'd, diagnosed);
start_index = max(0, index_month - 13);
end_index = min(hbound(_insured_status), index_month + 11);
*check 12 months (or available) for prior insurance;
*what do you want to do if not 12 months available;
do i=start_index to index_month - 1;
if _insured_status(i) = 1 then insured_12m_prior_diag =1;
end;
*check no previous insurance ever;
do i=1 to index_month while(prev_insured=0);
if _insured_status(i) = 1 then prev_insured=1;
end;
*if no insurance ever, check 12 months after for first insurance;
if prev_insured = 0 then do i=index_month to end_index;
if _insured_status(i) = 1 then insured_12m_after_diag = 1;
end;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.