BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
gabe434555
Fluorite | Level 6

I have a dataset similar to the one below. I have shortened the example, but I have a column for every month and year from 2006-2013. Each column represents whether a person has health insurance (1=Yes; 0=No) or not during that specific month and year. I have another date variable indicating when a person was diagnosed with condition X. I need to create a new indicator variable to identify people into 1 of 3 categories: 1) those with health insurance within the 12 months prior to their diagnosis; 2) those who only got health insurance in the month of or 12 months after their diagnosis; 3) those without health insurance at all. I know there is likely a way to more efficiently complete this rather than a ton of IF/THEN statements, but I am unsure of how to best proceed.

 

Any help would be greatly appreciated

 

IDJan_2006Jan_2007Jan_2008Feb_2006Feb_2007Feb_2008Mar_2006Mar_2007

Mar_2008

Apr_2006

Apr_2007

Apr_2008

Diagnosed

589765111110100100

02/01/2007

23569800100100100103/01/2006
78495600000000000001/01/2008
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Probably a simpler way than this but this can get you started at least. 

Your variables are not in order in the data set as shown, so if you want to use this format, that would have to change. Rather than figure that out, a long format is simpler IMO. 

 

data have;
infile cards dlm='09'x truncover;

input ID $	Jan_2006	Jan_2007	Jan_2008	Feb_2006	Feb_2007	Feb_2008	Mar_2006	Mar_2007	Mar_2008 Apr_2006 Apr_2007 Apr_2008 Diagnosed :mmddyy10.;
format diagnosed date9.;
cards;
589765	1	1	1	1	1	0	1	0	0	1	0	0	02/01/2007
235698	0	0	1	0	0	1	0	0	1	0	0	1	03/01/2006
784956	0	0	0	0	0	0	0	0	0	0	0	0	01/01/2008
;;;;
run;

*sort for transpose;
proc sort data=have;
by id diagnosed;
run;

*transpose to long format;
proc transpose data=have out=long;
by id diagnosed;
var jan_: feb_: mar_: apr_:;
run;

*convert variable names to dates;
data long2;
set long;
insured_month = input(compress(_name_, "_"), monyy7.);
format insured_month monyy7.;
rename col1=insured_status; 
drop _name_;
run;

proc sort data=long2;
by id insured_month;
run;

data status;

set long2;
by id;
*create temporary array to store last 12 month;
array prev_status(0:11) _temporary_ ;
retain never_insured insured_12m_prior_diag insured_12m_after_diag counter;

*set all values to missing at first of each id;
if first.id then call missing(of prev_status(*), never_insured, insured_12m_prior_diag, insured_12m_after_diag, counter);

*check if month is same as diagnosed month for start of comparing;
if intnx('month', diagnosed, 0, 'b') = insured_month then do;
     *check previous 12 month values to set previous 12 month flag;
	if sum(of prev_status(*))>0 then insured_12m_prior_diag=1;
	*set counter to check next 12 months;
	else counter=0;
end;

*increment counter for each month;
counter = counter+1;

*sets flag if insured after 12 month and no insurance ever historically;
if  1<= counter <= 12 and insured_status=1 and never_insured=. then insured_12m_after_diag=1;

*if insured, set never_insured to 0 so that history of insurance can be checked in case of someone having insurance, cancelling
and then restarting insurance before 12 month period;
if insured_status=1 then never_insured=0;

*assigns status to moving array;
prev_status{mod(_n_,12)} = insured_status;

*sets flags correct;
if last.id then do; 
never_insured = (never_insured=.);
insured_12m_after_diag = 1-(insured_12m_after_diag=.);
insured_12m_prior_diag = 1- (insured_12m_prior_diag=.);
output;
end;
*remove extra variable;
drop counter;
run;
Spoiler

@gabe434555 wrote:

I have a dataset similar to the one below. I have shortened the example, but I have a column for every month and year from 2006-2013. Each column represents whether a person has health insurance (1=Yes; 0=No) or not during that specific month and year. I have another date variable indicating when a person was diagnosed with condition X. I need to create a new indicator variable to identify people into 1 of 3 categories: 1) those with health insurance within the 12 months prior to their diagnosis; 2) those who only got health insurance in the month of or 12 months after their diagnosis; 3) those without health insurance at all. I know there is likely a way to more efficiently complete this rather than a ton of IF/THEN statements, but I am unsure of how to best proceed.

 

Any help would be greatly appreciated

 

ID Jan_2006 Jan_2007 Jan_2008 Feb_2006 Feb_2007 Feb_2008 Mar_2006 Mar_2007

Mar_2008

Apr_2006

Apr_2007

Apr_2008

Diagnosed

589765 1 1 1 1 1 0 1 0 0 1 0 0

02/01/2007

235698 0 0 1 0 0 1 0 0 1 0 0 1 03/01/2006
784956 0 0 0 0 0 0 0 0 0 0 0 0 01/01/2008

View solution in original post

5 REPLIES 5
Reeza
Super User

Probably a simpler way than this but this can get you started at least. 

Your variables are not in order in the data set as shown, so if you want to use this format, that would have to change. Rather than figure that out, a long format is simpler IMO. 

 

data have;
infile cards dlm='09'x truncover;

input ID $	Jan_2006	Jan_2007	Jan_2008	Feb_2006	Feb_2007	Feb_2008	Mar_2006	Mar_2007	Mar_2008 Apr_2006 Apr_2007 Apr_2008 Diagnosed :mmddyy10.;
format diagnosed date9.;
cards;
589765	1	1	1	1	1	0	1	0	0	1	0	0	02/01/2007
235698	0	0	1	0	0	1	0	0	1	0	0	1	03/01/2006
784956	0	0	0	0	0	0	0	0	0	0	0	0	01/01/2008
;;;;
run;

*sort for transpose;
proc sort data=have;
by id diagnosed;
run;

*transpose to long format;
proc transpose data=have out=long;
by id diagnosed;
var jan_: feb_: mar_: apr_:;
run;

*convert variable names to dates;
data long2;
set long;
insured_month = input(compress(_name_, "_"), monyy7.);
format insured_month monyy7.;
rename col1=insured_status; 
drop _name_;
run;

proc sort data=long2;
by id insured_month;
run;

data status;

set long2;
by id;
*create temporary array to store last 12 month;
array prev_status(0:11) _temporary_ ;
retain never_insured insured_12m_prior_diag insured_12m_after_diag counter;

*set all values to missing at first of each id;
if first.id then call missing(of prev_status(*), never_insured, insured_12m_prior_diag, insured_12m_after_diag, counter);

*check if month is same as diagnosed month for start of comparing;
if intnx('month', diagnosed, 0, 'b') = insured_month then do;
     *check previous 12 month values to set previous 12 month flag;
	if sum(of prev_status(*))>0 then insured_12m_prior_diag=1;
	*set counter to check next 12 months;
	else counter=0;
end;

*increment counter for each month;
counter = counter+1;

*sets flag if insured after 12 month and no insurance ever historically;
if  1<= counter <= 12 and insured_status=1 and never_insured=. then insured_12m_after_diag=1;

*if insured, set never_insured to 0 so that history of insurance can be checked in case of someone having insurance, cancelling
and then restarting insurance before 12 month period;
if insured_status=1 then never_insured=0;

*assigns status to moving array;
prev_status{mod(_n_,12)} = insured_status;

*sets flags correct;
if last.id then do; 
never_insured = (never_insured=.);
insured_12m_after_diag = 1-(insured_12m_after_diag=.);
insured_12m_prior_diag = 1- (insured_12m_prior_diag=.);
output;
end;
*remove extra variable;
drop counter;
run;
Spoiler

@gabe434555 wrote:

I have a dataset similar to the one below. I have shortened the example, but I have a column for every month and year from 2006-2013. Each column represents whether a person has health insurance (1=Yes; 0=No) or not during that specific month and year. I have another date variable indicating when a person was diagnosed with condition X. I need to create a new indicator variable to identify people into 1 of 3 categories: 1) those with health insurance within the 12 months prior to their diagnosis; 2) those who only got health insurance in the month of or 12 months after their diagnosis; 3) those without health insurance at all. I know there is likely a way to more efficiently complete this rather than a ton of IF/THEN statements, but I am unsure of how to best proceed.

 

Any help would be greatly appreciated

 

ID Jan_2006 Jan_2007 Jan_2008 Feb_2006 Feb_2007 Feb_2008 Mar_2006 Mar_2007

Mar_2008

Apr_2006

Apr_2007

Apr_2008

Diagnosed

589765 1 1 1 1 1 0 1 0 0 1 0 0

02/01/2007

235698 0 0 1 0 0 1 0 0 1 0 0 1 03/01/2006
784956 0 0 0 0 0 0 0 0 0 0 0 0 01/01/2008

Quentin
Super User

Stealing @Reeza 's data creation code, I agree long format makes more sense to me.

 

Instead of using arrays for detecting insurance status after transposing to long format, I think you could just used some retained boolean flags, e.g.:

 

data status (keep=ID Diagnosed InsuredPrior InsuredAfter Category);

  set long2;
  by id;

  retain InsuredPrior InsuredAfter ;
  if first.id then do ;
    call missing (InsuredPrior,InsuredAfter);
  end ;

  if insured_status=1 and -12 <= intck('month',insured_month,diagnosed) <= -1 then InsuredPrior=1 ;
  if insured_status=1 and  0  <= intck('month',insured_month,diagnosed) <= 12 then InsuredAfter=1 ;

  if last.id ;

  if      InsuredPrior then Category=1 ;
  else if InsuredAfter then Category=2 ; 
  else                      Category=3 ; 

run;

Note that category=3 in this case is not 'never insured', it's really 'not insured within 12 months of diagnosis'.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Reeza
Super User

@Quentin Using the dates instead of tracking/positional is definitely a better approach 🙂

gabe434555
Fluorite | Level 6
Thank you so much! This worked great! I double-checked a sample of them manually and the indicators are correct!
Reeza
Super User

If you wanted to keep your data in the format, this is one approach but you need a full data set for testing and you've only provided a partial data set.

Tweaks may be required.

 

proc contents data=have out=name_list;
run;

data name_list_clean;
set name_list;
where anydigit(name) >0;
month = input(compress(name, "_"), monyy7.);
format month monyy7.;
keep name month;
run;

proc sort data=name_list_clean;
by month;
run;

proc sql noprint;
select catt(put(month, monname3.), "_", year(month)) into :month_list separated by " "
from name_list_clean
order by month;
quit;

%put &month_list;

data want;
set have;

array _insured_status(*) &month_list;

*initialize values to 0;
never_insured = 0; insured_12m_prior_diag=0; insured_12m_after_diag =0; prev_insured=0;

*check if ever insured;
if sum(of _insured_status(*)) =0 then never_insured=0;
*if insured at some point further checks;
else do;
    index_month = intck('month', '01Jan2006'd, diagnosed);
    start_index = max(0, index_month - 13);
    end_index = min(hbound(_insured_status), index_month + 11);
    
    *check 12 months (or available) for prior insurance;
    *what do you want to do if not 12 months available;
    do i=start_index to index_month - 1;
      if _insured_status(i) = 1 then insured_12m_prior_diag =1;
    end;
    
    *check no previous insurance ever;
    do i=1 to index_month while(prev_insured=0);
        if _insured_status(i) = 1 then prev_insured=1;
    end;
    
    *if no insurance ever, check 12 months after for first insurance;
    if prev_insured = 0 then do i=index_month to end_index;
        if _insured_status(i) = 1 then insured_12m_after_diag = 1;
    end;
    
end;

run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 545 views
  • 3 likes
  • 3 in conversation