Solved: Re: How do I read or extract variables of specified time period to cre...

hkim30 · Posted 08-26-2017 12:19 PM

Hello, I have this question I am struggling with for weeks, and I would appreciate so much if I could get your advice and help on this to move forward. Thanks so much for your time and attention in advance!

A bit of context
I am working on the Medicare Claims data for the calendar year 2012-2014. There is a file called Medicare Beneficiary Summary File every year, which is at an individual level and contains basic information about the individual beneficiary, such as sex, age, date of birth and death, Medicare Part A and B enrollment status, etc. My project sample is the group of Medicare benes who died in the period of 2013 and 2014.

Task
My task is to create a “dual eligibility status” indicator that shows their dual eligibility status for the 12 months prior to their deaths.There are twelve monthly dual status code variables per individual and if the value is a certain value, I consider them dual-eligibles. Now the dataset I created looks like this:

id death_dt ref_yr Dual_01 Dual_02 Dual_03 … Dual_12
ABCDEa 29MAR2014 2012 02 02 02 02
ABCDEa 29MAR2014 2013 02 02 02 02
ABCDEa 29MAR2014 2014 02 00 00 NA
ABCDEb 10SEP2013 2012 03 03 06 06
ABCDEb 10SEP2013 2013 06 06 06 00

Stuggles and question

For now, as shown above, I have several rows per person depending on their date of death. However, I plan to use only 12 months of data prior to the bene's death year and month to create an indicator of dual eligible status. How can I create a dataset that has only one row per person, id, death date, reference year, and 12 months prior to their death year and month? (e.g., if one person died on March 29, 2014, I want to extract the variables for May 2013 to March 2014.) so I can ultimately identify whether the individual beneficiary was dual-eligible for the last year before their death? Or is there a way to read the information for the 12 months of variables based on the death dates and directly generate an indicator without transforming the dataset?

Thank you so much for your help! and I hope my explanation is clear!

TomKari · Posted 08-26-2017 01:30 PM

There are a few pieces of information missing from your post, but I'm going to put up some code that may help to clear up some of your problems.

I think your worst problem is that your dataset format is very challenging to use for analysis purposes. I don't know what your source data looks like, but I'm going to suggest that you rearrange it into two tables.

The first is associated with the date of death, so there's only one record per beneficiary. Using your data, here's a data step that creates it. You would need to figure out how to rearrange your source data to look like this.

data Beneficiaries;
	length ID $6;
	informat Death_dt date9.;
	format Death_dt date9.;
	input ID Death_dt;
	cards;
ABCDEa 29Mar2014
ABCDEb  10SEP2013
run;

And then you have some status information, with one piece of information for each beneficiary for each month. Again, I've set up a data step, but you'd need to figure out how to rearrange your source data to look like this. Note that I've added some stuff for the missing months.

data DualStatuses;
	length ID $6;
	informat StatusDate date9.;
	format StatusDate date9.;
	input ID StatusDate Status;
	cards;
ABCDEa 01Jan2012 2
ABCDEa 01Feb2012 2
ABCDEa 01Mar2012 2
ABCDEa 01Apr2012 2
ABCDEa 01May2012 2
ABCDEa 01Jun2012 2
ABCDEa 01Jul2012 2
ABCDEa 01Aug2012 2
ABCDEa 01Sep2012 2
ABCDEa 01Oct2012 2
ABCDEa 01Nov2012 2
ABCDEa 01Dec2012 2
ABCDEa 01Jan2013 2
ABCDEa 01Feb2013 2
ABCDEa 01Mar2013 2
ABCDEa 01Apr2013 2
ABCDEa 01May2013 2
ABCDEa 01Jun2013 2
ABCDEa 01Jul2013 2
ABCDEa 01Aug2013 .
ABCDEa 01Sep2013 2
ABCDEa 01Oct2013 2
ABCDEa 01Nov2013 2
ABCDEa 01Dec2013 2
ABCDEa 01Jan2014 2
ABCDEa 01Feb2014 0
ABCDEa 01Mar2014 0
ABCDEa 01Apr2014 0
ABCDEa 01May2014 0
ABCDEa 01Jun2014 0
ABCDEa 01Jul2014 0
ABCDEa 01Aug2014 0
ABCDEa 01Sep2014 0
ABCDEa 01Oct2014 0
ABCDEa 01Nov2014 0
ABCDEa 01Dec2014 .
ABCDEb 01Jan2012 3
ABCDEb 01Feb2012 3
ABCDEb 01Mar2012 6
ABCDEb 01Apr2012 6
ABCDEb 01May2012 6
ABCDEb 01Jun2012 6
ABCDEb 01Jul2012 6
ABCDEb 01Aug2012 6
ABCDEb 01Sep2012 6
ABCDEb 01Oct2012 6
ABCDEb 01Nov2012 6
ABCDEb 01Dec2012 6
ABCDEb 01Jan2013 6
ABCDEb 01Feb2013 6
ABCDEb 01Mar2013 6
ABCDEb 01Apr2013 6
ABCDEb 01May2013 6
ABCDEb 01Jun2013 6
ABCDEb 01Jul2013 6
ABCDEb 01Aug2013 6
ABCDEb 01Sep2013 6
ABCDEb 01Oct2013 6
ABCDEb 01Nov2013 6
ABCDEb 01Dec2013 0
run;

So now you've got two tables structured with information associated with beneficiaries, and with statuses. SQL can be used to combine them so that you've got all the relevant information in one row per beneficiary/month:

proc sql noprint;
	create table combined as
		select b.*, s.StatusDate, s.Status
			from Beneficiaries b inner join DualStatuses s
				on(b.ID = s.ID);
quit;

And then you can use a DATA step to figure out your date thresholds. The detailed requirements were missing, but you can just modify the INTNX function and the comparison to death date as required:

data Marked;
	set Combined;
	format CutoffDate date9.;
	CutoffDate = intnx('month', Death_dt, -12, 'beg');

	if StatusDate >= CutoffDate & StatusDate <= Death_dt then
		InWindow = 1;
	else InWindow = 0;
run;

This should result in a variable called "InWindow" that for each record states whether or not the record is in the twelve-month period preceding death.

Hope this helps,

Tom

View solution in original post

TomKari · Posted 08-26-2017 01:30 PM

There are a few pieces of information missing from your post, but I'm going to put up some code that may help to clear up some of your problems.

I think your worst problem is that your dataset format is very challenging to use for analysis purposes. I don't know what your source data looks like, but I'm going to suggest that you rearrange it into two tables.

The first is associated with the date of death, so there's only one record per beneficiary. Using your data, here's a data step that creates it. You would need to figure out how to rearrange your source data to look like this.

data Beneficiaries;
	length ID $6;
	informat Death_dt date9.;
	format Death_dt date9.;
	input ID Death_dt;
	cards;
ABCDEa 29Mar2014
ABCDEb  10SEP2013
run;

And then you have some status information, with one piece of information for each beneficiary for each month. Again, I've set up a data step, but you'd need to figure out how to rearrange your source data to look like this. Note that I've added some stuff for the missing months.

data DualStatuses;
	length ID $6;
	informat StatusDate date9.;
	format StatusDate date9.;
	input ID StatusDate Status;
	cards;
ABCDEa 01Jan2012 2
ABCDEa 01Feb2012 2
ABCDEa 01Mar2012 2
ABCDEa 01Apr2012 2
ABCDEa 01May2012 2
ABCDEa 01Jun2012 2
ABCDEa 01Jul2012 2
ABCDEa 01Aug2012 2
ABCDEa 01Sep2012 2
ABCDEa 01Oct2012 2
ABCDEa 01Nov2012 2
ABCDEa 01Dec2012 2
ABCDEa 01Jan2013 2
ABCDEa 01Feb2013 2
ABCDEa 01Mar2013 2
ABCDEa 01Apr2013 2
ABCDEa 01May2013 2
ABCDEa 01Jun2013 2
ABCDEa 01Jul2013 2
ABCDEa 01Aug2013 .
ABCDEa 01Sep2013 2
ABCDEa 01Oct2013 2
ABCDEa 01Nov2013 2
ABCDEa 01Dec2013 2
ABCDEa 01Jan2014 2
ABCDEa 01Feb2014 0
ABCDEa 01Mar2014 0
ABCDEa 01Apr2014 0
ABCDEa 01May2014 0
ABCDEa 01Jun2014 0
ABCDEa 01Jul2014 0
ABCDEa 01Aug2014 0
ABCDEa 01Sep2014 0
ABCDEa 01Oct2014 0
ABCDEa 01Nov2014 0
ABCDEa 01Dec2014 .
ABCDEb 01Jan2012 3
ABCDEb 01Feb2012 3
ABCDEb 01Mar2012 6
ABCDEb 01Apr2012 6
ABCDEb 01May2012 6
ABCDEb 01Jun2012 6
ABCDEb 01Jul2012 6
ABCDEb 01Aug2012 6
ABCDEb 01Sep2012 6
ABCDEb 01Oct2012 6
ABCDEb 01Nov2012 6
ABCDEb 01Dec2012 6
ABCDEb 01Jan2013 6
ABCDEb 01Feb2013 6
ABCDEb 01Mar2013 6
ABCDEb 01Apr2013 6
ABCDEb 01May2013 6
ABCDEb 01Jun2013 6
ABCDEb 01Jul2013 6
ABCDEb 01Aug2013 6
ABCDEb 01Sep2013 6
ABCDEb 01Oct2013 6
ABCDEb 01Nov2013 6
ABCDEb 01Dec2013 0
run;

So now you've got two tables structured with information associated with beneficiaries, and with statuses. SQL can be used to combine them so that you've got all the relevant information in one row per beneficiary/month:

proc sql noprint;
	create table combined as
		select b.*, s.StatusDate, s.Status
			from Beneficiaries b inner join DualStatuses s
				on(b.ID = s.ID);
quit;

And then you can use a DATA step to figure out your date thresholds. The detailed requirements were missing, but you can just modify the INTNX function and the comparison to death date as required:

data Marked;
	set Combined;
	format CutoffDate date9.;
	CutoffDate = intnx('month', Death_dt, -12, 'beg');

	if StatusDate >= CutoffDate & StatusDate <= Death_dt then
		InWindow = 1;
	else InWindow = 0;
run;

This should result in a variable called "InWindow" that for each record states whether or not the record is in the twelve-month period preceding death.

Hope this helps,

Tom

hkim30 · Posted 08-27-2017 07:12 PM

Thanks TomKari --

Thank you so much for your time and efforts. I really appreciate your help although the information might have been limited to make suggestions. I transposed my dataset from wide to long, created a new dataset that has 12-13 months of data preceding deaths, changed it into wide dataset again, and used array to create a flag variable.

I don't need to add this long codes of my attempt, but I am adding them just in case this might be somehow helpful for others. OR someone might want to suggest more efficient ways to do the same tasks..

Thanks again!

*******************************************************************************************************************************************

* wide to long ;

proc transpose data=work.mbsf_partd1214_dual_v2 out=long_cstshr prefix=cst_shr_grp_cd;
by bene_id BENE_ENROLLMT_REF_YR;
var cst_shr_grp_cd_01-cst_shr_grp_cd_12;
run;

proc transpose data=work.mbsf_partd1214_dual_v2 out=long_dualstus prefix=dual_stus_cd;
by bene_id BENE_ENROLLMT_REF_YR;
var dual_stus_cd_01-dual_stus_cd_12;
run;

data long1;
merge long_cstshr (rename=(cst_shr_grp_cd1=cst_shr_grp_cd) drop=_name_) long_dualstus (rename=(dual_stus_cd1=dual_stus_cd));
by bene_id BENE_ENROLLMT_REF_YR;
month=input(substr(_name_,14), 5.);
drop _name_ _label_;
day=1;
run;

/* Use the MDY function along with variables representing the month, day,
and year values to create a SAS date. A format of MMDDYY10. is applied
to the date. */;

data long2;
set long1;
statusdate=mdy(month, day, bene_enrollmt_ref_yr);
format statusdate date9.;
run;

* merging the dataset (a) cohort_dod and (b) long2;
proc sql;
create table combined as
select a.*, b.statusdate, b.cst_shr_grp_cd, b.dual_stus_cd
from cohort_dod a inner join long2 b
on(a.bene_id =b.bene_id);
quit;

* select 12 months before death dates ;

data combined2;
set combined;
format cutoffdate date9.;
cutoffdate = intnx('month', bene_death_dt, -12, 'beg');
if statusdate >= cutoffdate & statusdate <=bene_death_dt then
inwindow = 1; else inwindow=0;
run;

* only keep where inwindow=1;

data combined3 (drop=inwindow);
set combined2;
if inwindow=1;
run;

* add count variable to transpose ;
data out.dualeligible ; length month 3; set combined3;
by bene_id;
if first.bene_id then month=1;
else month+1; run;

* long to wide;

proc transpose data=out.dualeligible out=wide_cst (drop=_name_) prefix=cost_shr;
by bene_id;
id month;
var cst_shr_grp_cd ;
run;

proc transpose data=out.dualeligible out=wide_dual (drop=_name_) prefix=dual_stus;
by bene_id;
id month;
var dual_stus_cd ;
run;

data in.dualeligible_wide ;
merge wide_cst wide_dual;
by bene_id;
run;

*****************************************************;
* create a "dual eligibles" flag using array (not included)
*****************************************************; ...

PGStats · Posted 08-26-2017 05:58 PM

Start with what you have, but move to a long data file organisation. Here is a simple way to do it:

data have;
input id $ death_dt :date7. ref_yr Dual_01 - Dual_12;
datalines;
ABCDEa 29mar14 2012 2 2 2 2 2 2 2 2 2 2 2 2
ABCDEa 29mar14 2013 2 2 2 2 2 2 2 2 2 2 2 2
ABCDEa 29mar14 2014 2 0 NA NA NA NA NA NA NA NA NA NA
ABCDEb 10sep13 2012 3 3 6 6 6 6 6 6 6 6 6 6
ABCDEb 10sep13 2013 6 6 6 6 6 6 6 6 NA NA NA NA
;

data want;
set have;
array d dual_01-dual_12;
do month = 1 to 12;
    dual = d{month};
    month_dt = mdy(month, 1, ref_yr);
    monthsBeforeDeath = intck("month", month_dt, death_dt);
    if 0 < monthsBeforeDeath <= 12 then output;
    end;
keep id death_dt month_dt monthsBeforeDeath dual;
format death_dt yymmdd10. month_dt yymm7.; 
run; 

proc sort data=want; by id descending monthsBeforeDeath; run;

PG

hkim30 · Posted 08-27-2017 07:20 PM

Thank you so much, PGStats, for your response despite the possibly limited information in my prior inquiry! As suggested, I changed my previous data from wide to long and worked from there. I used a bit different methods but it seems like your method is more efficient and reduces many more steps I had to go through. I am replicating your codes now but I am facing slight difficulties because the real dataset has other variables as well. I will figure it out eventually..!! Thanks a lot again. Anyways just one more clarifying question, so it seems like I can use "array" function to change the dataset from wide to long form?

PGStats · Posted 08-27-2017 10:51 PM

Yes, arrays are often useful tools for transforming long to wide and vice-versa. My code above does involve an array statement (there is no array function in SAS).

PG

How do I read or extract variables of specified time period to create a new indicator?

Re: How do I read or extract variables of specified time period to create a new indicator?

Re: How do I read or extract variables of specified time period to create a new indicator?

Re: How do I read or extract variables of specified time period to create a new indicator?

Re: How do I read or extract variables of specified time period to create a new indicator?

Re: How do I read or extract variables of specified time period to create a new indicator?

Re: How do I read or extract variables of specified time period to create a new indicator?

How do I read or extract variables of specified time period to create a new indicator?

Re: How do I read or extract variables of specified time period to create a new indicator?

Re: How do I read or extract variables of specified time period to create a new indicator?

Re: How do I read or extract variables of specified time period to create a new indicator?

Re: How do I read or extract variables of specified time period to create a new indicator?

Re: How do I read or extract variables of specified time period to create a new indicator?

Re: How do I read or extract variables of specified time period to create a new indicator?

Ready to join fellow brilliant minds for the SAS Hackathon?