SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

How do I read or extract variables of specified time period to create a new indicator?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

How do I read or extract variables of specified time period to create a new indicator?

Hello, I have this question I am struggling with for weeks, and I would appreciate so much if I could get your advice and help on this to move forward. Thanks so much for your time and attention in advance!  


A bit of context
I am working on the Medicare Claims data for the calendar year 2012-2014. There is a file called Medicare Beneficiary Summary File every year, which is at an individual level and contains basic information about the individual beneficiary, such as sex, age, date of birth and death, Medicare Part A and B enrollment status, etc. My project sample is the group of Medicare benes who died in the period of 2013 and 2014.


Task
My task is to create a “dual eligibility status” indicator that shows their dual eligibility status for the 12 months prior to their deaths.There are twelve monthly dual status code variables per individual and if the value is a certain value, I consider them dual-eligibles. Now the dataset I created looks like this: 

      id             death_dt       ref_yr   Dual_01   Dual_02   Dual_03 … Dual_12
ABCDEa     29MAR2014    2012        02            02            02                 02
ABCDEa     29MAR2014    2013        02            02            02                 02
ABCDEa     29MAR2014    2014        02            00            00                 NA
ABCDEb    10SEP2013      2012        03            03            06                 06
ABCDEb    10SEP2013      2013        06            06            06                 00

 

Stuggles and question

For now, as shown above, I have several rows per person depending on their date of death. However, I plan to use only 12 months of data prior to the bene's death year and month to create an indicator of dual eligible status. How can I create a dataset that has only one row per person, id, death date, reference year, and 12 months prior to their death year and month? (e.g., if one person died on March 29, 2014, I want to extract the variables for May 2013 to March 2014.) so I can ultimately identify whether the individual  beneficiary was dual-eligible for the last year before their death?  Or is there a way to read the information for the 12 months of variables based on the death dates and directly generate an indicator without transforming the dataset?

 

 

Thank you so much for your help! and I hope my explanation is clear!  


Accepted Solutions
Solution
‎08-27-2017 07:00 PM
PROC Star
Posts: 1,288

Re: How do I read or extract variables of specified time period to create a new indicator?

There are a few pieces of information missing from your post, but I'm going to put up some code that may help to clear up some of your problems.

 

I think your worst problem is that your dataset format is very challenging to use for analysis purposes. I don't know what your source data looks like, but I'm going to suggest that you rearrange it into two tables.

 

The first is associated with the date of death, so there's only one record per beneficiary. Using your data, here's a data step that creates it. You would need to figure out how to rearrange your source data to look like this.

 

data Beneficiaries;
	length ID $6;
	informat Death_dt date9.;
	format Death_dt date9.;
	input ID Death_dt;
	cards;
ABCDEa 29Mar2014
ABCDEb  10SEP2013
run;

And then you have some status information, with one piece of information for each beneficiary for each month. Again, I've set up a data step, but you'd need to figure out how to rearrange your source data to look like this. Note that I've added some stuff for the missing months.

 

data DualStatuses;
	length ID $6;
	informat StatusDate date9.;
	format StatusDate date9.;
	input ID StatusDate Status;
	cards;
ABCDEa 01Jan2012 2
ABCDEa 01Feb2012 2
ABCDEa 01Mar2012 2
ABCDEa 01Apr2012 2
ABCDEa 01May2012 2
ABCDEa 01Jun2012 2
ABCDEa 01Jul2012 2
ABCDEa 01Aug2012 2
ABCDEa 01Sep2012 2
ABCDEa 01Oct2012 2
ABCDEa 01Nov2012 2
ABCDEa 01Dec2012 2
ABCDEa 01Jan2013 2
ABCDEa 01Feb2013 2
ABCDEa 01Mar2013 2
ABCDEa 01Apr2013 2
ABCDEa 01May2013 2
ABCDEa 01Jun2013 2
ABCDEa 01Jul2013 2
ABCDEa 01Aug2013 .
ABCDEa 01Sep2013 2
ABCDEa 01Oct2013 2
ABCDEa 01Nov2013 2
ABCDEa 01Dec2013 2
ABCDEa 01Jan2014 2
ABCDEa 01Feb2014 0
ABCDEa 01Mar2014 0
ABCDEa 01Apr2014 0
ABCDEa 01May2014 0
ABCDEa 01Jun2014 0
ABCDEa 01Jul2014 0
ABCDEa 01Aug2014 0
ABCDEa 01Sep2014 0
ABCDEa 01Oct2014 0
ABCDEa 01Nov2014 0
ABCDEa 01Dec2014 .
ABCDEb 01Jan2012 3
ABCDEb 01Feb2012 3
ABCDEb 01Mar2012 6
ABCDEb 01Apr2012 6
ABCDEb 01May2012 6
ABCDEb 01Jun2012 6
ABCDEb 01Jul2012 6
ABCDEb 01Aug2012 6
ABCDEb 01Sep2012 6
ABCDEb 01Oct2012 6
ABCDEb 01Nov2012 6
ABCDEb 01Dec2012 6
ABCDEb 01Jan2013 6
ABCDEb 01Feb2013 6
ABCDEb 01Mar2013 6
ABCDEb 01Apr2013 6
ABCDEb 01May2013 6
ABCDEb 01Jun2013 6
ABCDEb 01Jul2013 6
ABCDEb 01Aug2013 6
ABCDEb 01Sep2013 6
ABCDEb 01Oct2013 6
ABCDEb 01Nov2013 6
ABCDEb 01Dec2013 0
run;

So now you've got two tables structured with information associated with beneficiaries, and with statuses. SQL can be used to combine them so that you've got all the relevant information in one row per beneficiary/month:

 

 

proc sql noprint;
	create table combined as
		select b.*, s.StatusDate, s.Status
			from Beneficiaries b inner join DualStatuses s
				on(b.ID = s.ID);
quit;

And then you can use a DATA step to figure out your date thresholds. The detailed requirements were missing, but you can just modify the INTNX function and the comparison to death date as required:

 

 

data Marked;
	set Combined;
	format CutoffDate date9.;
	CutoffDate = intnx('month', Death_dt, -12, 'beg');

	if StatusDate >= CutoffDate & StatusDate <= Death_dt then
		InWindow = 1;
	else InWindow = 0;
run;

This should result in a variable called "InWindow" that for each record states whether or not the record is in the twelve-month period preceding death.

 

 

Hope this helps,

   Tom

View solution in original post


All Replies
Solution
‎08-27-2017 07:00 PM
PROC Star
Posts: 1,288

Re: How do I read or extract variables of specified time period to create a new indicator?

There are a few pieces of information missing from your post, but I'm going to put up some code that may help to clear up some of your problems.

 

I think your worst problem is that your dataset format is very challenging to use for analysis purposes. I don't know what your source data looks like, but I'm going to suggest that you rearrange it into two tables.

 

The first is associated with the date of death, so there's only one record per beneficiary. Using your data, here's a data step that creates it. You would need to figure out how to rearrange your source data to look like this.

 

data Beneficiaries;
	length ID $6;
	informat Death_dt date9.;
	format Death_dt date9.;
	input ID Death_dt;
	cards;
ABCDEa 29Mar2014
ABCDEb  10SEP2013
run;

And then you have some status information, with one piece of information for each beneficiary for each month. Again, I've set up a data step, but you'd need to figure out how to rearrange your source data to look like this. Note that I've added some stuff for the missing months.

 

data DualStatuses;
	length ID $6;
	informat StatusDate date9.;
	format StatusDate date9.;
	input ID StatusDate Status;
	cards;
ABCDEa 01Jan2012 2
ABCDEa 01Feb2012 2
ABCDEa 01Mar2012 2
ABCDEa 01Apr2012 2
ABCDEa 01May2012 2
ABCDEa 01Jun2012 2
ABCDEa 01Jul2012 2
ABCDEa 01Aug2012 2
ABCDEa 01Sep2012 2
ABCDEa 01Oct2012 2
ABCDEa 01Nov2012 2
ABCDEa 01Dec2012 2
ABCDEa 01Jan2013 2
ABCDEa 01Feb2013 2
ABCDEa 01Mar2013 2
ABCDEa 01Apr2013 2
ABCDEa 01May2013 2
ABCDEa 01Jun2013 2
ABCDEa 01Jul2013 2
ABCDEa 01Aug2013 .
ABCDEa 01Sep2013 2
ABCDEa 01Oct2013 2
ABCDEa 01Nov2013 2
ABCDEa 01Dec2013 2
ABCDEa 01Jan2014 2
ABCDEa 01Feb2014 0
ABCDEa 01Mar2014 0
ABCDEa 01Apr2014 0
ABCDEa 01May2014 0
ABCDEa 01Jun2014 0
ABCDEa 01Jul2014 0
ABCDEa 01Aug2014 0
ABCDEa 01Sep2014 0
ABCDEa 01Oct2014 0
ABCDEa 01Nov2014 0
ABCDEa 01Dec2014 .
ABCDEb 01Jan2012 3
ABCDEb 01Feb2012 3
ABCDEb 01Mar2012 6
ABCDEb 01Apr2012 6
ABCDEb 01May2012 6
ABCDEb 01Jun2012 6
ABCDEb 01Jul2012 6
ABCDEb 01Aug2012 6
ABCDEb 01Sep2012 6
ABCDEb 01Oct2012 6
ABCDEb 01Nov2012 6
ABCDEb 01Dec2012 6
ABCDEb 01Jan2013 6
ABCDEb 01Feb2013 6
ABCDEb 01Mar2013 6
ABCDEb 01Apr2013 6
ABCDEb 01May2013 6
ABCDEb 01Jun2013 6
ABCDEb 01Jul2013 6
ABCDEb 01Aug2013 6
ABCDEb 01Sep2013 6
ABCDEb 01Oct2013 6
ABCDEb 01Nov2013 6
ABCDEb 01Dec2013 0
run;

So now you've got two tables structured with information associated with beneficiaries, and with statuses. SQL can be used to combine them so that you've got all the relevant information in one row per beneficiary/month:

 

 

proc sql noprint;
	create table combined as
		select b.*, s.StatusDate, s.Status
			from Beneficiaries b inner join DualStatuses s
				on(b.ID = s.ID);
quit;

And then you can use a DATA step to figure out your date thresholds. The detailed requirements were missing, but you can just modify the INTNX function and the comparison to death date as required:

 

 

data Marked;
	set Combined;
	format CutoffDate date9.;
	CutoffDate = intnx('month', Death_dt, -12, 'beg');

	if StatusDate >= CutoffDate & StatusDate <= Death_dt then
		InWindow = 1;
	else InWindow = 0;
run;

This should result in a variable called "InWindow" that for each record states whether or not the record is in the twelve-month period preceding death.

 

 

Hope this helps,

   Tom

Occasional Contributor
Posts: 5

Re: How do I read or extract variables of specified time period to create a new indicator?

[ Edited ]

Thanks TomKari --

Thank you so much for your time and efforts. I really appreciate your help although the information might have been limited to make suggestions. I transposed my dataset from wide to long, created a new dataset that has 12-13 months of data preceding deaths, changed it into wide dataset again, and used array to create a flag variable.  

 

I don't need to add this long codes of my attempt, but I am adding them just in case this might be somehow helpful for others. OR someone might want to suggest more efficient ways to do the same tasks.. 

 

Thanks again! 

*******************************************************************************************************************************************

* wide to long ;

proc transpose data=work.mbsf_partd1214_dual_v2 out=long_cstshr prefix=cst_shr_grp_cd;
by bene_id BENE_ENROLLMT_REF_YR;
var cst_shr_grp_cd_01-cst_shr_grp_cd_12;
run;

 

proc transpose data=work.mbsf_partd1214_dual_v2 out=long_dualstus prefix=dual_stus_cd;
by bene_id BENE_ENROLLMT_REF_YR;
var dual_stus_cd_01-dual_stus_cd_12;
run;

 

data long1;
merge long_cstshr (rename=(cst_shr_grp_cd1=cst_shr_grp_cd) drop=_name_) long_dualstus (rename=(dual_stus_cd1=dual_stus_cd));
by bene_id BENE_ENROLLMT_REF_YR;
month=input(substr(_name_,14), 5.);
drop _name_ _label_;
day=1;
run;


/* Use the MDY function along with variables representing the month, day, 
 and year values to create a SAS date. A format of MMDDYY10. is applied
to the date. */;

 

data long2;
set long1;
statusdate=mdy(month, day, bene_enrollmt_ref_yr);
format statusdate date9.;
run;

 

* merging the dataset (a) cohort_dod and (b) long2;
proc sql;
create table combined as
select a.*, b.statusdate, b.cst_shr_grp_cd, b.dual_stus_cd
from cohort_dod a inner join long2 b
on(a.bene_id =b.bene_id);
quit;


* select 12 months before death dates ;

data combined2;
set combined;
format cutoffdate date9.;
cutoffdate = intnx('month', bene_death_dt, -12, 'beg');
if statusdate >= cutoffdate & statusdate <=bene_death_dt then
inwindow = 1; else inwindow=0;
run;

* only keep where inwindow=1;

data combined3 (drop=inwindow);
set combined2;
if inwindow=1;
run;

 

* add count variable to transpose ;
data out.dualeligible ; length month 3; set combined3;
by bene_id;
if first.bene_id then month=1;
else month+1; run;

 

* long to wide;

proc transpose data=out.dualeligible out=wide_cst (drop=_name_) prefix=cost_shr;
by bene_id;
id month;
var cst_shr_grp_cd ;
run;

proc transpose data=out.dualeligible out=wide_dual (drop=_name_) prefix=dual_stus;
by bene_id;
id month;
var dual_stus_cd ;
run;

data in.dualeligible_wide ;
merge wide_cst wide_dual;
by bene_id;
run;


*****************************************************;
* create a "dual eligibles" flag using array  (not included) 
*****************************************************; ...

Esteemed Advisor
Posts: 5,487

Re: How do I read or extract variables of specified time period to create a new indicator?

Start with what you have, but move to a long data file organisation. Here is a simple way to do it:

 

data have;
input id $ death_dt :date7. ref_yr Dual_01 - Dual_12;
datalines;
ABCDEa 29mar14 2012 2 2 2 2 2 2 2 2 2 2 2 2
ABCDEa 29mar14 2013 2 2 2 2 2 2 2 2 2 2 2 2
ABCDEa 29mar14 2014 2 0 NA NA NA NA NA NA NA NA NA NA
ABCDEb 10sep13 2012 3 3 6 6 6 6 6 6 6 6 6 6
ABCDEb 10sep13 2013 6 6 6 6 6 6 6 6 NA NA NA NA
;

data want;
set have;
array d dual_01-dual_12;
do month = 1 to 12;
    dual = d{month};
    month_dt = mdy(month, 1, ref_yr);
    monthsBeforeDeath = intck("month", month_dt, death_dt);
    if 0 < monthsBeforeDeath <= 12 then output;
    end;
keep id death_dt month_dt monthsBeforeDeath dual;
format death_dt yymmdd10. month_dt yymm7.; 
run; 

proc sort data=want; by id descending monthsBeforeDeath; run;
PG
Occasional Contributor
Posts: 5

Re: How do I read or extract variables of specified time period to create a new indicator?

[ Edited ]

Thank you so much, PGStats, for your response despite the possibly limited information in my prior inquiry! As suggested, I changed my previous data from wide to long and worked from there. I used a bit different methods but it seems like your method is more efficient and reduces many more steps I had to go through. I am replicating your codes now but I am facing slight difficulties because the real dataset has other variables as well. I will figure it out eventually..!! Thanks a lot again. Anyways just one more clarifying question, so it seems like I can use "array" function to change the dataset from wide to long form?

Esteemed Advisor
Posts: 5,487

Re: How do I read or extract variables of specified time period to create a new indicator?

Yes, arrays are often useful tools for transforming long to wide and vice-versa. My code above does involve an array statement (there is no array function in SAS).

PG
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 442 views
  • 2 likes
  • 3 in conversation