Solved: Re: do statement or transpose?

tmcrouse · Posted 08-06-2013 10:07 AM

I am racking my brain trying to figure out how to get this right. I have tried proc transpose to put the various fst srvc dts in one row thinking that would help. But it did not. Then tried do statements but that has not worked. Any suggestions? I need to look at whenever there is a maj restudy, look at the date of the restudy and then look for a proc done after that restudy but it has to be between fst_srvc_dt+1 and fst_srvc_dt+90. Anotherwords, taking the first example, the restudy was done 4/30/2012. So I begin counting on 5/1/2012 and go no more than 90days out. That means the procedure on 6/4/2012 qualifies and it was a perf. I know how to so a statement to say if all these conditions are met, create a column called sub_table and if the condition is the perf I will call it MAJPERF. If the condition is a cath it will be MAJCATH, pet is MAJPET and ct is MAJCT. I hope I am making sense. I put the do code I tried but that did not work.

Data Have:

MBR_SYS_ID	FST_SRVC_DT	Prov_tin_maj_restudy	PROV_TIN	nat_perf	nat_cath	nat_pet	nat_ct
1111	4/22/2012		593415206
1111	4/30/2012	1	593415206		1
1111	6/4/2012		593733939	1
1111	6/25/2012		593733939
2222	6/22/2012		593415206	1
2222	7/26/2012		593415206
2222	8/28/2012		591726273		1
2222	8/28/2012		593415206
2222	9/4/2012	1	593415206
2222	10/8/2012		593415206		1
3333	7/30/2012	1	593415206		1
3333	10/10/2012		593415206	1
4444	2/26/2012		262157500				1
4444	2/27/2012		593415206	1
4444	2/28/2012	1	593415206		1
4444	5/18/2012		590724459		1
4444	5/18/2012		593415206

Data Need:

MBR_SYS_ID	FST_SRVC_DT	Prov_tin_maj_restudy	PROV_TIN	nat_perf	nat_cath	nat_pet	nat_ct	sub_table
1111	6/4/2012		593733939	1				MAJPERF
2222	10/8/2012		593415206		1			MAJCATH
3333	10/10/2012		593415206	1				MAJPERF
4444	5/18/2012		590724459		1			MAJCATH

data maj_restudyA;

set maj_restudy nobs=last_obs;

by mbr_sys_id;

if prov_tin_pci=1 and prov_tin_maj_restudy = 1 and fst_srvc_dt<=&enddt3-90 then do;

wb = fst_srvc_dt + 1;

we = fst_srvc_dt + 90;

do pt = _N_ + 1 to last_obs;

set cardiac.maj_restudy(keep= mbr_sys_id fst_srvc_dt nat_cath nat_ct nat_perf nat_pet

rename=(mbr_sys_id=mem fst_srvc_dt=dt))

point=pt;

if dt >= wb and dt<=we and nat_cath = 1 then 1 end as 'MAJCATH';

else if dt>=wb and dt<=we and nat_ct = 1 then 1 end as 'MAJCT';

else if dt>=wb and dt<=we and nat_perf = 1 then 1 end as 'MAJPERF';

else if dt>=wb and dt<=we and nat_pet = 1 then 1 end as 'MAJPET';

end;

output;

end;

if we =. then we=fst_srvc_dt;

if wb=' ' then wb=fst_srvc_dt;

if we=wb then output;

drop wb we mem dt pt;

run;

Vince28_Statcan · Posted 08-06-2013 10:49 AM

data want;

if _N_=1 then do;

declare hash h(dataset: "maj_restudy(where=(prov_tin_maj_restudy=1) rename=(fst_srvc_dt=major_study_start_date))");

h.defineKey('mbr_sys_id');

h.defineData(major_study_start_date);

h.defineDone();

end;

set maj_restudy;

rc=h.find();

if rc=0 then do;

if (1 LE fst_srvc_dt - major_study_start_date LE 90) and (nat_perf=1 or nat_cath=1 or nat_pet=1 or nat_ct=1) then do

if nat_perf=1 then sub_table="MAJPERF";

else if nat_cath=1 then subtable="MAJCATH";

else if nat_pet=1 then subtable="MAJPET";

else if nat_pet=1 then subtable="MAJCT";

output;

end;

drop rc major_study_start_date;

run;

This is an alternative based on your data example. However, if there are multiple Major Studies in any given MBR_SYS_ID, it won't work as is since the hash table defined this way only supports 1 occurence of the key. You may need to play with formats a little, I assumed your dates were naturally numeric values with a date format and not strings as well as I assumed that your indicator columns were numeric.

Basically, it uses the lookup power of hash tables to obtain the start date of the unique major study for each MBR_SYS_ID (if there is one) and puts it in the data vector to allow computation. This would give you multiple rows for a given MBR_SYS_ID if there had been more than one of the 4 columns with 1 for the same MBR_SYS_ID within the following 90 days.

Vince

P.S. When you say "but it did not work". It is best to either give an example of the bad output or a copy of your error log for us to help. For instance, I can't tell from your post if it was a logic or syntax error and reading through someone elses' thoughts/code looking for both is harder. Especially when I don't know your data structure variable types and formats.

A quick glance tells me that

if we =. then we=fst_srvc_dt;

if wb=' ' then wb=fst_srvc_dt;

if we=wb then output;

is basically outputing every single row that didn't have a major study in addition to what should've achieved your desired result in the big outer do;end; block

View solution in original post

Vince28_Statcan · Posted 08-06-2013 10:49 AM

data want;

if _N_=1 then do;

declare hash h(dataset: "maj_restudy(where=(prov_tin_maj_restudy=1) rename=(fst_srvc_dt=major_study_start_date))");

h.defineKey('mbr_sys_id');

h.defineData(major_study_start_date);

h.defineDone();

end;

set maj_restudy;

rc=h.find();

if rc=0 then do;

if (1 LE fst_srvc_dt - major_study_start_date LE 90) and (nat_perf=1 or nat_cath=1 or nat_pet=1 or nat_ct=1) then do

if nat_perf=1 then sub_table="MAJPERF";

else if nat_cath=1 then subtable="MAJCATH";

else if nat_pet=1 then subtable="MAJPET";

else if nat_pet=1 then subtable="MAJCT";

output;

end;

drop rc major_study_start_date;

run;

This is an alternative based on your data example. However, if there are multiple Major Studies in any given MBR_SYS_ID, it won't work as is since the hash table defined this way only supports 1 occurence of the key. You may need to play with formats a little, I assumed your dates were naturally numeric values with a date format and not strings as well as I assumed that your indicator columns were numeric.

Basically, it uses the lookup power of hash tables to obtain the start date of the unique major study for each MBR_SYS_ID (if there is one) and puts it in the data vector to allow computation. This would give you multiple rows for a given MBR_SYS_ID if there had been more than one of the 4 columns with 1 for the same MBR_SYS_ID within the following 90 days.

Vince

P.S. When you say "but it did not work". It is best to either give an example of the bad output or a copy of your error log for us to help. For instance, I can't tell from your post if it was a logic or syntax error and reading through someone elses' thoughts/code looking for both is harder. Especially when I don't know your data structure variable types and formats.

A quick glance tells me that

if we =. then we=fst_srvc_dt;

if wb=' ' then wb=fst_srvc_dt;

if we=wb then output;

is basically outputing every single row that didn't have a major study in addition to what should've achieved your desired result in the big outer do;end; block

tmcrouse · Posted 08-06-2013 11:32 AM

I don't believe there is more than 1 occurence per member for the restudies. I tried this but it gave me this error:

It looks like it does not understand the h.definedone:

455 data maj_restudy2;

456 if _N_=1 then do;

457 declare hash h(datasets: "maj_restudy(where=(prov_tin_maj_restudy=1)

457! rename=(fst_srvc_dt=major_study_start_date))");

ERROR: Unknown argument tag datasets.

458 h.defineKey('mbr_sys_id');

459 h.defineData(major_study_start_date);

460 h.defineDone;

-------------

559 22

76

ERROR 559-185: Invalid object attribute reference h.defineDone.

ERROR 22-322: Syntax error, expecting one of the following: (, +, =.

ERROR 76-322: Syntax error, statement will be ignored.

461 end;

462 set maj_restudy;

463 rc=h.find();

ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.

NOTE: The SAS System stopped processing this step because of errors.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.01 seconds

464 if rc=0 then do;

465 if (1 LE fst_srvc_dt - major_study_start_date LE 90) and (nat_perf=1 or nat_cath=1 or

465! nat_pet=1 or nat_ct=1) then do

466 if nat_perf=1 then sub_table="MAJPERF";

467 else if nat_cath=1 then sub_table="MAJCATH";

468 else if nat_pet=1 then sub_table="MAJPET";

469 else if nat_ct=1 then sub_table="MAJCT";

470 output;

471 end;

472 end;

473 drop rc major_study_start_date;

474 run;

Vince28_Statcan · Posted 08-06-2013 12:09 PM

my bad, it's <

(dataset:

and not

(datasets:

updating original post.

tmcrouse · Posted 08-06-2013 12:26 PM

It still does not like that h.definedone; statement

Vince28_Statcan · Posted 08-06-2013 12:42 PM

Sigh, sorry, updated my reply again.

defineDone is a method of the hash table and SAS requires () syntax at the end

h.defineDone();

I always omit it by mistake when I don't have data to test my syntax.

Edit:

If this can be of any additionnal diagnosis of your original post code. Since you are not keeping a new variable for your MBR_SYS_ID each time you find a major study case, your do pt=_N_+1 ... end; statement loops on all studies content that is sorted below your current mbr_sys_id so if some studies overlap, each major study will also cause other studies' procedures that occured within the appropriate timeframe to be output including even the record for a new study if it has an associated procedure applied that day.

tmcrouse · Posted 08-06-2013 01:00 PM

Awesome. That worked like a charm

Haikuo · Posted 08-06-2013 02:57 PM

Besides Hash() approach, Proc SQL is also handy for your problem:

proc sql;

select distinct a.* from have a ,

(select MBR_SYS_ID, FST_SRVC_DT from have where Prov_tin_maj_restudy=1) b

where a.MBR_SYS_ID=b.MBR_SYS_ID

and b.FST_SRVC_DT < a.FST_SRVC_DT < b.FST_SRVC_DT+90

and sum( nat_perf, nat_cath, nat_pet, nat_ct) >=1;

quit;

Haikuo

Update: Sub-query is not needed since the condition is straightforward.

proc sql;

select distinct a.* from have a ,

have(where= (Prov_tin_maj_restudy=1))b

where a.MBR_SYS_ID=b.MBR_SYS_ID

and b.FST_SRVC_DT < a.FST_SRVC_DT < b.FST_SRVC_DT+90

and sum( a.nat_perf, a.nat_cath, a.nat_pet, a.nat_ct) >=1;

quit;

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away