I am racking my brain trying to figure out how to get this right. I have tried proc transpose to put the various fst srvc dts in one row thinking that would help. But it did not. Then tried do statements but that has not worked. Any suggestions? I need to look at whenever there is a maj restudy, look at the date of the restudy and then look for a proc done after that restudy but it has to be between fst_srvc_dt+1 and fst_srvc_dt+90. Anotherwords, taking the first example, the restudy was done 4/30/2012. So I begin counting on 5/1/2012 and go no more than 90days out. That means the procedure on 6/4/2012 qualifies and it was a perf. I know how to so a statement to say if all these conditions are met, create a column called sub_table and if the condition is the perf I will call it MAJPERF. If the condition is a cath it will be MAJCATH, pet is MAJPET and ct is MAJCT. I hope I am making sense. I put the do code I tried but that did not work.
Data Have:
MBR_SYS_ID | FST_SRVC_DT | Prov_tin_maj_restudy | PROV_TIN | nat_perf | nat_cath | nat_pet | nat_ct |
1111 | 4/22/2012 | 593415206 | |||||
1111 | 4/30/2012 | 1 | 593415206 | 1 | |||
1111 | 6/4/2012 | 593733939 | 1 | ||||
1111 | 6/25/2012 | 593733939 | |||||
2222 | 6/22/2012 | 593415206 | 1 | ||||
2222 | 7/26/2012 | 593415206 | |||||
2222 | 8/28/2012 | 591726273 | 1 | ||||
2222 | 8/28/2012 | 593415206 | |||||
2222 | 9/4/2012 | 1 | 593415206 | ||||
2222 | 10/8/2012 | 593415206 | 1 | ||||
3333 | 7/30/2012 | 1 | 593415206 | 1 | |||
3333 | 10/10/2012 | 593415206 | 1 | ||||
4444 | 2/26/2012 | 262157500 | 1 | ||||
4444 | 2/27/2012 | 593415206 | 1 | ||||
4444 | 2/28/2012 | 1 | 593415206 | 1 | |||
4444 | 5/18/2012 | 590724459 | 1 | ||||
4444 | 5/18/2012 | 593415206 |
Data Need:
MBR_SYS_ID | FST_SRVC_DT | Prov_tin_maj_restudy | PROV_TIN | nat_perf | nat_cath | nat_pet | nat_ct | sub_table |
1111 | 6/4/2012 | 593733939 | 1 | MAJPERF | ||||
2222 | 10/8/2012 | 593415206 | 1 | MAJCATH | ||||
3333 | 10/10/2012 | 593415206 | 1 | MAJPERF | ||||
4444 | 5/18/2012 | 590724459 | 1 | MAJCATH |
data maj_restudyA;
set maj_restudy nobs=last_obs;
by mbr_sys_id;
if prov_tin_pci=1 and prov_tin_maj_restudy = 1 and fst_srvc_dt<=&enddt3-90 then do;
wb = fst_srvc_dt + 1;
we = fst_srvc_dt + 90;
do pt = _N_ + 1 to last_obs;
set cardiac.maj_restudy(keep= mbr_sys_id fst_srvc_dt nat_cath nat_ct nat_perf nat_pet
rename=(mbr_sys_id=mem fst_srvc_dt=dt))
point=pt;
if dt >= wb and dt<=we and nat_cath = 1 then 1 end as 'MAJCATH';
else if dt>=wb and dt<=we and nat_ct = 1 then 1 end as 'MAJCT';
else if dt>=wb and dt<=we and nat_perf = 1 then 1 end as 'MAJPERF';
else if dt>=wb and dt<=we and nat_pet = 1 then 1 end as 'MAJPET';
end;
output;
end;
if we =. then we=fst_srvc_dt;
if wb=' ' then wb=fst_srvc_dt;
if we=wb then output;
drop wb we mem dt pt;
run;
data want;
if _N_=1 then do;
declare hash h(dataset: "maj_restudy(where=(prov_tin_maj_restudy=1) rename=(fst_srvc_dt=major_study_start_date))");
h.defineKey('mbr_sys_id');
h.defineData(major_study_start_date);
h.defineDone();
end;
set maj_restudy;
rc=h.find();
if rc=0 then do;
if (1 LE fst_srvc_dt - major_study_start_date LE 90) and (nat_perf=1 or nat_cath=1 or nat_pet=1 or nat_ct=1) then do
if nat_perf=1 then sub_table="MAJPERF";
else if nat_cath=1 then subtable="MAJCATH";
else if nat_pet=1 then subtable="MAJPET";
else if nat_pet=1 then subtable="MAJCT";
output;
end;
end;
drop rc major_study_start_date;
run;
This is an alternative based on your data example. However, if there are multiple Major Studies in any given MBR_SYS_ID, it won't work as is since the hash table defined this way only supports 1 occurence of the key. You may need to play with formats a little, I assumed your dates were naturally numeric values with a date format and not strings as well as I assumed that your indicator columns were numeric.
Basically, it uses the lookup power of hash tables to obtain the start date of the unique major study for each MBR_SYS_ID (if there is one) and puts it in the data vector to allow computation. This would give you multiple rows for a given MBR_SYS_ID if there had been more than one of the 4 columns with 1 for the same MBR_SYS_ID within the following 90 days.
Vince
P.S. When you say "but it did not work". It is best to either give an example of the bad output or a copy of your error log for us to help. For instance, I can't tell from your post if it was a logic or syntax error and reading through someone elses' thoughts/code looking for both is harder. Especially when I don't know your data structure variable types and formats.
A quick glance tells me that
if we =. then we=fst_srvc_dt;
if wb=' ' then wb=fst_srvc_dt;
if we=wb then output;
is basically outputing every single row that didn't have a major study in addition to what should've achieved your desired result in the big outer do;end; block
data want;
if _N_=1 then do;
declare hash h(dataset: "maj_restudy(where=(prov_tin_maj_restudy=1) rename=(fst_srvc_dt=major_study_start_date))");
h.defineKey('mbr_sys_id');
h.defineData(major_study_start_date);
h.defineDone();
end;
set maj_restudy;
rc=h.find();
if rc=0 then do;
if (1 LE fst_srvc_dt - major_study_start_date LE 90) and (nat_perf=1 or nat_cath=1 or nat_pet=1 or nat_ct=1) then do
if nat_perf=1 then sub_table="MAJPERF";
else if nat_cath=1 then subtable="MAJCATH";
else if nat_pet=1 then subtable="MAJPET";
else if nat_pet=1 then subtable="MAJCT";
output;
end;
end;
drop rc major_study_start_date;
run;
This is an alternative based on your data example. However, if there are multiple Major Studies in any given MBR_SYS_ID, it won't work as is since the hash table defined this way only supports 1 occurence of the key. You may need to play with formats a little, I assumed your dates were naturally numeric values with a date format and not strings as well as I assumed that your indicator columns were numeric.
Basically, it uses the lookup power of hash tables to obtain the start date of the unique major study for each MBR_SYS_ID (if there is one) and puts it in the data vector to allow computation. This would give you multiple rows for a given MBR_SYS_ID if there had been more than one of the 4 columns with 1 for the same MBR_SYS_ID within the following 90 days.
Vince
P.S. When you say "but it did not work". It is best to either give an example of the bad output or a copy of your error log for us to help. For instance, I can't tell from your post if it was a logic or syntax error and reading through someone elses' thoughts/code looking for both is harder. Especially when I don't know your data structure variable types and formats.
A quick glance tells me that
if we =. then we=fst_srvc_dt;
if wb=' ' then wb=fst_srvc_dt;
if we=wb then output;
is basically outputing every single row that didn't have a major study in addition to what should've achieved your desired result in the big outer do;end; block
I don't believe there is more than 1 occurence per member for the restudies. I tried this but it gave me this error:
It looks like it does not understand the h.definedone:
455 data maj_restudy2;
456 if _N_=1 then do;
457 declare hash h(datasets: "maj_restudy(where=(prov_tin_maj_restudy=1)
457! rename=(fst_srvc_dt=major_study_start_date))");
ERROR: Unknown argument tag datasets.
458 h.defineKey('mbr_sys_id');
459 h.defineData(major_study_start_date);
460 h.defineDone;
-------------
559 22
76
ERROR 559-185: Invalid object attribute reference h.defineDone.
ERROR 22-322: Syntax error, expecting one of the following: (, +, =.
ERROR 76-322: Syntax error, statement will be ignored.
461 end;
462 set maj_restudy;
463 rc=h.find();
ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
464 if rc=0 then do;
465 if (1 LE fst_srvc_dt - major_study_start_date LE 90) and (nat_perf=1 or nat_cath=1 or
465! nat_pet=1 or nat_ct=1) then do
466 if nat_perf=1 then sub_table="MAJPERF";
467 else if nat_cath=1 then sub_table="MAJCATH";
468 else if nat_pet=1 then sub_table="MAJPET";
469 else if nat_ct=1 then sub_table="MAJCT";
470 output;
471 end;
472 end;
473 drop rc major_study_start_date;
474 run;
my bad, it's <
(dataset:
and not
(datasets:
updating original post.
It still does not like that h.definedone; statement
Sigh, sorry, updated my reply again.
defineDone is a method of the hash table and SAS requires () syntax at the end
h.defineDone();
I always omit it by mistake when I don't have data to test my syntax.
Edit:
If this can be of any additionnal diagnosis of your original post code. Since you are not keeping a new variable for your MBR_SYS_ID each time you find a major study case, your do pt=_N_+1 ... end; statement loops on all studies content that is sorted below your current mbr_sys_id so if some studies overlap, each major study will also cause other studies' procedures that occured within the appropriate timeframe to be output including even the record for a new study if it has an associated procedure applied that day.
Awesome. That worked like a charm
Besides Hash() approach, Proc SQL is also handy for your problem:
proc sql;
select distinct a.* from have a ,
(select MBR_SYS_ID, FST_SRVC_DT from have where Prov_tin_maj_restudy=1) b
where a.MBR_SYS_ID=b.MBR_SYS_ID
and b.FST_SRVC_DT < a.FST_SRVC_DT < b.FST_SRVC_DT+90
and sum( nat_perf, nat_cath, nat_pet, nat_ct) >=1;
quit;
Haikuo
Update: Sub-query is not needed since the condition is straightforward.
proc sql;
select distinct a.* from have a ,
have(where= (Prov_tin_maj_restudy=1))b
where a.MBR_SYS_ID=b.MBR_SYS_ID
and b.FST_SRVC_DT < a.FST_SRVC_DT < b.FST_SRVC_DT+90
and sum( a.nat_perf, a.nat_cath, a.nat_pet, a.nat_ct) >=1;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.