Hi @novinosrin thank you very much! Pardon my ignorance, but how would I use the datastep approach to specify the diagnoses that I want to run the code for? For instance, I am only interested in identifying a select group of diagnoses that meet this condition- let's say diagnosis= A or B or C.
well just subset using a filter( where diagnosis in ('A','B','C') ) and run the test against the subset
Hi @wj2 Please let us know the progress and mark the solution that you deem made you progress. Such encouragement helps up to keep playing this beautiful video game called SAS
@novinosrin Yes, thank you. I think the approach you suggested will be the way to go and I can mark it. I added the subset statement for specifying the variables as shown below; however, the issue that I am running into now is that after I run the code I am receiving an error message indicating there is insufficient memory to execute the data step program- also see below. The data set is very large and I will keep looking into this. Any suggestions regarding this matter would certainly be welcome as well.
data want;
if _n_=1 then do;
if 0 then set have(rename=(diagnosis_date=d));
dcl hash H (dataset:'have(rename=(diagnosis_date=d))',multidata:'y') ;
h.definekey ("Patient_ID","diagnosis") ;
h.definedata ("d") ;
h.definedone () ;
end;
set _have;
c=0;
by Patient_ID diagnosis ;
where diagnosis in ('A','B','C');
if first.Patient_ID then c1=0;
do rc=h.find() by 0 while(rc=0);
if d>diagnosis_date and 90<=intck('days',diagnosis_date,d)<=365 then c+1;
rc=h.find_next();
end;
c1+c;
drop rc d;
run;
FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase.
ERROR: The SAS System stopped processing this step because of insufficient memory.
Jeez @wj2 How I wish that was never an issue. Good lord almighty. Well, We will have to move away from hash as this is not the time for us to dig in and see how to fit all the data in memory or clear memory.
Let me work on a non hash solution. Don't mark anything yet as the thread is still open and not solved
Hi @wj2 Not knowing your resources aka(Disk, memory et al) I am going with simple two step approach. Nothing fancy or classy but will get the job done.
/*Step 1 -Get all possible look ahead dates for the condition*/
proc sql;
create table temp as
select distinct a.*,b.diagnosis_date as bdiagnosis_date
from have a left join have b
on a.Patient_ID=b.Patient_ID and a.diagnosis=b.diagnosis and b.diagnosis_date>a.diagnosis_date
group by a.patient_id
order by a.Patient_ID,a.diagnosis,diagnosis_date;
quit;
/*Step 2-Conditional count having for the complete dates from the previous*/
data want;
set temp;
by Patient_ID diagnosis diagnosis_date;
if first.diagnosis_date then c=0;
if first.Patient_ID then c1=0;
if bdiagnosis_date>. and 90<=intck('days',diagnosis_date,bdiagnosis_date)<=365 then c+1;
c1+c;
if last.diagnosis_date;
drop bdiagnosis_date;
run;
So ideally I should not propose a Hash solution when I see your post for insufficient memory? 🙂 No worries, I will keep that in mind.
@wj2 Did you try the above?
Hi @novinosrin and @wj2
Here is my "fwiw" on this one.
I havent had the chance to see all of the parts of the suggested code - but some look great. I would agree that its useful and good code.
But - my only observation and recommendation to avoid "code going down on space" - is subset down to just the rows you need.
Its not been clearly apparent in this - but with solid suggestions on code (with examples) that to me look good and should work - i'd have to ask - why are we putting EVERY SINGLE ROW THRU? If that is the case, lets change that asap.
The examples i've seen - should be functional once we subset down to just the rows per: A, B, C of what we are interested in for a "diag" set of conditions.
Here is why I'm mentioning the subset part. I am not able to remember if there was mention of this being run on VRDC data.
Just that its "lots of data". I'm very familiar with claims data and specifically the VRDC by CMS.
That system and SAS environment has TONS of HUGE datasets with claims, diagnosis, procedures.
In that realm - i've been constantly hit with SQL issues and Hash table issues as well. Its moved most of my work towards data step and very efficient methods within that.
My preference to solve this kind of issue with a blend of FORMATS and Datastep - come from an acute familiarity with the performance of those on the VRDC when put against that kind of data and data layout.
So i'm not trying to argue the merits of SQL steps vs Data steps. As a long time SAS fan- i use both.
So - if this is being attempted on the VRDC - be aware of my "imho/fwiw".
I'd also say that issues with space etc when using Hash/SQL - are indicative of a need to optimize and configure your environment a bit better - as well as consider your code approach if those maintenance topics cant easily be addressed by you or someone on your team (as in the VRDC).
Welp... thats my imho.
Everyone have a great weekend!
i'll be at SGF2019 with papers to present. One touches on the VRDC. Not exactly this topic but has information on the VRDC
Best,
Zeke
All the best.
Hi @novinosrin @zekeT_sasaholic @ballardw, thank you all so much for your time! The suggested code ended up working!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.