Cruise
Ammonite | Level 13

Hi SAS experts,

Since my ultimate goal is to calculate survival time (date of death minus date of diagnosis), I can't afford to ignore the missing month and day of diagnosis. The year of diagnosis has no missing values. The date of death is complete for deceased patients and missing for patients who are alive. The goal here is to get the job done without anything complex, such as multiple imputation.

 

My question is: 

My approach below does the job, but I suspect it is sloppy and has many lines that could be cut. There must be a more efficient way to write this code, right? All the conditions that need to be accounted for are specified before each %RandBetween call.

 

Thanks a lot.  

 

data have; 
input id date_of_diagnosis_dd date_of_diagnosis_mm date_of_diagnosis_yyyy date_of_death_dd date_of_death_mm date_of_death_yyyy;
cards;
1	99	8	2004    .   .   .  
2	99	99	2010	1	3	2013
3	99	99	2014	21	4	2014
4	99	99	2008	25	10	2008
5	99	99	2012    .   .   .
6	99	99	2010    .   .   .
4	99	2	2008	25	10	2008
;
	

%macro RandBetween(min, max);
   (&min + floor((1+&max-&min)*rand("uniform")))
%mend;

data have1; set have; 
do i=1 to 12; 
if date_of_death_mm ne . and date_of_diagnosis_yyyy=date_of_death_yyyy and date_of_diagnosis_mm=99  
then date_of_diagnosis_mm = %RandBetween(1, date_of_death_mm);
end;
drop i; 
run; 

data have2; set have1;
do i=1 to 12;
if date_of_death_mm ne . and date_of_diagnosis_yyyy ne date_of_death_yyyy and date_of_diagnosis_mm=99  
then date_of_diagnosis_mm = %RandBetween(1,12);
end;
drop i; 
run; 

data have3; set have2;
do i=1 to 12; 
if date_of_diagnosis_mm=99 
then date_of_diagnosis_mm = %RandBetween(1,12); 
end;
drop i; 
run; 

data have4; set have3;
do i=1 to 12; 
if date_of_diagnosis_dd=99 and date_of_diagnosis_mm ne 2 
then date_of_diagnosis_dd = %RandBetween(1,30); 
end;
if date_of_diagnosis_dd=99 and date_of_diagnosis_mm=2 
then date_of_diagnosis_dd = %RandBetween(1,28); 
drop i; 
run; 

proc freq data=have4;
tables date_of_diagnosis_dd date_of_diagnosis_mm;
run; 

data have5; set have4;
date_of_diagnosis=mdy(date_of_diagnosis_mm,date_of_diagnosis_dd,date_of_diagnosis_yyyy);
date_of_death=mdy(date_of_death_mm, date_of_death_dd, date_of_death_yyyy);
date_of_death1=date_of_death;
if date_of_death1=. then date_of_death1=20453;
survival_duration=sum(date_of_death1-date_of_diagnosis);
run; 


 

Kurt_Bremser
Super User

Why do you discard 11 results of the rand function and keep only every 12th? Because that's the only thing your do loops do.

 

Correction: since the first iteration of the do loop changes the month from 99 to something else, the next 11 iterations won't do anything.

 

Cruise
Ammonite | Level 13

@Kurt_Bremser

 

I just changed 'do i=1 to 12' to 'do i=1 to 31' when it comes to rand function for date_of_diagnosis_dd.

Hope that solves the problem.

 

do i=1 to 31; 
if date_of_diagnosis_dd=99 and date_of_diagnosis_mm ne 2 
then date_of_diagnosis_dd = %RandBetween(1,30); 
end;
if date_of_diagnosis_dd=99 and date_of_diagnosis_mm=2 
then date_of_diagnosis_dd = %RandBetween(1,28); 
drop i;
Kurt_Bremser
Super User


That only makes it worse. After iteration 1, date_of_diagnosis_dd won't be 99 anymore, so iterations 2 to 31 are just wasted time.

Cruise
Ammonite | Level 13
Please show me how, if you will. I'm not familiar with RAND.
Kurt_Bremser
Super User

In the first iteration of the do loop, you set the dd variable to something else if it is 99. After that, the check for 99 can no longer be true, so the other iterations of the do loop do nothing.

Cruise
Ammonite | Level 13

@Astounding

I have this condition in the first code block: when diagnosis and death happen in the same year, the month of diagnosis cannot take a value greater than the month of death.

 

do i=1 to 12; 
if date_of_death_mm ne . and date_of_diagnosis_yyyy=date_of_death_yyyy and date_of_diagnosis_mm=99  
then date_of_diagnosis_mm = %RandBetween(1, date_of_death_mm);
end;
Kurt_Bremser
Super User

Run this:

data _null_;
date_of_death_mm = 11;
date_of_diagnosis_yyyy = 2017;
date_of_death_yyyy = 2017;
date_of_diagnosis_mm = 99;
do i=1 to 12; 
if date_of_death_mm ne . and date_of_diagnosis_yyyy=date_of_death_yyyy and date_of_diagnosis_mm=99  
then date_of_diagnosis_mm = %RandBetween(1, date_of_death_mm);
  put i=;
  put date_of_diagnosis_mm=;
end;
date_of_diagnosis_mm = 11;
do i=1 to 12; 
if date_of_death_mm ne . and date_of_diagnosis_yyyy=date_of_death_yyyy and date_of_diagnosis_mm=99  
then date_of_diagnosis_mm = %RandBetween(1, date_of_death_mm);
  put i=;
  put date_of_diagnosis_mm=;
end;
run;

and look at the log.

Cruise
Ammonite | Level 13

Oh no, month of diagnosis is taking same value of month of death 😞

Astounding
PROC Star

You may need a more rigorous strategy before trying to program this.  If month/day of diagnosis is missing, you might still encounter year of diagnosis equal to year of death.  Randomly selecting the month/day of diagnosis could lead to the diagnosis being later than death.
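
One way to see whether this actually happens in a given run is to list the records where the imputed diagnosis date falls after the date of death. This is a rough sketch that assumes the have5 dataset and the variable names from the original post; the dataset name check is made up for illustration.

```sas
/* Keep only deceased patients whose imputed diagnosis date is later than death. */
/* have5 is the final dataset built in the original post.                        */
data check;   /* "check" is a hypothetical name */
   set have5;
   where date_of_death ne . and date_of_diagnosis > date_of_death;
run;

/* Print the offending records with readable dates. */
proc print data=check;
   var id date_of_diagnosis date_of_death survival_duration;
   format date_of_diagnosis date_of_death date9.;
run;
```

If this prints any rows, the imputation rules need a tighter upper bound for those cases, for example on the day when diagnosis and death share the same year and month.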

Reeza
Super User

If you have SAS 9.4, I think you can use the 'INTEGER' distribution of the RAND function instead of your RandBetween macro.

 

Integer Distribution

x=RAND('INTEGER',a,<b>)
Arguments

x

is a random value from the discrete uniform distribution on a finite set of integers. If you specify one integer parameter, a, then x is drawn uniformly at random from the set {1,2,…, a–1, a}. If you specify two integer parameters, a and b with a ≤ b, then x is drawn uniformly at random from the set {a, a+1,…, b–1, b}.

a

is an integer parameter. If you specify only one numeric parameter, a is an upper limit for the random values. If you specify two parameters, a is a lower limit.

b

is an integer parameter that specifies the upper limit for the random values.
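
For illustration, the two call forms described above could be tried in a small step like the one below. It is only a sketch and will run only on a SAS release that supports the 'INTEGER' distribution; the seed value and the variable names k, mm and dd are arbitrary choices, not taken from the thread.

```sas
data _null_;
   call streaminit(2024);             /* arbitrary seed so the run is repeatable */
   do k = 1 to 5;
      mm = rand('integer', 1, 12);    /* two arguments: uniform on 1..12 */
      dd = rand('integer', 28);       /* one argument: uniform on 1..28  */
      put mm= dd=;                    /* write each draw to the log      */
   end;
run;
```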

Cruise
Ammonite | Level 13

@Reeza

The documentation for 'INTEGER' said it requires SAS 9.4 M4, but mine is only 9.4 M1. I tried 'integer' and it failed.


Cruise
Ammonite | Level 13

@Kurt_Bremser

Hi Kurt, how about physically subsetting the data to avoid the 'if' conditions, running RAND over the individual datasets, and stacking them afterwards?

Kurt_Bremser
Super User


The problem is the do loops, which serve no purpose (as demonstrated). Remove them. You can then do all the corrective measures in one data step.
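
A minimal sketch of what such a single data step might look like, reusing the RandBetween macro and the variable names from the original post. The output name want, the seed passed to call streaminit, the helper variable max_dd, and the cap on the day when diagnosis and death share a month are all assumptions added for illustration, not something stated in the thread. The literal '31DEC2015'd is the calendar date that the original cutoff value 20453 corresponds to.

```sas
/* Same helper as in the original post: random integer between min and max. */
%macro RandBetween(min, max);
   (&min + floor((1 + &max - &min) * rand("uniform")))
%mend;

data want;                       /* hypothetical output name */
   set have;
   call streaminit(12345);       /* hypothetical seed, only for repeatable draws */

   /* Month: cap at the month of death when both events fall in the same year. */
   if date_of_diagnosis_mm = 99 then do;
      if date_of_death_mm ne . and date_of_diagnosis_yyyy = date_of_death_yyyy
         then date_of_diagnosis_mm = %RandBetween(1, date_of_death_mm);
         else date_of_diagnosis_mm = %RandBetween(1, 12);
   end;

   /* Day: use the real month length, capped at the day of death if needed. */
   if date_of_diagnosis_dd = 99 then do;
      max_dd = day(intnx('month',
                         mdy(date_of_diagnosis_mm, 1, date_of_diagnosis_yyyy),
                         0, 'end'));
      if date_of_death_dd ne .
         and date_of_diagnosis_yyyy = date_of_death_yyyy
         and date_of_diagnosis_mm = date_of_death_mm
         then max_dd = min(max_dd, date_of_death_dd);
      date_of_diagnosis_dd = %RandBetween(1, max_dd);
   end;

   date_of_diagnosis = mdy(date_of_diagnosis_mm, date_of_diagnosis_dd, date_of_diagnosis_yyyy);
   date_of_death     = mdy(date_of_death_mm, date_of_death_dd, date_of_death_yyyy);

   /* Patients still alive are censored at the cutoff date (20453 = 31DEC2015). */
   survival_duration = coalesce(date_of_death, '31DEC2015'd) - date_of_diagnosis;

   format date_of_diagnosis date_of_death date9.;
   drop max_dd;
run;
```

Because each variable is tested for 99 exactly once, no loop is needed, and the intermediate datasets have1 through have4 disappear.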

Cruise
Ammonite | Level 13

@pau13rown

I ended up going in the direction you suggested. Generating random months would cause too much variation when my survival duration is measured in days. Instead, I'm linking my records with missing dates to a Medicaid diagnosis file to see whether I can get some hints about the diagnosis dates there.

Thanks Paul!
