Ammonite | Level 13

## Coding efficient using multiple conditions with random number generation

Hi SAS experts,

Since my ultimate goal is to calculate survival time (date of death minus date of diagnosis), I can't afford to ignore the missing month and day of diagnosis (coded as 99). The year of diagnosis has no missing values. The date of death is complete for patients who died and missing for patients who are still alive. The goal here is to get the job done without anything complex, such as multiple imputation.

My question is:

My approach below does the job, but I suspect it is sloppy and has many lines that could be reduced. There must be a more efficient way to write this code, right? All the conditions that need to be accounted for are specified before each %RandBetween call.

Thanks a lot.

``````
data have;
input id date_of_diagnosis_dd date_of_diagnosis_mm date_of_diagnosis_yyyy date_of_death_dd date_of_death_mm date_of_death_yyyy;
cards;
1 99 8 2004 . . .
2 99 99 2010 1 3 2013
3 99 99 2014 21 4 2014
4 99 99 2008 25 10 2008
5 99 99 2012 . . .
6 99 99 2010 . . .
4 99 2 2008 25 10 2008
;

/* Returns a random integer between min and max, inclusive */
%macro RandBetween(min, max);
(&min + floor((1+&max-&min)*rand("uniform")))
%mend;

data have1; set have;
do i=1 to 12;
if date_of_death_mm ne . and date_of_diagnosis_yyyy=date_of_death_yyyy and date_of_diagnosis_mm=99
then date_of_diagnosis_mm = %RandBetween(1, date_of_death_mm);
end;
drop i;
run;

data have2; set have1;
do i=1 to 12;
if date_of_death_mm ne . and date_of_diagnosis_yyyy ne date_of_death_yyyy and date_of_diagnosis_mm=99
then date_of_diagnosis_mm = %RandBetween(1,12);
end;
drop i;
run;

data have3; set have2;
do i=1 to 12;
if date_of_diagnosis_mm=99
then date_of_diagnosis_mm = %RandBetween(1,12);
end;
drop i;
run;

data have4; set have3;
do i=1 to 12;
if date_of_diagnosis_dd=99 and date_of_diagnosis_mm ne 2
then date_of_diagnosis_dd = %RandBetween(1,30);
end;
if date_of_diagnosis_dd=99 and date_of_diagnosis_mm=2
then date_of_diagnosis_dd = %RandBetween(1,28);
drop i;
run;

proc freq data=have4;
tables date_of_diagnosis_dd date_of_diagnosis_mm;
run;

data have5; set have4;
date_of_diagnosis=mdy(date_of_diagnosis_mm,date_of_diagnosis_dd,date_of_diagnosis_yyyy);
date_of_death=mdy(date_of_death_mm, date_of_death_dd, date_of_death_yyyy);
date_of_death1=date_of_death;
/* 20453 is the SAS date value for 31DEC2015, used for patients still alive */
if date_of_death1=. then date_of_death1=20453;
survival_duration=sum(date_of_death1-date_of_diagnosis);
run;

``````

Super User

## Re: Coding efficient using multiple conditions with random number generation

Why do you discard 11 results of the rand function and keep only every 12th? Because that's the only thing your do loops do.

Correction: since the first iteration of the do loop changes the month from 99 to something else, the next 11 iterations won't do anything.

Ammonite | Level 13

## Re: Coding efficient using multiple conditions with random number generation

@Kurt_Bremser

I changed 'do i=1 to 12' to 'do i=1 to 31' in the step that applies the RAND function to date_of_diagnosis_dd.

Hope that solves the problem.

``````
do i=1 to 31;
if date_of_diagnosis_dd=99 and date_of_diagnosis_mm ne 2
then date_of_diagnosis_dd = %RandBetween(1,30);
end;
if date_of_diagnosis_dd=99 and date_of_diagnosis_mm=2
then date_of_diagnosis_dd = %RandBetween(1,28);
drop i;
``````
Super User

## Re: Coding efficient using multiple conditions with random number generation

Only makes it worse. After iteration 1, date_of_diagnosis_dd won't be 99 anymore, so iterations 2 to 31 are just wasted time.

Ammonite | Level 13

## Re: Coding efficient using multiple conditions with random number generation

Please show me how, if you would. I'm not familiar with RAND.
Super User

## Re: Coding efficient using multiple conditions with random number generation

In the first iteration of the do loop, you set the dd variable to something else if it is 99. From then on the check for 99 cannot be true, so the other iterations of the do loop do nothing.

Ammonite | Level 13

## Re: Coding efficient using multiple conditions with random number generation

@Astounding

I set this condition in the first code block: when diagnosis and death happen in the same year, the month of diagnosis cannot take a value greater than the month of death.

``````
do i=1 to 12;
if date_of_death_mm ne . and date_of_diagnosis_yyyy=date_of_death_yyyy and date_of_diagnosis_mm=99
then date_of_diagnosis_mm = %RandBetween(1, date_of_death_mm);
end;
``````
Super User

## Re: Coding efficient using multiple conditions with random number generation

Run this:

``````
data _null_;
date_of_death_mm = 11;
date_of_diagnosis_yyyy = 2017;
date_of_death_yyyy = 2017;
date_of_diagnosis_mm = 99;
do i=1 to 12;
if date_of_death_mm ne . and date_of_diagnosis_yyyy=date_of_death_yyyy and date_of_diagnosis_mm=99
then date_of_diagnosis_mm = %RandBetween(1, date_of_death_mm);
put i=;
put date_of_diagnosis_mm=;
end;
date_of_diagnosis_mm = 11;
do i=1 to 12;
if date_of_death_mm ne . and date_of_diagnosis_yyyy=date_of_death_yyyy and date_of_diagnosis_mm=99
then date_of_diagnosis_mm = %RandBetween(1, date_of_death_mm);
put i=;
put date_of_diagnosis_mm=;
end;
run;
``````

and look at the log.

Ammonite | Level 13

## Re: Coding efficient using multiple conditions with random number generation

Oh no, the month of diagnosis is taking the same value as the month of death 😞

PROC Star

## Re: Coding efficient using multiple conditions with random number generation

You may need a more rigorous strategy before trying to program this. If the month/day of diagnosis is missing, the year of diagnosis can still equal the year of death, so randomly selecting the month/day of diagnosis could place the diagnosis after death.
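One way to rule this out, offered as an alternative technique rather than anything proposed in the thread: compute the earliest and latest diagnosis dates that the known parts of the date allow, cap the latest at the date of death, and draw a SAS date uniformly between the two. This minimal sketch assumes, as in the posted data, that 99 marks an unknown month or day and that an unknown month implies an unknown day. The names lo, hi, and want, the seed, and the 31DEC2015 end of follow-up (the value 20453 in the original code) are illustrative assumptions.

```sas
/* Test data: the rows from the original post; 99 = unknown day/month */
data have;
input id date_of_diagnosis_dd date_of_diagnosis_mm date_of_diagnosis_yyyy
      date_of_death_dd date_of_death_mm date_of_death_yyyy;
cards;
1 99 8 2004 . . .
2 99 99 2010 1 3 2013
3 99 99 2014 21 4 2014
4 99 99 2008 25 10 2008
5 99 99 2012 . . .
6 99 99 2010 . . .
4 99 2 2008 25 10 2008
;

data want;
set have;
if _n_ = 1 then call streaminit(2017);     /* arbitrary seed, for reproducibility */
date_of_death = mdy(date_of_death_mm, date_of_death_dd, date_of_death_yyyy);

/* Earliest (lo) and latest (hi) diagnosis dates allowed by the known parts */
if date_of_diagnosis_mm = 99 then do;        /* month unknown: whole year */
   lo = mdy(1, 1, date_of_diagnosis_yyyy);
   hi = mdy(12, 31, date_of_diagnosis_yyyy);
end;
else if date_of_diagnosis_dd = 99 then do;   /* day unknown: whole month */
   lo = mdy(date_of_diagnosis_mm, 1, date_of_diagnosis_yyyy);
   hi = intnx('month', lo, 0, 'end');        /* last day of that month */
end;
else do;                                     /* complete date: nothing to draw */
   lo = mdy(date_of_diagnosis_mm, date_of_diagnosis_dd, date_of_diagnosis_yyyy);
   hi = lo;
end;

/* Diagnosis must not fall after death */
if not missing(date_of_death) then hi = min(hi, date_of_death);

/* Uniform integer draw on lo..hi; missing if the known parts contradict death */
if lo <= hi then date_of_diagnosis = lo + floor((hi - lo + 1) * rand('uniform'));
else date_of_diagnosis = .;

/* Assumed end of follow-up for patients still alive */
survival_duration = coalesce(date_of_death, '31DEC2015'd) - date_of_diagnosis;
format date_of_diagnosis date_of_death date9.;
drop lo hi;
run;
```

Because the draw works on date values, MDY and INTNX take care of month lengths and leap years instead of a fixed 28- or 30-day rule, and every day in the allowed range is equally likely.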

Super User

## Re: Coding efficient using multiple conditions with random number generation

If you have SAS 9.4, I think you have access to the 'INTEGER' distribution of the RAND function, which can replace your RandBetween macro.

### Integer Distribution

`x = RAND('INTEGER', a, <b>)`

Arguments

#### x

is a random value from the discrete uniform distribution on a finite set of integers. If you specify one integer parameter, a, then x is drawn uniformly at random from the set {1,2,…, a–1, a}. If you specify two integer parameters, a and b with a ≤ b, then x is drawn uniformly at random from the set {a, a+1,…, b–1, b}.

#### a

is an integer parameter. If you specify only one numeric parameter, a is an upper limit for the random values. If you specify two parameters, a is a lower limit.

#### b

is an integer parameter that specifies the upper limit for the random values.

Ammonite | Level 13

## Re: Coding efficient using multiple conditions with random number generation

@Reeza

The documentation for 'INTEGER' says it requires SAS 9.4M4, but mine is only 9.4M1. I tried 'INTEGER' and it failed.
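On a release where RAND('INTEGER') is rejected, the same discrete uniform draw can be written inline with RAND('UNIFORM'), which is what the RandBetween macro already expands to. A minimal sketch; the dataset name draws, the number of draws, and the seed passed to CALL STREAMINIT are arbitrary choices for illustration:

```sas
/* Random integers without RAND('INTEGER'): the RandBetween formula written inline */
data draws;
call streaminit(123);                          /* arbitrary seed, for reproducibility */
do i = 1 to 1000;
   month = 1 + floor(12 * rand('uniform'));   /* integer in 1..12 */
   day   = 1 + floor(28 * rand('uniform'));   /* integer in 1..28 */
   output;
end;
drop i;
run;

/* Each month should appear roughly 1000/12 (about 83) times */
proc freq data=draws;
tables month;
run;
```

RAND('UNIFORM') returns values strictly between 0 and 1, so `a + floor((b - a + 1) * rand('uniform'))` lands on each integer from a to b with equal probability.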

Ammonite | Level 13

## Re: Coding efficient using multiple conditions with random number generation

@Kurt_Bremser

Hi Kurt, how about physically subsetting the data to avoid the IF conditions, running RAND on each subset, and stacking the results afterwards?

Super User

## Re: Coding efficient using multiple conditions with random number generation

@Cruise wrote:

@Kurt_Bremser

Hi Kurt, how about physically subsetting the data to avoid the IF conditions, running RAND on each subset, and stacking the results afterwards?

The problem is the do loops, which serve no purpose (as demonstrated). Remove them. You can do all the corrections in one data step.
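Following that advice, a minimal sketch of the have1 to have4 steps merged into one DATA step with the DO loops removed. It keeps the rules from the original post (99 means unknown; for deaths in the diagnosis year, the month is capped at the month of death; day 1 to 28 in February and 1 to 30 otherwise) and reuses its RandBetween macro. The seed, the dataset name want, and writing the censoring value 20453 as the date literal '31DEC2015'd are editorial choices, not from the thread.

```sas
%macro RandBetween(min, max);
(&min + floor((1+&max-&min)*rand("uniform")))
%mend;

/* Test data: the rows from the original post */
data have;
input id date_of_diagnosis_dd date_of_diagnosis_mm date_of_diagnosis_yyyy
      date_of_death_dd date_of_death_mm date_of_death_yyyy;
cards;
1 99 8 2004 . . .
2 99 99 2010 1 3 2013
3 99 99 2014 21 4 2014
4 99 99 2008 25 10 2008
5 99 99 2012 . . .
6 99 99 2010 . . .
4 99 2 2008 25 10 2008
;

data want;
set have;
if _n_ = 1 then call streaminit(12345);   /* optional: reproducible draws */

/* Month: same-year deaths draw 1..month of death, everyone else draws 1..12 */
if date_of_diagnosis_mm = 99 then do;
   if date_of_death_mm ne . and date_of_diagnosis_yyyy = date_of_death_yyyy
      then date_of_diagnosis_mm = %RandBetween(1, date_of_death_mm);
   else date_of_diagnosis_mm = %RandBetween(1, 12);
end;

/* Day: 1..28 in February, 1..30 otherwise (same rule as the have4 step) */
if date_of_diagnosis_dd = 99 then do;
   if date_of_diagnosis_mm = 2 then date_of_diagnosis_dd = %RandBetween(1, 28);
   else date_of_diagnosis_dd = %RandBetween(1, 30);
end;

date_of_diagnosis = mdy(date_of_diagnosis_mm, date_of_diagnosis_dd, date_of_diagnosis_yyyy);
date_of_death = mdy(date_of_death_mm, date_of_death_dd, date_of_death_yyyy);
/* 31DEC2015 is the date the original code wrote as 20453 */
survival_duration = coalesce(date_of_death, '31DEC2015'd) - date_of_diagnosis;
format date_of_diagnosis date_of_death date9.;
run;
```

Like the original code, this can still place diagnosis after death when the drawn month equals the month of death and the drawn day is later than the day of death (possible for row 3 of the sample data).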

Ammonite | Level 13

## Re: Coding efficient using multiple conditions with random number generation

@pau13rown

I ended up going in the direction you suggested. Generating random months would introduce too much variation, since my survival duration is measured in days. Instead, I'm linking the records with missing dates to a Medicaid diagnosis file to see whether it gives some hints about the diagnosis dates.

Thanks, Paul!
