BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hi Folks:

 

I'm trying to create random dates to replace incomplete dates with missing month and/or day. Year fields are complete. I used %Rand macro from Rick's tutorial https://blogs.sas.com/content/iml/2015/10/05/random-integers-sas.html. Conditions for the random selection include dates be not after censoring date and must be within the known year of diagnosis.

 

Could you please look at my SAS code below on the mock data where I created RAND_DX. Any critiques or suggestions or improvements appreciated to achieve my programming objective I stated above.

 

Thank you for taking your time in this.

 

My variables:

Date (dx_month, dx_day and dx_year)

Censor (censor_month, censor_day, censor_year)

 

Assumptions/facts:

  1. All records have ‘year’ of diagnosis (no missing in the year of diagnosis)
  2. ‘day’ field is unknown when ‘month’ field is unknown
  3. ‘day’ field can be unknown when ‘month’ field is known  
  4. Effect of leap year and February is negligible
  5. Variable ‘censor’ is complete

 

DATA HAVE;
INPUT ID dx_year dx_month dx_day CENSOR CENSOR_DAY CENSOR_MONTH CENSOR_YEAR;
CARDS;
1	2005	99	99	20088	31	12	2014
2	2005	99	99	17308	22	5	2007
3	2005	99	99	16521	26	3	2005
4	2005	99	99	17074	30	9	2006
5	2005	99	99	16620	3	7	2005
6	2005	99	99	16709	30	9	2005
7	2005	99	99	16901	10	4	2006
8	2005	99	99	16777	7	12	2005
9	2005	99	99	16763	23	11	2005
10	2005	99	99	17750	6	8	2008
11	2005	99	99	16621	4	7	2005
12	2005	99	99	20088	31	12	2014
13	2005	99	99	16636	19	7	2005
14	2005	99	99	16685	6	9	2005
15	2005	99	99	16778	8	12	2005
16	2005	99	99	17129	24	11	2006
17	2005	99	99	20088	31	12	2014
18	2005	99	99	17057	13	9	2006
19	2005	99	99	16486	19	2	2005
20	2005	99	99	16548	22	4	2005
;

%macro RandBetween(min, max);
   (&min + floor((1+&max-&min)*rand("uniform")))
%mend;

DATA HAVE1; SET HAVE; 
RAND_DX_MONTH = %RandBetween(1,CENSOR_MONTH); 
RAND_DX_DAY= %RandBetween(1,CENSOR_DAY);
RAND_DX=MDY(RAND_DX_MONTH, RAND_DX_DAY, DX_YEAR); 
run; 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

SAS has the rand('integer', ....) function now so you don't need that macro unless you're on an older version of SAS.

 

Dates are integers. So you can calculate the earliest possible date for the 'event' and latest possible date and select a random value in-between them. 

 

Your current methodology is more restrictive than this approach, for example, you can't have the event be within the same month as the censoring, which is possible. You may have a longer gap required based on what exactly you're measuring - diagnosis to censoring may be always at least 60 days or something like that.

 

For the day portion of the date, what happens if the day of censoring is the first of the month? From your current method that would leave a single day that can be chosen. It does not need to be less than the day, this is a parameter that needs some relationship to the month as well otherwise it's adding an additional restriction that may influence your data but not be accurate at all. 

 

I haven't bothered to set any seeds, I'll leave that to you. I added in a check below to ensure that the event date is always before the censor date. 

 

DATA HAVE;
    INPUT ID dx_year dx_month dx_day CENSOR CENSOR_DAY CENSOR_MONTH CENSOR_YEAR;

    CARDS;
1  2005    12  99  20088   31  12  2014
2  2005    1  99  17308   22  5   2007
3  2005    4  99  16521   26  3   2005
4  2005    99  99  17074   30  9   2006
5  2005    99  99  16620   3   7   2005
6  2005    99  99  16709   30  9   2005
7  2005    99  99  16901   10  4   2006
8  2005    99  99  16777   7   12  2005
9  2005    99  99  16763   23  11  2005
10 2005    99  99  17750   6   8   2008
11 2005    99  99  16621   4   7   2005
12 2005    99  99  20088   31  12  2014
13 2005    99  99  16636   19  7   2005
14 2005    99  99  16685   6   9   2005
15 2005    99  99  16778   8   12  2005
16 2005    99  99  17129   24  11  2006
17 2005    99  99  20088   31  12  2014
18 2005    99  99  17057   13  9   2006
19 2005    99  99  16486   19  2   2005
20 2005    99  99  16548   22  4   2005
;

data want;
    set have;
    
    *calculate censor date;
    censor_date=mdy(censor_month, censor_day, censor_year);

    *handle missing values for both day and month;
    if dx_month = 99 and dx_day = 99 then
        do;
            *first possible date is the beginning of the year;
            first_possible_date=intnx('year', censor_date, 0, 'b');
            *last possible date is the day before censoring;
            last_possible_date=censor_date - 1;
            *event date is a random date between these dates;
            event_date=rand('integer', first_possible_date, last_possible_date);
            *assign day and months as needed;
            dx_month=month(event_date);
            dx_day=day(event_date);
        end;
    else if dx_day eq 99 and dx_month ne 99 then
        do;
            *first possible date is beginning of month indicated;
            first_possible_date=mdy(censor_month, 1, censor_year);
            *last possible date is last of month;
            *EXCEPT if month is same as censor month;
            *in that case, use the earlier of the two dates;
            last_possible_date=min(intnx('month', first_possible_date, 0, 'e'), 
                                   censor_date - 1);
            *calculate a random event date;
            event_date=rand('integer', first_possible_date, last_possible_date);
            *assign day as needed;
            dx_day=day(event_date);
        end;
    else
        event_date=mdy(dx_month, dx_day, dx_year);
    *format dates for display and legibility;
    format censor event_date first_possible_date last_possible_date date9.;
    
    check = censor_date - event_date;
run;

PS.....if this is going to be published I'm going to want my name on the paper 😉

View solution in original post

4 REPLIES 4
Reeza
Super User

None of the records in your sample data set meet this sitatuion/assumption:

 


@Cruise wrote:

Hi Folks:

 

I'm trying to create random dates to replace incomplete dates with missing month and/or day. Year fields are complete. I used %Rand macro from Rick's tutorial https://blogs.sas.com/content/iml/2015/10/05/random-integers-sas.html. Conditions for the random selection include dates be not after censoring date and must be within the known year.

 

Could you please look at my SAS code below on the mock data where I created RAND_DX. Any critiques or suggestions or improvements appreciated to achieve my programming objective I stated above.

 

Thank you for taking your time in this.

 

My variables:

Date (dx_month, dx_day and dx_year)

Censor (censor_month, censor_day, censor_year)

 

Assumptions/facts:

  1. All records have ‘year’ of date (no missing)
  2. ‘day’ field is unknown when ‘month’ field is unknown
  3. ‘day’ field can be unknown when ‘month’ field is known  
  4. Effect of leap year and February is negligible
  5. Variable ‘censor’ is complete

 

Reeza
Super User

SAS has the rand('integer', ....) function now so you don't need that macro unless you're on an older version of SAS.

 

Dates are integers. So you can calculate the earliest possible date for the 'event' and latest possible date and select a random value in-between them. 

 

Your current methodology is more restrictive than this approach, for example, you can't have the event be within the same month as the censoring, which is possible. You may have a longer gap required based on what exactly you're measuring - diagnosis to censoring may be always at least 60 days or something like that.

 

For the day portion of the date, what happens if the day of censoring is the first of the month? From your current method that would leave a single day that can be chosen. It does not need to be less than the day, this is a parameter that needs some relationship to the month as well otherwise it's adding an additional restriction that may influence your data but not be accurate at all. 

 

I haven't bothered to set any seeds, I'll leave that to you. I added in a check below to ensure that the event date is always before the censor date. 

 

DATA HAVE;
    INPUT ID dx_year dx_month dx_day CENSOR CENSOR_DAY CENSOR_MONTH CENSOR_YEAR;

    CARDS;
1  2005    12  99  20088   31  12  2014
2  2005    1  99  17308   22  5   2007
3  2005    4  99  16521   26  3   2005
4  2005    99  99  17074   30  9   2006
5  2005    99  99  16620   3   7   2005
6  2005    99  99  16709   30  9   2005
7  2005    99  99  16901   10  4   2006
8  2005    99  99  16777   7   12  2005
9  2005    99  99  16763   23  11  2005
10 2005    99  99  17750   6   8   2008
11 2005    99  99  16621   4   7   2005
12 2005    99  99  20088   31  12  2014
13 2005    99  99  16636   19  7   2005
14 2005    99  99  16685   6   9   2005
15 2005    99  99  16778   8   12  2005
16 2005    99  99  17129   24  11  2006
17 2005    99  99  20088   31  12  2014
18 2005    99  99  17057   13  9   2006
19 2005    99  99  16486   19  2   2005
20 2005    99  99  16548   22  4   2005
;

data want;
    set have;
    
    *calculate censor date;
    censor_date=mdy(censor_month, censor_day, censor_year);

    *handle missing values for both day and month;
    if dx_month = 99 and dx_day = 99 then
        do;
            *first possible date is the beginning of the year;
            first_possible_date=intnx('year', censor_date, 0, 'b');
            *last possible date is the day before censoring;
            last_possible_date=censor_date - 1;
            *event date is a random date between these dates;
            event_date=rand('integer', first_possible_date, last_possible_date);
            *assign day and months as needed;
            dx_month=month(event_date);
            dx_day=day(event_date);
        end;
    else if dx_day eq 99 and dx_month ne 99 then
        do;
            *first possible date is beginning of month indicated;
            first_possible_date=mdy(censor_month, 1, censor_year);
            *last possible date is last of month;
            *EXCEPT if month is same as censor month;
            *in that case, use the earlier of the two dates;
            last_possible_date=min(intnx('month', first_possible_date, 0, 'e'), 
                                   censor_date - 1);
            *calculate a random event date;
            event_date=rand('integer', first_possible_date, last_possible_date);
            *assign day as needed;
            dx_day=day(event_date);
        end;
    else
        event_date=mdy(dx_month, dx_day, dx_year);
    *format dates for display and legibility;
    format censor event_date first_possible_date last_possible_date date9.;
    
    check = censor_date - event_date;
run;

PS.....if this is going to be published I'm going to want my name on the paper 😉

Cruise
Ammonite | Level 13

Thank you Reeza! It's such a thorough approach!

 

Year of diagnosis is known for all records (dx_year). The goal is to impute missing month and/or missing day within the known year of diagnosis. Therefore, the first possible date is the beginning of the year of diagnosis instead the year of censoring. 

 

Sorry for not including only-day missing scenario in the mock data. In the inclusive data, ID=3 was censored (03/26/2005) prior to a diagnosis (04/99/2005).

Cruise
Ammonite | Level 13

@Reeza 

Below is the closest I got using your approach. Ignore, negative survival for ID=3. Please comment if you find another problem. 

 

data have1; set have;
if dx_month = 99 and dx_day = 99 then fake_dx=mdy(1,1,dx_year);
else 
if dx_day eq 99 and dx_month ne 99 then
fake_dx=mdy(dx_month,1,dx_year);
run; 

data want;
    set have1; 
    *calculate censor date;
    censor_date=mdy(censor_month, censor_day, censor_year);

    *handle missing values for both day and month;
    if dx_month = 99 and dx_day = 99 then
        do;
   *first possible date is the beginning of the known year of diagnosis;
              first_possible_date=intnx('year', fake_dx, 0, 'b');
   *last possible date is the earlier of the end of the known year of diagnosis or censoring date;
	     last_possible_date=min(intnx('year', first_possible_date, 0, 'e'), 
                                   censor_date - 1);
   *event date is a random date between these dates;
              event_date=rand('integer', first_possible_date, last_possible_date);
   *assign day and months as needed;
              dx_month=month(event_date);
              dx_day=day(event_date);
        end;
    else if dx_day eq 99 and dx_month ne 99 then
        do;
            *first possible date is beginning of the known month of diagnosis;
            first_possible_date=mdy(dx_month, 1, dx_year);
            *last possible date is last day of the of month;
            *EXCEPT if month is same as censor month;
            *in that case, use the earlier of the two dates;
            last_possible_date=min(intnx('month', first_possible_date, 0, 'e'), 
                                   censor_date - 1);
            *calculate a random event date;
            event_date=rand('integer', first_possible_date, last_possible_date);
            *assign day as needed;
            dx_day=day(event_date);
        end;
    else
        event_date=mdy(dx_month, dx_day, dx_year);
    *format dates for display and legibility;
    format censor event_date first_possible_date last_possible_date fake_dx date9.;
    check = censor_date - event_date;
run;

proc print data=want(drop=censor_:);
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 918 views
  • 3 likes
  • 2 in conversation