BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SM1
Obsidian | Level 7 SM1
Obsidian | Level 7

Hello,

 

I'm working with data that includes up to 5 event date & time variables per row. Missing data is an issue. The 5 events should have occurred in chronological order, (e.g., Event1 before Event2 before Event3 . . .). For each event, there may be a SAS date variable only, a SAS time variable only, both variables or missing data.

 

I need to flag observations if the events are not in chronological order.  For observations with complete information for each event, this is not a problem.  I can't figure out how to work with rows that have missing data. I would like to generate 2 variables: (1) flag if there is good chronological order for the available date & time variables and (b) flag for good chronological order for the available SAS dates only.

 

Any suggestions will be most appreciated.

 

Sample data:

 

ID     E1Date      E1Time   E2Date       E2Time   E3Date      E3Time    E4Date       E4Time    E5Date      E5Time 

1      05JAN15    1715       05JAN15     2130      05JAN15    2355        06JAN15     0100       06JAN15    0300 

2      15MAY11   1300                                                                          15MAY11    1523       15MAY11    1633

3      09FEB13                  09FEB13                  09FEB13    0800        09FEB13    1015       09FEB13    1145

4      18JUN17                                                                                      18JUN17                    18JUN17  

 

data want;
data have;
length GoodDateTimeChron $8.;

GoodDateTimeChron = 'NO';

DateTimeE1 = dhms(DateE1, 0, 0, TimeE1);  /*create SAS datetime vars*/
DateTimeE2 = dhms(DateE2, 0, 0, TimeE2);
DateTimeE3 = dhms(Date33, 0, 0, TimeE3);
DateTimeE4 = dhms(DateE4, 0, 0, TimeD4);
DateTimeE5 = dhms(DateE5, 0, 0, TimeE5);
format DateTimeE1 DateTimeE2 DateTimeE3 DateTimeE4 DateTimeE5 datetime20.;

/*this correctly flags row 1 by chronology, but none of the other rows due to missing data*/
if not missing(DateTimeE1) >=not missing(DateTimeE2) >=not missing(DateTimeE3) >=not missing(DateTimeE4) >=not missing(DateTimeE5)  then GoodDateTimeChron = 'YES'; 

/*this doesn't directly flag for chronology, but it counts the number of distinct date values, including missing as a distinct value */
array event {5} DateE1 DateE2 DateE3 DateE4 DateE5 ;
array new {5} $20 _temporary_;
do _n_=1 to 5;
new{_n_} = event{_n_};
end;
call sortc(of new{*});
count = (new{1} > ' ');
do _n_=2 to 5;
if new{_n_} ne new{_n_-1} then count + 1;
end;


run;
1 ACCEPTED SOLUTION

Accepted Solutions
gamotte
Rhodochrosite | Level 12

Hello,

 

The following program creates new arrays where all non missing values are consecutive so that there are only

missing values at the end of arrays => The chronology only has to be checked until a missing value is met.

 

data have;
    informat E1Date E2Date E3Date E4Date E5Date date9.;
	format E1Date E2Date E3Date E4Date E5Date date9.;

    input ID E1Date E1Time E2Date E2Time E3Date E3Time E4Date E4Time E5Date E5Time;
    cards;
1 05JAN15 1715 05JAN15 2130 05JAN15 2355 06JAN15 0100 06JAN15 0300 
2 15MAY11 1300 . . . . 15MAY11 1523 15MAY11 1633
3 09FEB13 . 09FEB13 . 09FEB13 0800 09FEB13 1015 09FEB13 1145
4 18JUN17 . . . . . 18JUN17 . 18JUN17 .
;
run;

%macro supp_miss(array_in, array_out);

    i=0;

    call missing(of &array_out.(*));
    do over &array_in.;
        if not missing(&array_in.) then do;
            i+1;
            &array_out.(i)=&array_in;
        end;
    end;

%mend;

%macro check_chronology(array, flag);

    &flag.=1;

    do i=1 to dim(&array.)-1 until(exit);
        if missing(&array.(i+1)) then exit=1;
        else do;
            &flag.=&flag. and (&array.(i+1) ge &array.(i));
        end;
    end;

%mend;

data want;
    set have;
    format dt1-dt5 datetime.;
    array dates E1Date E2Date E3Date E4Date E5Date;
    array times E1Time E2Time E3Time E4Time E5Time;
    array datetimes dt1-dt5;

    array nmissdates(5) _TEMPORARY_;
    array nmissdatetimes(5) _TEMPORARY_;

    i=0;

    do over dates;
        if not missing(dates) and not missing(times) then do;
            datetimes=dhms(dates, 0, 0, times); 
        end;
    end;

    %supp_miss(dates, nmissdates);
    %supp_miss(datetimes, nmissdatetimes);

    %check_chronology(nmissdates,flag_dates);
    %check_chronology(nmissdatetimes,flag_datetimes);
run;

View solution in original post

10 REPLIES 10
mkeintz
PROC Star

If a missing time value occurs between two valid time values, I presume that has no effect on your designation of event order, correct?

 

Could you set up your sample data in a DATA HAVE step?  That would make it a lot easier for us to help you.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
gamotte
Rhodochrosite | Level 12

Hello,

 

The following program creates new arrays where all non missing values are consecutive so that there are only

missing values at the end of arrays => The chronology only has to be checked until a missing value is met.

 

data have;
    informat E1Date E2Date E3Date E4Date E5Date date9.;
	format E1Date E2Date E3Date E4Date E5Date date9.;

    input ID E1Date E1Time E2Date E2Time E3Date E3Time E4Date E4Time E5Date E5Time;
    cards;
1 05JAN15 1715 05JAN15 2130 05JAN15 2355 06JAN15 0100 06JAN15 0300 
2 15MAY11 1300 . . . . 15MAY11 1523 15MAY11 1633
3 09FEB13 . 09FEB13 . 09FEB13 0800 09FEB13 1015 09FEB13 1145
4 18JUN17 . . . . . 18JUN17 . 18JUN17 .
;
run;

%macro supp_miss(array_in, array_out);

    i=0;

    call missing(of &array_out.(*));
    do over &array_in.;
        if not missing(&array_in.) then do;
            i+1;
            &array_out.(i)=&array_in;
        end;
    end;

%mend;

%macro check_chronology(array, flag);

    &flag.=1;

    do i=1 to dim(&array.)-1 until(exit);
        if missing(&array.(i+1)) then exit=1;
        else do;
            &flag.=&flag. and (&array.(i+1) ge &array.(i));
        end;
    end;

%mend;

data want;
    set have;
    format dt1-dt5 datetime.;
    array dates E1Date E2Date E3Date E4Date E5Date;
    array times E1Time E2Time E3Time E4Time E5Time;
    array datetimes dt1-dt5;

    array nmissdates(5) _TEMPORARY_;
    array nmissdatetimes(5) _TEMPORARY_;

    i=0;

    do over dates;
        if not missing(dates) and not missing(times) then do;
            datetimes=dhms(dates, 0, 0, times); 
        end;
    end;

    %supp_miss(dates, nmissdates);
    %supp_miss(datetimes, nmissdatetimes);

    %check_chronology(nmissdates,flag_dates);
    %check_chronology(nmissdatetimes,flag_datetimes);
run;
mkeintz
PROC Star

You don't have to make the non-missing event dates and times contiguous.  You can just conditionally compare an earlier event to a later event when both are non-missing.  The advantage is that you would then preserve the original missing and non-missing events - possibly useful for later analysis:

 

data have;
    informat E1Date E2Date E3Date E4Date E5Date date9.;
	format E1Date E2Date E3Date E4Date E5Date date9.;

    input ID E1Date E1Time E2Date E2Time E3Date E3Time E4Date E4Time E5Date E5Time;
    cards;
1   05JAN15 1715 05JAN15 2130 05JAN15 2355 06JAN15 0100 06JAN15 0300 
2   15MAY11 1300 .       .    .       .    15MAY11 1523 15MAY11 1633
2.1 15MAY11 1523 .       .    .       .    15MAY11 1633 15MAY11 1300
3   09FEB13 .    09FEB13 .    09FEB13 0800 09FEB13 1015 09FEB13 1145
4   18JUN17 .    .       .    .       .    18JUN17 .    18JUN17 .
4.1 19JUN17 .    .       .    .       .    18JUN17 .    18JUN17 .
;
run;

%macro chk_chron(array=,flagvar=);
  &flagvar=0;
  do _e=1 to 4 while (&flagvar=0);
    if not missing(&array{_e}) then do _f=_e+1 to 5 while (&flagvar=0);
      if not missing(&array{_f}) and &array{_f}<&array{_e} then &flagvar=1;
    end;
  end;
%mend;


data want (drop=_:);
  set have;
  array dat {5} e1date e2date e3date e4date e5date;
  array tim {5} e1time e2time e3time e4time e5time;
  array dtm {5};
  format dtm: datetime.0;
  do _e=1 to 5;
    if n(dat{_e},tim{_e})=2 then dtm{_e}=dhms(dat{_e},floor(tim{_e}/100),mod(tim{_e},100),0);
  end;

  %chk_chron(array=dat,flagvar=flag_dates);
  %chk_chron(array=dtm,flagvar=flag_datetimes);
run;

I added to new record IDs: 2.1 (date times out of order) and 4.1 (dates out of order), since  none of your examples reported and out-of-order record.  Also I set the flag var to zero for "in order" and flag var=1 for out of order.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
SM1
Obsidian | Level 7 SM1
Obsidian | Level 7

@mkeintz,

 

Your approach did catch a few cases that were not flagged by @gamotte's syntax and that I missed when I was reviewing the earlier results.

 

The event time variables are limited to HHMM, so thank you for suggesting syntax that will better transform the discrete date and time variables into an aggregate datetime var.

 

I really appreciate your assistance. I will be able to reuse this syntax as I continue to work with our historic database.

 

Thanks again!

SM1
Obsidian | Level 7 SM1
Obsidian | Level 7

@gamotte,

 

Thank you! The syntax worked perfectly and it's an elegant solution.

 

I'm assuming there's really no other way to deal with missing date/time values. This will be a really helpful example going forward.

 

Thanks again.

 

mkeintz
PROC Star

Do you have a good reason to use:

   datetim=dhms(date,0,0,time)

instead of

  datetime=dhms(date,floor(time/100),mod(time,100),0);

?

 

By formatting the datetime variables as datetime, you are presumably endorsing the meaning of the value.  But your TIME variables appear to be 4-digit numbers (hhmm), so the values will be erroneous.   Unless your time variables really are number of seconds into the day.   I repeat, these value do preserve relative order, but they don't honor actual time.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
SM1
Obsidian | Level 7 SM1
Obsidian | Level 7

@mkeintz,

 

I couldn't figure out how I had missed so many chronology errors when I reviewed the results generated by @gamotte's syntax, so I re-ran everything. I'm not sure why, but your suggested syntax is generating an incorrect datetime variable (DTME1) while the syntax of datetim=dhms(date,0,0,time) is generating a good var (DateTimeE1).

 

It would seem best to use a more precise approach to creating a datetime variable, but I can't figure out why DTME1 looks so different from DateTimeE1.

                                        RRForm   RRID     DateE1        TimeE1        DTME1                     DateTimeE1

2018103NOV20189:0016NOV18:12:00:0003NOV18:09:00:00
2018218NOV201810:0003DEC18:00:00:0018NOV18:10:00:00
2018329NOV201813:0018DEC18:12:00:0029NOV18:13:00:00
2018408DEC201819:0005JAN19:12:00:0008DEC18:19:00:00
2018609NOV201821:3011DEC18:06:00:0009NOV18:21:30:00
2018922SEP201817:2518OCT18:03:00:0022SEP18:17:25:00
20181022SEP201817:2518OCT18:03:00:0022SEP18:17:25:00
20181122SEP201817:2518OCT18:03:00:0022SEP18:17:25:00
20181222SEP201817:2518OCT18:03:00:0022SEP18:17:25:00
20181322SEP201817:2518OCT18:03:00:0022SEP18:17:25:00
20181422SEP20189:5006OCT18:18:00:0022SEP18:09:50:00
20181506NOV201818:0003DEC18:00:00:0006NOV18:18:00:00
20181606NOV201818:0003DEC18:00:00:0006NOV18:18:00:00
20181716SEP201815:0008OCT18:12:00:0016SEP18:15:00:00
20181817SEP201817:0012OCT18:12:00:0017SEP18:17:00:00
20181925SEP201812:0013OCT18:00:00:0025SEP18:12:00:00
20182027SEP201813:3617OCT18:10:00:0027SEP18:13:36:00
20182228SEP201818:0325OCT18:02:20:0028SEP18:18:03:00
20182328SEP201818:0325OCT18:02:20:0028SEP18:18:03:00

 

mkeintz
PROC Star

If the use of dhms(date,0,0,time) is generating correct values, then you must have created the time variables using an actual time informat, but I never saw evidence of that  (I saw values like 1535 1723,  not 15:35, 17:23).  If they already are true time values, then the underlying numeric value is the number of seconds after midnight, and your usage of the DHMS function is the right way to go.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Dvdscot
Obsidian | Level 7

I tried that as well and it didn't work. In my case I have only date values so should be simpler:

 

11Sep2016	14Nov2016	03Apr2017	28Jul2017	28Jan2018	23Jul2018	19Jan2019
.	.	.	.	.	.	.
25Jun2018	15Okt2018	06Dez2018	05Apr2019	18Okt2019	23Apr2020	.
.	22Mrz2017	12Jul2017	.	.	05Jun2018	14Dez2018
16Mrz2019	04Jun2019	02Okt2019	.	27Apr2020	.	.
26Sep2017	19Dez2017	02Apr2018	.	.	22Jul2019	09Dez2019
01Dez2019	18Mrz2020	14Jun2020	09Feb2021	30Sep2020	22Jun2020	.
27Okt2018	07Jan2019	05Apr2019	06Jul2019	19Nov2019	27Apr2020	25Nov2020
.	31Jan2016	.	.	25Okt2016	16Jul2017	16Jan2018
26Dez2015	.	11Mrz2016	.	.	.	.
.	03Apr2017	21Aug2017	.	26Apr2018	04Nov2018	20Feb2019
20Feb2019	12Mrz2019	15Aug2019	17Dez2019	06Jun2020	.	24Jan2021
.	.	.	.	.	.	.
21Nov2017	.	.	.	.	.	09Aug2020
24Sep2019	06Jan2020	06Jan2020	08Jul2020	.	.	.
01Mai2017	04Jul2017	11Dez2017	25Feb2018	10Apr2018	08Nov2018	.
22Feb2017	22Jun2017	18Sep2017	11Dez2017	17Jun2018	.	15Jan2019
04Nov2018	.	.	.	.	.	.
.	.	.	16Jun2019	.	14Jun2020	.
.	02Mrz2020	.	.	.	.	.
.	.	.	.	.	.	.
.	.	.	.	.	.	.
19Apr2020	22Jun2020	.	.	25Feb2021	.	.

Start at the first date value and compare with the next one, ignore missings. If everything is in correct order a new value says ok, if there is one or more mistakes the new value should say error or a number.

 

In this case for example:

01.01.2020 .  .  .  03.01.2020 .  .   02.01.2020

FreelanceReinh
Jade | Level 19

[This new requirement is currently being discussed in another thread.]

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2159 views
  • 4 likes
  • 5 in conversation