Hello,
I'm working with data that includes up to 5 event date & time variables per row. Missing data is an issue. The 5 events should have occurred in chronological order, (e.g., Event1 before Event2 before Event3 . . .). For each event, there may be a SAS date variable only, a SAS time variable only, both variables or missing data.
I need to flag observations if the events are not in chronological order. For observations with complete information for each event, this is not a problem. I can't figure out how to work with rows that have missing data. I would like to generate 2 variables: (1) flag if there is good chronological order for the available date & time variables and (b) flag for good chronological order for the available SAS dates only.
Any suggestions will be most appreciated.
Sample data:
ID E1Date E1Time E2Date E2Time E3Date E3Time E4Date E4Time E5Date E5Time
1 05JAN15 1715 05JAN15 2130 05JAN15 2355 06JAN15 0100 06JAN15 0300
2 15MAY11 1300 15MAY11 1523 15MAY11 1633
3 09FEB13 09FEB13 09FEB13 0800 09FEB13 1015 09FEB13 1145
4 18JUN17 18JUN17 18JUN17
data want;
data have;
length GoodDateTimeChron $8.;
GoodDateTimeChron = 'NO';
DateTimeE1 = dhms(DateE1, 0, 0, TimeE1); /*create SAS datetime vars*/
DateTimeE2 = dhms(DateE2, 0, 0, TimeE2);
DateTimeE3 = dhms(Date33, 0, 0, TimeE3);
DateTimeE4 = dhms(DateE4, 0, 0, TimeD4);
DateTimeE5 = dhms(DateE5, 0, 0, TimeE5);
format DateTimeE1 DateTimeE2 DateTimeE3 DateTimeE4 DateTimeE5 datetime20.;
/*this correctly flags row 1 by chronology, but none of the other rows due to missing data*/
if not missing(DateTimeE1) >=not missing(DateTimeE2) >=not missing(DateTimeE3) >=not missing(DateTimeE4) >=not missing(DateTimeE5) then GoodDateTimeChron = 'YES';
/*this doesn't directly flag for chronology, but it counts the number of distinct date values, including missing as a distinct value */
array event {5} DateE1 DateE2 DateE3 DateE4 DateE5 ;
array new {5} $20 _temporary_;
do _n_=1 to 5;
new{_n_} = event{_n_};
end;
call sortc(of new{*});
count = (new{1} > ' ');
do _n_=2 to 5;
if new{_n_} ne new{_n_-1} then count + 1;
end;
run;
Hello,
The following program creates new arrays where all non missing values are consecutive so that there are only
missing values at the end of arrays => The chronology only has to be checked until a missing value is met.
data have;
informat E1Date E2Date E3Date E4Date E5Date date9.;
format E1Date E2Date E3Date E4Date E5Date date9.;
input ID E1Date E1Time E2Date E2Time E3Date E3Time E4Date E4Time E5Date E5Time;
cards;
1 05JAN15 1715 05JAN15 2130 05JAN15 2355 06JAN15 0100 06JAN15 0300
2 15MAY11 1300 . . . . 15MAY11 1523 15MAY11 1633
3 09FEB13 . 09FEB13 . 09FEB13 0800 09FEB13 1015 09FEB13 1145
4 18JUN17 . . . . . 18JUN17 . 18JUN17 .
;
run;
%macro supp_miss(array_in, array_out);
i=0;
call missing(of &array_out.(*));
do over &array_in.;
if not missing(&array_in.) then do;
i+1;
&array_out.(i)=&array_in;
end;
end;
%mend;
%macro check_chronology(array, flag);
&flag.=1;
do i=1 to dim(&array.)-1 until(exit);
if missing(&array.(i+1)) then exit=1;
else do;
&flag.=&flag. and (&array.(i+1) ge &array.(i));
end;
end;
%mend;
data want;
set have;
format dt1-dt5 datetime.;
array dates E1Date E2Date E3Date E4Date E5Date;
array times E1Time E2Time E3Time E4Time E5Time;
array datetimes dt1-dt5;
array nmissdates(5) _TEMPORARY_;
array nmissdatetimes(5) _TEMPORARY_;
i=0;
do over dates;
if not missing(dates) and not missing(times) then do;
datetimes=dhms(dates, 0, 0, times);
end;
end;
%supp_miss(dates, nmissdates);
%supp_miss(datetimes, nmissdatetimes);
%check_chronology(nmissdates,flag_dates);
%check_chronology(nmissdatetimes,flag_datetimes);
run;
If a missing time value occurs between two valid time values, I presume that has no effect on your designation of event order, correct?
Could you set up your sample data in a DATA HAVE step? That would make it a lot easier for us to help you.
Hello,
The following program creates new arrays where all non missing values are consecutive so that there are only
missing values at the end of arrays => The chronology only has to be checked until a missing value is met.
data have;
informat E1Date E2Date E3Date E4Date E5Date date9.;
format E1Date E2Date E3Date E4Date E5Date date9.;
input ID E1Date E1Time E2Date E2Time E3Date E3Time E4Date E4Time E5Date E5Time;
cards;
1 05JAN15 1715 05JAN15 2130 05JAN15 2355 06JAN15 0100 06JAN15 0300
2 15MAY11 1300 . . . . 15MAY11 1523 15MAY11 1633
3 09FEB13 . 09FEB13 . 09FEB13 0800 09FEB13 1015 09FEB13 1145
4 18JUN17 . . . . . 18JUN17 . 18JUN17 .
;
run;
%macro supp_miss(array_in, array_out);
i=0;
call missing(of &array_out.(*));
do over &array_in.;
if not missing(&array_in.) then do;
i+1;
&array_out.(i)=&array_in;
end;
end;
%mend;
%macro check_chronology(array, flag);
&flag.=1;
do i=1 to dim(&array.)-1 until(exit);
if missing(&array.(i+1)) then exit=1;
else do;
&flag.=&flag. and (&array.(i+1) ge &array.(i));
end;
end;
%mend;
data want;
set have;
format dt1-dt5 datetime.;
array dates E1Date E2Date E3Date E4Date E5Date;
array times E1Time E2Time E3Time E4Time E5Time;
array datetimes dt1-dt5;
array nmissdates(5) _TEMPORARY_;
array nmissdatetimes(5) _TEMPORARY_;
i=0;
do over dates;
if not missing(dates) and not missing(times) then do;
datetimes=dhms(dates, 0, 0, times);
end;
end;
%supp_miss(dates, nmissdates);
%supp_miss(datetimes, nmissdatetimes);
%check_chronology(nmissdates,flag_dates);
%check_chronology(nmissdatetimes,flag_datetimes);
run;
You don't have to make the non-missing event dates and times contiguous. You can just conditionally compare an earlier event to a later event when both are non-missing. The advantage is that you would then preserve the original missing and non-missing events - possibly useful for later analysis:
data have;
informat E1Date E2Date E3Date E4Date E5Date date9.;
format E1Date E2Date E3Date E4Date E5Date date9.;
input ID E1Date E1Time E2Date E2Time E3Date E3Time E4Date E4Time E5Date E5Time;
cards;
1 05JAN15 1715 05JAN15 2130 05JAN15 2355 06JAN15 0100 06JAN15 0300
2 15MAY11 1300 . . . . 15MAY11 1523 15MAY11 1633
2.1 15MAY11 1523 . . . . 15MAY11 1633 15MAY11 1300
3 09FEB13 . 09FEB13 . 09FEB13 0800 09FEB13 1015 09FEB13 1145
4 18JUN17 . . . . . 18JUN17 . 18JUN17 .
4.1 19JUN17 . . . . . 18JUN17 . 18JUN17 .
;
run;
%macro chk_chron(array=,flagvar=);
&flagvar=0;
do _e=1 to 4 while (&flagvar=0);
if not missing(&array{_e}) then do _f=_e+1 to 5 while (&flagvar=0);
if not missing(&array{_f}) and &array{_f}<&array{_e} then &flagvar=1;
end;
end;
%mend;
data want (drop=_:);
set have;
array dat {5} e1date e2date e3date e4date e5date;
array tim {5} e1time e2time e3time e4time e5time;
array dtm {5};
format dtm: datetime.0;
do _e=1 to 5;
if n(dat{_e},tim{_e})=2 then dtm{_e}=dhms(dat{_e},floor(tim{_e}/100),mod(tim{_e},100),0);
end;
%chk_chron(array=dat,flagvar=flag_dates);
%chk_chron(array=dtm,flagvar=flag_datetimes);
run;
I added to new record IDs: 2.1 (date times out of order) and 4.1 (dates out of order), since none of your examples reported and out-of-order record. Also I set the flag var to zero for "in order" and flag var=1 for out of order.
Your approach did catch a few cases that were not flagged by @gamotte's syntax and that I missed when I was reviewing the earlier results.
The event time variables are limited to HHMM, so thank you for suggesting syntax that will better transform the discrete date and time variables into an aggregate datetime var.
I really appreciate your assistance. I will be able to reuse this syntax as I continue to work with our historic database.
Thanks again!
Thank you! The syntax worked perfectly and it's an elegant solution.
I'm assuming there's really no other way to deal with missing date/time values. This will be a really helpful example going forward.
Thanks again.
Do you have a good reason to use:
datetim=dhms(date,0,0,time)
instead of
datetime=dhms(date,floor(time/100),mod(time,100),0);
?
By formatting the datetime variables as datetime, you are presumably endorsing the meaning of the value. But your TIME variables appear to be 4-digit numbers (hhmm), so the values will be erroneous. Unless your time variables really are number of seconds into the day. I repeat, these value do preserve relative order, but they don't honor actual time.
I couldn't figure out how I had missed so many chronology errors when I reviewed the results generated by @gamotte's syntax, so I re-ran everything. I'm not sure why, but your suggested syntax is generating an incorrect datetime variable (DTME1) while the syntax of datetim=dhms(date,0,0,time) is generating a good var (DateTimeE1).
It would seem best to use a more precise approach to creating a datetime variable, but I can't figure out why DTME1 looks so different from DateTimeE1.
RRForm RRID DateE1 TimeE1 DTME1 DateTimeE1
2018 | 1 | 03NOV2018 | 9:00 | 16NOV18:12:00:00 | 03NOV18:09:00:00 |
2018 | 2 | 18NOV2018 | 10:00 | 03DEC18:00:00:00 | 18NOV18:10:00:00 |
2018 | 3 | 29NOV2018 | 13:00 | 18DEC18:12:00:00 | 29NOV18:13:00:00 |
2018 | 4 | 08DEC2018 | 19:00 | 05JAN19:12:00:00 | 08DEC18:19:00:00 |
2018 | 6 | 09NOV2018 | 21:30 | 11DEC18:06:00:00 | 09NOV18:21:30:00 |
2018 | 9 | 22SEP2018 | 17:25 | 18OCT18:03:00:00 | 22SEP18:17:25:00 |
2018 | 10 | 22SEP2018 | 17:25 | 18OCT18:03:00:00 | 22SEP18:17:25:00 |
2018 | 11 | 22SEP2018 | 17:25 | 18OCT18:03:00:00 | 22SEP18:17:25:00 |
2018 | 12 | 22SEP2018 | 17:25 | 18OCT18:03:00:00 | 22SEP18:17:25:00 |
2018 | 13 | 22SEP2018 | 17:25 | 18OCT18:03:00:00 | 22SEP18:17:25:00 |
2018 | 14 | 22SEP2018 | 9:50 | 06OCT18:18:00:00 | 22SEP18:09:50:00 |
2018 | 15 | 06NOV2018 | 18:00 | 03DEC18:00:00:00 | 06NOV18:18:00:00 |
2018 | 16 | 06NOV2018 | 18:00 | 03DEC18:00:00:00 | 06NOV18:18:00:00 |
2018 | 17 | 16SEP2018 | 15:00 | 08OCT18:12:00:00 | 16SEP18:15:00:00 |
2018 | 18 | 17SEP2018 | 17:00 | 12OCT18:12:00:00 | 17SEP18:17:00:00 |
2018 | 19 | 25SEP2018 | 12:00 | 13OCT18:00:00:00 | 25SEP18:12:00:00 |
2018 | 20 | 27SEP2018 | 13:36 | 17OCT18:10:00:00 | 27SEP18:13:36:00 |
2018 | 22 | 28SEP2018 | 18:03 | 25OCT18:02:20:00 | 28SEP18:18:03:00 |
2018 | 23 | 28SEP2018 | 18:03 | 25OCT18:02:20:00 | 28SEP18:18:03:00 |
If the use of dhms(date,0,0,time) is generating correct values, then you must have created the time variables using an actual time informat, but I never saw evidence of that (I saw values like 1535 1723, not 15:35, 17:23). If they already are true time values, then the underlying numeric value is the number of seconds after midnight, and your usage of the DHMS function is the right way to go.
I tried that as well and it didn't work. In my case I have only date values so should be simpler:
11Sep2016 14Nov2016 03Apr2017 28Jul2017 28Jan2018 23Jul2018 19Jan2019
. . . . . . .
25Jun2018 15Okt2018 06Dez2018 05Apr2019 18Okt2019 23Apr2020 .
. 22Mrz2017 12Jul2017 . . 05Jun2018 14Dez2018
16Mrz2019 04Jun2019 02Okt2019 . 27Apr2020 . .
26Sep2017 19Dez2017 02Apr2018 . . 22Jul2019 09Dez2019
01Dez2019 18Mrz2020 14Jun2020 09Feb2021 30Sep2020 22Jun2020 .
27Okt2018 07Jan2019 05Apr2019 06Jul2019 19Nov2019 27Apr2020 25Nov2020
. 31Jan2016 . . 25Okt2016 16Jul2017 16Jan2018
26Dez2015 . 11Mrz2016 . . . .
. 03Apr2017 21Aug2017 . 26Apr2018 04Nov2018 20Feb2019
20Feb2019 12Mrz2019 15Aug2019 17Dez2019 06Jun2020 . 24Jan2021
. . . . . . .
21Nov2017 . . . . . 09Aug2020
24Sep2019 06Jan2020 06Jan2020 08Jul2020 . . .
01Mai2017 04Jul2017 11Dez2017 25Feb2018 10Apr2018 08Nov2018 .
22Feb2017 22Jun2017 18Sep2017 11Dez2017 17Jun2018 . 15Jan2019
04Nov2018 . . . . . .
. . . 16Jun2019 . 14Jun2020 .
. 02Mrz2020 . . . . .
. . . . . . .
. . . . . . .
19Apr2020 22Jun2020 . . 25Feb2021 . .
Start at the first date value and compare with the next one, ignore missings. If everything is in correct order a new value says ok, if there is one or more mistakes the new value should say error or a number.
In this case for example:
01.01.2020 . . . 03.01.2020 . . 02.01.2020
[This new requirement is currently being discussed in another thread.]
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.