I note that sorting and using first. and last. variables to de-dupe a data set behaves differently for a date variable than for a datetime variable. In fact, it seems counterintuitive:
data one ;
informat date datetime20. ;
format date datetime20. ;
input id $char3. var1 date ;
put _all_ ;
cards ;
one 1 30jun1948:01:00:00
one 1 30jun1948:02:00:00
data two ;
set one ;
by id var1 date ;
if last.date ;
run ;
data oneplus ;
format dateday date9. ;
set one ;
dateday = datepart(date) ;
put _all_ ;
run ;
data twoplus ;
set oneplus ;
by id var1 dateday ;
if last.dateday ;
run ;
Note from the log below that the data set does not de-dupe on date, the datetime variable, which has more granularity, but does on dateday, which represents the day. Can anyone explain this to me? Thanks.
log:
302 data one ;
303 informat date datetime20. ;
304 format date datetime20. ;
305 input id $char3. var1 date ;
306 put _all_ ;
307 cards ;
date=30JUN1948:01:00:00 id=one var1=1 _ERROR_=0 _N_=1
date=30JUN1948:02:00:00 id=one var1=1 _ERROR_=0 _N_=2
NOTE: The data set WORK.ONE has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
310 data two ;
311 set one ;
312 by id var1 date ;
313 if last.date ;
314 run ;
NOTE: There were 2 observations read from the data set WORK.ONE.
NOTE: The data set WORK.TWO has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
315 data oneplus ;
316 format dateday date9. ;
317 set one ;
318 dateday = datepart(date) ;
319 put _all_ ;
320 run ;
dateday=30JUN1948 date=30JUN1948:01:00:00 id=one var1=1 _ERROR_=0 _N_=1
dateday=30JUN1948 date=30JUN1948:02:00:00 id=one var1=1 _ERROR_=0 _N_=2
NOTE: There were 2 observations read from the data set WORK.ONE.
NOTE: The data set WORK.ONEPLUS has 2 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
321 data twoplus ;
322 set oneplus ;
323 by id var1 dateday ;
324 if last.dateday ;
325 run ;
NOTE: There were 2 observations read from the data set WORK.ONEPLUS.
NOTE: The data set WORK.TWOPLUS has 1 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
Hi.
I don't see anything wrong with that.
The datetime values are not the same; that's why you get 2 observations in data set two.
one 1 30jun1948:01:00:00
one 1 30jun1948:02:00:00
Hope it helps.
Daniel Santos @ www.cgd.pt
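To see this for yourself, here is a minimal sketch that writes the BY-group flags to the log next to both the formatted and the raw value of the datetime variable. It assumes data set one from the original post already exists in WORK; first_flag and last_flag are names made up for this illustration.

```sas
/* Sketch: show why every row is the last of its own BY group. */
/* Assumes WORK.ONE from the original post has been created.   */
data _null_ ;
  set one ;
  by id var1 date ;
  first_flag = first.date ;  /* 1 when a new id/var1/date group starts */
  last_flag  = last.date ;   /* 1 when an id/var1/date group ends      */
  /* date is printed twice: formatted, then as the stored number of seconds */
  put id= var1= date= datetime20. date= best12. first_flag= last_flag= ;
run ;
```

Because the two stored values differ by one hour, each row should form its own BY group, so both flags would be 1 on both rows.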
Sorry, your question is quite unclear. Are you asking why last.date (the datetime variable) gives 2 records and last.dateday gives 1? If so, that is simple: the datetime value includes the time part, which is different in each row, so each row is the last of its own BY group and both rows come out. If you remove the time part with datepart(), the two dates are the same, so they fall into one BY group and only 1 row comes out.
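If the goal is to de-dupe on the day without creating a separate variable, one possible approach is to let the BY groups follow the formatted value rather than the stored value. The sketch below is an assumption on my part, not something from the original post: it relies on the GROUPFORMAT option of the BY statement and the DTDATE9. format, and two_by_day is a made-up data set name.

```sas
/* Sketch: group on the day portion of a datetime via its format. */
/* Assumes WORK.ONE from the original post, already in BY order.  */
data two_by_day ;
  set one ;
  format date dtdate9. ;           /* display only the date part of the datetime */
  by id var1 date groupformat ;    /* BY groups use formatted, not stored, values */
  if last.date ;                   /* keep the last row for each id/var1/day      */
run ;
```

Note that the stored value of date is unchanged; only its format in the output data set differs. Creating dateday with datepart(), as in the original post, is the more explicit alternative.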