de-duping SAS dataset

I note that sorting and using first. and last. variables to de-dupe a dataset behaves differently for a date variable than for a datetime variable. In fact, it seems counterintuitive:

 

data one ;
informat date datetime20. ;
format date datetime20. ;
input id $char3. var1 date ;
put _all_ ;
cards ;
one 1 30jun1948:01:00:00
one 1 30jun1948:02:00:00
data two ;
set one ;
by id var1 date ;
if last.date ;
run ;

data oneplus ;
format dateday date9. ;
set one ;
dateday = datepart(date) ;
put _all_ ;
run ;

data twoplus ;
set oneplus ;
by id var1 dateday ;
if last.dateday ;
run ;

 

Note from the log below that the data set is not de-duped on date, the datetime variable, which has more granularity, but is de-duped on dateday, which represents the day. Can anyone explain this to me? Thanks.

 

log:

302 data one ;
303 informat date datetime20. ;
304 format date datetime20. ;
305 input id $char3. var1 date ;
306 put _all_ ;
307 cards ;
date=30JUN1948:01:00:00 id=one var1=1 _ERROR_=0 _N_=1
date=30JUN1948:02:00:00 id=one var1=1 _ERROR_=0 _N_=2
NOTE: The data set WORK.ONE has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time 0.01 seconds
      cpu time 0.01 seconds

310 data two ;
311 set one ;
312 by id var1 date ;
313 if last.date ;
314 run ;
NOTE: There were 2 observations read from the data set WORK.ONE.
NOTE: The data set WORK.TWO has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time 0.01 seconds
      cpu time 0.01 seconds

315 data oneplus ;
316 format dateday date9. ;
317 set one ;
318 dateday = datepart(date) ;
319 put _all_ ;
320 run ;
dateday=30JUN1948 date=30JUN1948:01:00:00 id=one var1=1 _ERROR_=0 _N_=1
dateday=30JUN1948 date=30JUN1948:02:00:00 id=one var1=1 _ERROR_=0 _N_=2
NOTE: There were 2 observations read from the data set WORK.ONE.
NOTE: The data set WORK.ONEPLUS has 2 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time 0.02 seconds
      cpu time 0.01 seconds

321 data twoplus ;
322 set oneplus ;
323 by id var1 dateday ;
324 if last.dateday ;
325 run ;
NOTE: There were 2 observations read from the data set WORK.ONEPLUS.
NOTE: The data set WORK.TWOPLUS has 1 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time 0.02 seconds
      cpu time 0.01 seconds

 


Accepted Solutions
Solution
‎01-16-2017 10:30 AM
Super Contributor
Posts: 474

Re: de-duping SAS dataset

Posted in reply to grezek_tcfbank_com

Hi. 

 

I don't see anything wrong with that.

 

The two datetime values are not the same (they differ by one hour), which is why you get 2 observations in data set two.

 

one 1 30jun1948:01:00:00

one 1 30jun1948:02:00:00
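
If it helps to see this directly, here is a minimal sketch that prints the unformatted values next to the BY-group flags. It assumes the same input as the original post; the variable names seconds and days are hypothetical and only serve the illustration.

```sas
/* Rebuild the sample data from the question */
data one ;
  informat date datetime20. ;
  format date datetime20. ;
  input id $char3. var1 date ;
  cards ;
one 1 30jun1948:01:00:00
one 1 30jun1948:02:00:00
;
run ;

/* Print the raw values that the BY statement actually compares */
data _null_ ;
  set one ;
  by id var1 date ;
  seconds = date ;            /* unformatted datetime: seconds since 01JAN1960:00:00:00 */
  days    = datepart(date) ;  /* unformatted date: days since 01JAN1960 */
  put seconds= days= first.date= last.date= ;
run ;
```

The two rows should show seconds values that are 3600 apart and identical days values. Because BY DATE compares the full datetime value, each row forms its own BY group, so both first.date and last.date are 1 on every row.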

 

Hope it helps.

 

Daniel Santos @ www.cgd.pt



All Replies
Super User
Posts: 7,989

Re: de-duping SAS dataset

Posted in reply to grezek_tcfbank_com

Sorry, your question is quite unclear. Are you asking why subsetting on last.date (the datetime variable) gives 2 records, while last.dateday gives 1? If so, that is simple: the datetime value includes the time part, which is different in each row, so each row is its own BY group and both rows come out. Once DATEPART removes the time part, the two dates are the same, so only 1 row comes out.
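
If the goal is one row per calendar day without creating a separate date variable, one possible approach is to let the DATA step group on formatted values. This is only a sketch, assuming the data set one from the original post is already in sorted order. The output name two_by_day is hypothetical. The GROUPFORMAT option on the BY statement and the DTDATE9. format are standard SAS features, but please check the result against your own data.

```sas
/* Rebuild the sample data from the question */
data one ;
  informat date datetime20. ;
  format date datetime20. ;
  input id $char3. var1 date ;
  cards ;
one 1 30jun1948:01:00:00
one 1 30jun1948:02:00:00
;
run ;

/* Group on the formatted value of DATE instead of the stored seconds */
data two_by_day ;                 /* hypothetical output name */
  set one ;
  format date dtdate9. ;          /* displays only the date part of the datetime */
  by id var1 date groupformat ;   /* first./last. are set from formatted values */
  if last.date ;                  /* keep the last row within each day */
run ;
```

The stored datetime values are not changed here, only the way BY groups are formed. Note that the FORMAT statement also makes DTDATE9. the format of date in the output data set, so reassign DATETIME20. afterwards if you still want to see the time part.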

☑ This topic is solved.
