BookmarkSubscribeRSS Feed
sassy_seb
Quartz | Level 8

Hello everyone, my supervisor just left me this problem for me to solve. There is a specific date variable we want to observe, however a lot of the data is showing up as asterisks. I understand that this is because the number is too long and inaccurate to SAS numerical dates.

 

My question is, how can I go about fixing this? I believe this data is already saved from an older ACCESS dataset. 

 

Here is the results viewer window:

sassy_seb_0-1727901524314.png

 

As well as how that looks in the log when I simply put the number:

 

23601
23607
23601
23614
1991520000
1806883200
1804982400
1806364800
1807488000
1808092800
1808956800
1810166400

 

 

10 REPLIES 10
PaigeMiller
Diamond | Level 26
23601
23607
23601
23614

These are actual SAS dates, which when the format yymmdd10. is applied, look like 2024-08-13

 

1991520000
1806883200
1804982400
1806364800
1807488000
1808092800
1808956800
1810166400

 

These are actual SAS date/time values, which when displayed in format yymmdd10. is too large for 10 characters wide.

 

Most likely, you want to use the datepart function to extract the date from the date/time values.

 

So, perhaps this solves the problem:

 

data want;
    set have;
    if date>100000 then date=datepart(date);
    format date yymmdd10.;
run;

 

 

--
Paige Miller
sassy_seb
Quartz | Level 8

Is there a way for me to isolate the values that need the datepart? Because as you saw some are dates and others are datetime. I tried just doing datepart() on all of the values and the ones that were dates ended up converting into 0's and showing up as 01-01-1960 of course.

ballardw
Super User

@sassy_seb wrote:

Is there a way for me to isolate the values that need the datepart? Because as you saw some are dates and others are datetime. I tried just doing datepart() on all of the values and the ones that were dates ended up converting into 0's and showing up as 01-01-1960 of course.


@PaigeMiller already showed you. Pick a value that is larger than you expect for a valid date. The magic number he picked, 100000 corresponds to 16 Oct 2233 (pick a similar value that you don't expect to actually appear in your data). Then compare that numeric value to that value if it is too large use the datepart function. You could express the magic date as a date literal '16OCT2233'd or similar to help someone reading the code later (this is the sort of thing that should be carefully commented in the code).

 

Although that does correspond to 02JAN1960:03:46:40 as a datetime value. So careful consideration of the value may be in order.

ballardw
Super User

For what little it may be worth, currently the latest date that SAS will work with is 31 Dec 20,000, yes not quite 18 thousand years in the future. In terms of days since 1 Jan 1960 as SAS does things, that is a value of 6589335. So any number larger than that will have issues with any of the date related stuff.

None of the date formats will display for years past 9999 as well, showing just asterisks. So that is a clue about something out of range. So suspect values larger than that magic 6589335 as being datetime values is the way to go.

 

Note that with some of the date formats if the date is between the year 9999 and 20000 you may see a day and month (or other bits) depending on width. Many of the date formats don't have enough digits to display the year as 5 digits.

 

This looks like someone appending data without being sure what they were working with.

Tom
Super User Tom
Super User

How did you create this dataset?

You seem to have mix of DATE (number of days) and DATETIME (number of seconds) values.

1    data test;
2      input date ;
3      put (3*date) (comma13. +1 date9. +1 datetime19.);
4    cards;

       23,601 13AUG2024  01JAN1960:06:33:21
       23,607 19AUG2024  01JAN1960:06:33:27
       23,601 13AUG2024  01JAN1960:06:33:21
       23,614 26AUG2024  01JAN1960:06:33:34
1,991,520,000 *********  09FEB2023:00:00:00
1,806,883,200 *********  04APR2017:00:00:00
1,804,982,400 *********  13MAR2017:00:00:00
1,806,364,800 *********  29MAR2017:00:00:00
1,807,488,000 *********  11APR2017:00:00:00
1,808,092,800 *********  18APR2017:00:00:00
1,808,956,800 *********  28APR2017:00:00:00
1,810,166,400 *********  12MAY2017:00:00:00

What range of date values did you expect? 

Are they all supposed to represent past events? Or could some supposed to represent dates into the future?  If so how far into the future?  You might be able to tell them apart by testing if they are larger than your expected maximum date.

38   data test;
39     input raw ;
40     format raw comma13.  date date9. datetime datetime19.;
41     if raw > mdy(1,1,2500) then do;
42       datetime=raw;
43       date=datepart(datetime);
44     end;
45     else do;
46       date=raw;
47       datetime=dhms(date,0,0,0);
48     end;
49     put (_all_) (=);
50   cards;

raw=23,601 date=13AUG2024 datetime=13AUG2024:00:00:00
raw=23,607 date=19AUG2024 datetime=19AUG2024:00:00:00
raw=23,601 date=13AUG2024 datetime=13AUG2024:00:00:00
raw=23,614 date=26AUG2024 datetime=26AUG2024:00:00:00
raw=1,991,520,000 date=09FEB2023 datetime=09FEB2023:00:00:00
raw=1,806,883,200 date=04APR2017 datetime=04APR2017:00:00:00
raw=1,804,982,400 date=13MAR2017 datetime=13MAR2017:00:00:00
raw=1,806,364,800 date=29MAR2017 datetime=29MAR2017:00:00:00
raw=1,807,488,000 date=11APR2017 datetime=11APR2017:00:00:00
raw=1,808,092,800 date=18APR2017 datetime=18APR2017:00:00:00
raw=1,808,956,800 date=28APR2017 datetime=28APR2017:00:00:00
raw=1,810,166,400 date=12MAY2017 datetime=12MAY2017:00:00:00
NOTE: The data set WORK.TEST has 12 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Since your datetime values seem to all represent midnight perhaps you can just check if the remainder when divided by the number of seconds in a day is zero?

64   data test;
65     input raw ;
66     format raw comma13.  date date9. datetime datetime19.;
67     if 0=mod(raw,'24:00:00't) then do;
68       datetime=raw;
69       date=datepart(datetime);
70     end;
71     else do;
72       date=raw;
73       datetime=dhms(date,0,0,0);
74     end;
75     put (_all_) (=);
76   cards;

raw=23,601 date=13AUG2024 datetime=13AUG2024:00:00:00
raw=23,607 date=19AUG2024 datetime=19AUG2024:00:00:00
raw=23,601 date=13AUG2024 datetime=13AUG2024:00:00:00
raw=23,614 date=26AUG2024 datetime=26AUG2024:00:00:00
raw=1,991,520,000 date=09FEB2023 datetime=09FEB2023:00:00:00
raw=1,806,883,200 date=04APR2017 datetime=04APR2017:00:00:00
raw=1,804,982,400 date=13MAR2017 datetime=13MAR2017:00:00:00
raw=1,806,364,800 date=29MAR2017 datetime=29MAR2017:00:00:00
raw=1,807,488,000 date=11APR2017 datetime=11APR2017:00:00:00
raw=1,808,092,800 date=18APR2017 datetime=18APR2017:00:00:00
raw=1,808,956,800 date=28APR2017 datetime=28APR2017:00:00:00
raw=1,810,166,400 date=12MAY2017 datetime=12MAY2017:00:00:00
NOTE: The data set WORK.TEST has 12 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

How far back in time do they go?  Could they go back as far as 1960?  If so it might be hard to distinguish a date value from a datetime value since 0 represents 1-JAN-1960 in both numbering schemes. 

 

sassy_seb
Quartz | Level 8

It's a mix of datasets from Access, Excel, and stuff they had already saved in SAS, as well as from REDCap. I was thinking of just separating based on being bigger than a certain number and applying the changes there.

Tom
Super User Tom
Super User

@sassy_seb wrote:

It's a mix of datasets from Access, Excel, and stuff they had already saved in SAS, as well as from REDCap. I was thinking of just separating based on being bigger than a certain number and applying the changes there.


For each source figure out which type of values (date or datetime) it was using and then apply the right conversion based on where the value came from. 

data want;
  set excel access indsname=dsname; 
  if dsname in ('ACCESS') then date=datepart(date);
run;

Or perhaps rename the variables from different sources to reflect the type of value they contain.

data want;
  set excel access(rename=(date=datetime));
  format date date9. datetime datetime19.;
run;
ballardw
Super User

@sassy_seb wrote:

It's a mix of datasets from Access, Excel, and stuff they had already saved in SAS, as well as from REDCap. I was thinking of just separating based on being bigger than a certain number and applying the changes there.


With Excel in the mix that seriously complicates any "one size" fits all approach. Excel dates start at 1Jan1900 so differ from SAS dates by 21914 (depending on particular version of Excel...). So values like 42000 would be 29Dec2014 and you would not want to apply datepart to them but would not want to leave them alone either.

 

I really would start by going back one step and handling each file/dataset/source separately as each needs a different "fix".

I would also look out for "fake" dates, simple numeric values like 20200516 which could be masquerading as 16 May 2020.

 

Heaven help you if one of those other files had stored Julian dates like 24112 which could be either 21 Apr 1924 or 21 Apr 2024 but is numerically close enough to other SAS dates (this would be 6 Jan 2026) that range checks are not going to be helpful.

Patrick
Opal | Level 21

@sassy_seb wrote:

It's a mix of datasets from Access, Excel, and stuff they had already saved in SAS, as well as from REDCap. I was thinking of just separating based on being bigger than a certain number and applying the changes there.


Under the assumption that the source dates converted into the correct SAS date and datetime values and you don't have actual source datetimes close to 01Jan1960 this should work. However as others already stated if your current SAS date/datetime values aren't correct then you would need further logic to amend them based on the source system (Excel, Access, ....).

 

Below sample code with two options how to display/print the values without changing the internally stored value and the 3rd case how to amend the values. 

I don't recommend the first two options because any sorting will use the internally stored values and you also can't use functions like intck() and intnx() with a mix of date and datetime values.

proc format;
  value dt_dttm_A
  '01Jan1900'd - '01Jan2500'd = [date9.]
  other                       = [datetime20.]
  ;
  value dt_dttm_B
  '01Jan1900'd - '01Jan2500'd = [date9.]
  other                       = [dtdate9.]
  ;
run;

data demo;
  input src_date_vals_raw;

  date_vals_A=src_date_vals_raw;
  format date_vals_A dt_dttm_A.;

  date_vals_B=src_date_vals_raw;
  format date_vals_B dt_dttm_B.;

  if '01Jan1900'd <= src_date_vals_raw >='01Jan2500'd then 
    amended_date_vals=datepart(src_date_vals_raw);
  else
    amended_date_vals=src_date_vals_raw;
  format amended_date_vals date9.;
  datalines;
23601
23607
23601
23614
1991520000
1806883200
1804982400
1806364800
1807488000
1808092800
1808956800
1810166400
;
run;

proc print data=demo;
run;

 

Patrick_0-1727911794822.png

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 744 views
  • 7 likes
  • 5 in conversation