BookmarkSubscribeRSS Feed
ljfahre
Fluorite | Level 6

On a daily basis, I work with a large xlsx data set that I did not author.  The data set contains dates for different events (lab reports, death dates, birth dates, etc.)  The author will have the word "NULL" for a date observation if the date of an event is unknown.  When the data set is brought into SAS, SAS converts the date variables into characters.  I need the dates converted back from the character format that SAS has assumed to their actual numeric dates.  I am working with SAS 9.4.  I converted the NULL values to "." to let SAS know that these are empty, but don't know where to go from here. 

 

  The code that converts it does not work.  It returns dates that are incorrect.  

 

*turns null into .*;

 

data work.me; set work.me;
if INCIDENT_DATE = "NULL" then INCIDENT_DATE = .;
else if CREATEDDT = "NULL" then CREATEDDT = .;
else if DEATH_DATE = "NULL" then DEATH_DATE = .;
else if DECEDENT_AGE = "NULL" then DECEDENT_AGE = .;
else if DECEDENT_DOB = "NULL" then DECEDENT_DOB = .; 
run;

 

*convert char to date in new variable*;

data work.me1;
set work.me;
incident_date1 = .;
incident_date1 = input(incident_date,$15.)
format incident_date1 DATETIME16.;
run;

11 REPLIES 11
ballardw
Super User

I am assuming that you are using Proc Import to read the data into SAS. I am correct?

 

You have several options that relate to how much work you want to do repetitively.

One: In the EXCEL sheet to a search and replace for NULL and replace with nothing. Then import it.

 

Two: Save the file as a CSV and import that one time. The procedure will generate data step code to read the CSV. Copy that and save in the editor. Then change the code to read the variable as a date instead of the default character you are currently getting. You did not provide any example of the values but from your data step that won't work I assume you are getting some form of datetime value. You could try changing the informat to ANYDTDTM20. and the Format to datetime.

If all of those date times are all 00:00:00 and you actually want dates instead of datetime values a different informat would likely work but we need to see what you are currently receiving.

It may be suitable to make a custom informat to read the "NULL" as missing and everything else with the actual layout. Can't suggest one without details of the values you are seeing in the CSV version. This custom informat will prevent the invalid data messages from appearing.

 

After you get this to work then either saving new versions of the XLSX to the same name/location CSV would be all that is needed to read the new version. I would suggest looking at the assigned lengths of the actual character variables to see if they are going to hold the longest expected values.

 

If the data changes and you need to keep previous data sets around then change the name of the output data set in the data step code as needed. The CSV approach will have an advantage of consistent lengths and properties that can change with the contents of an XLSX.

 

 

If this actually yields a correct datetime value:

data work.me1;
set work.me;
incident_date1 = .;
incident_date1 = input(incident_date,$15.)
format incident_date1 DATETIME16.;
run;

it is pure luck and I suspect you do not get such. Besides trying to read character values as numeric ($15 would attempt to create a 15 character text value) .

 

We have to see some actual values to provide more concrete examples.

ljfahre
Fluorite | Level 6

SASIssue.PNG   This is what the values look like when I import them into excel.  

ljfahre
Fluorite | Level 6

sorry not excel, SAS

Kurt_Bremser
Super User

Think of what your first data step will do if two or more variables are 'NULL'. And note that you cannot change the type of a variable (character to numeric in your case). You need to create a new variable, or assign an empty string.

Please supply examples for your non-'NULL' date strings, so we can determine the proper informat to read the dates, or datetimes

ljfahre
Fluorite | Level 6

SASIssue.PNG  This is what it looks like when I bring the files into SAS from excel.  

ballardw
Super User

@ljfahre wrote:

SASIssue.PNG  This is what it looks like when I bring the files into SAS from excel.  


You have another couple issues. The numeric values you are seeing are the number of days since 1Jan1900 (the excel date values in days) and the decimal portion is fraction of a day to represent seconds. Since I can't write code to read a picture: hint hint hint. and I am not going to retype all of the junk;

Here is an example converting one of those numbers into a datatime value.

 

data example;
   x="44023.348611111";
if x ne "NULL" then do; numx = input(x, best32.); datebit = int(numx)+ '01jan1900'd; /* if your time portion should not have fractions of a second */ timebit = round((24*60*60)* (numx-int(numx)),1); /* if it should have fractions of a second*/ timebit = (24*60*60)* (numx-int(numx)); dt = dhms(datebit,0,0,timebit);
end;
format datebit date9.; format dt datetime18.; run;

See if the resulting DT value matches the row from the Excel when you look at it in Excel. It may be off by one day depending on the version of Excel that was used. If so use '31DEC1899'd (I think is the most common fix).

I am showing several of the pieces so you have a chance to follow along.

First, use the correct informat with INPUT to create a numeric value. I used a wide BEST as I suspect the values you show are actually rounded and there could be more digits involved.

Numx is to get the integer portion of the value, i.e. days.

The Addition works to adjust to SAS dates as values SAS uses have 0 for 1 Jan 1960. The only way to specific a date in SAS as a literal value is as shown, do not use any of the mmddyy with slash/dash or other characters. It needs to match the Date9 format (date7 will work but not showing 4 digit years is stupid).

The Timebit code gets the fraction of a day and multiplies it by the number of seconds in a day to get the seconds value that SAS will want.

The function DHMS turns  Date, Hour, Minute, Second values into a SAS datetime value. We have all of the seconds in the Timebit value so use 0 or hour and minute.

 

The timebit shows two options depending on whether you expect the time portion to include fractions of second. All of the various conversions going on are likely to have unwanted fractions of second and I show how to round that to avoid that if desired. Or if the values should have fractions of second don't round.

 

Once you are convinced this is working as needed you can pretty much turn this into a single long line of code. If you have multiple variables then you will want an array to define the initial character versions, another array to store the resolved datetime, and a loop to use and and assign the needed values. An exercise for the interested reader.

Kurt_Bremser
Super User

You need to use '30dec1899'd as offset.

Why?

  • SAS starts with 1960-01-01 as day 0, Excel with 1900-01-01 as day 1
  • since Excel wrongfully considers 1900 a leap year, we need to add another day to the offset 
ballardw
Super User

@Kurt_Bremser wrote:

You need to use '30dec1899'd as offset.

Why?

  • SAS starts with 1960-01-01 as day 1, Excel with 1900-01-01 as day zero
  • since Excel wrongfully considers 1900 a leap year, we need to add another day to the offset 

Considering I have some Excel sheets that were happy with 0 Jan 1900 id never remember which offsets for which version

Reeza
Super User
Is this a process you need to streamline and automate? Or a one time process?
Either way consider either converting to CSV and importing from there where you can easily do that via commands or batch.
Another option is to clean the Excel using PowerQuery ahead of time.
https://docs.microsoft.com/en-us/power-query/power-query-what-is-power-query

Unfortunately it also looks like you have a mix of dates and datetimes in Excel, not just dates.
Patrick
Opal | Level 21

@ljfahre 

The discussion and proposed solutions here are likely also applicable for your case.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1849 views
  • 11 likes
  • 5 in conversation