BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jpprovost
Quartz | Level 8

Hi there,

 

I have an Excel file which contains two sheets with datas and they both have the same columns especially one column for a DATE.

 

I do use the following code to import both sheets into tables.

 

PROC IMPORT DATAFILE='/myrep/myfile.xlsx' 
OUT=DATA1
DBMS = XLSX
REPLACE;
SHEET = "SHEET1";
RUN;

PROC IMPORT DATAFILE='/myrep/myfile.xlsx' 
OUT=DATA2
DBMS = XLSX
REPLACE;
SHEET = "SHEET2";
RUN;

 I can assure that when I check in the Excel file, both sheets contains the exact same format for DATE. However, once imported into SAS, my table DATA1 displays the DATE in a correct format (eg: 12/06/2016), but in the table DATA2 it is listed in the SAS internal date (number of days from Jan 1, 1960).

 

Since those two sheets contains similar datas and identical columns name, I want to join the two into one. However, I got an error that says that the variable (DATE) is in one sheet numerical and one the other sheet alphanumerical.

 

I tried to look into this community to know if there's a way to "force" SAS to read the value of the variable DATE with a specific format, but I didn't found anything for PROC IMPORT (I see that within a DATA STEP we can do it with the INFILE command).

 

Any help would be greatly appreciated.

 

Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The two sheets are obviously NOT the same.

SHEET1 has DATE in a consistent format.  Either all DATEs or all character strings that look to you like dates.

SHEET2 has mixed numeric (dates are numeric) and character values in DATE.  What SAS does when the column has mixed types is create a character variable.  The character strings are copied as is. The numeric (date) values are converted into strings of digits that look like the number that EXCEL uses to store dates.  So your 12/06/2016 example will be either the string '42710' or the string '42533' depending on whether it is December 6th of 2016  or the 12th of June of 2016.

 

To fix this after you have the data in SAS already you will need to convert the strings back to a number using INPUT.

For the ones that look like general numbers just use the normal numeric informat and then adjust for the different base dates used by Excel and SAS.

real_date=input(DATE,32.)+'30DEC1899'd ;

For the ones that look like dates then use an informat that matches what they look like. So if the values are in MM/DD/YYYY style then use the MMDDYY10. informat.

real_date=input(DATE,mmddyy10.);

You can then attach one of the many formats that SAS has to display dates to the new numeric variable REAL_DATE.

 

You could use either the MMDDYY10. or DDMMYY10. format to display the dates like your example date. Which one to use depends on whether your value of '12/06/2016' meant the 12 of June or December 6th.  Personally I would use either DATE9. or YYMMDD10. to avoid that confusion.

 

 

View solution in original post

15 REPLIES 15
Reeza
Super User

You're correct, you cannot control the types coming from the Excel file. 

 

As to why, although you're not seeing an issue there's likely an issue with some dates or lots of blanks that is causing SAS to think it's a character type. You have two ways to fix this IMO, one is to clean teh data in Excel by ensuring the date format is fixed and verifying that the dates are all valid. You can have 31/01/2019 and 01/31/2019 in the same column in Excel but SAS expects it to have the same format across all rows and when that isn't true it forces it to character format. 

 

The second method is to force it to a SAS date using input and then any mistakes will likely show up in your log. 

You should convert them to actual SAS dates, ie the number one and apply a date format. 

 

data clean;
set raw_data;
*converts to SAS date/numeric format; date_num = input(date_char, mmddyy10.); format date_num date9.; run;

 

Here's a great, but longer and in depth, reference for dates and times in SAS
https://communities.sas.com/t5/SAS-Communities-Library/Working-with-Dates-and-Times-in-SAS-Tutorial/...

 


@jpprovost wrote:

Hi there,

 

I have an Excel file which contains two sheets with datas and they both have the same columns especially one column for a DATE.

 

I do use the following code to import both sheets into tables.

 

PROC IMPORT DATAFILE='/myrep/myfile.xlsx' 
OUT=DATA1
DBMS = XLSX
REPLACE;
SHEET = "SHEET1";
RUN;

PROC IMPORT DATAFILE='/myrep/myfile.xlsx' 
OUT=DATA2
DBMS = XLSX
REPLACE;
SHEET = "SHEET2";
RUN;

 I can assure that when I check in the Excel file, both sheets contains the exact same format for DATE. However, once imported into SAS, my table DATA1 displays the DATE in a correct format (eg: 12/06/2016), but in the table DATA2 it is listed in the SAS internal date (number of days from Jan 1, 1960).

 

Since those two sheets contains similar datas and identical columns name, I want to join the two into one. However, I got an error that says that the variable (DATE) is in one sheet numerical and one the other sheet alphanumerical.

 

I tried to look into this community to know if there's a way to "force" SAS to read the value of the variable DATE with a specific format, but I didn't found anything for PROC IMPORT (I see that within a DATA STEP we can do it with the INFILE command).

 

Any help would be greatly appreciated.

 

Thanks in advance!


 

SASKiwi
PROC Star

What are the column types in Excel of the two dates? "General" is the default column type. In my experience, making sure these are both "Date" may help. Another trick is to add a dummy row of data as the first row in both sheets with non-missing values for all cells. This should help improve consistency.

Tom
Super User Tom
Super User

The two sheets are obviously NOT the same.

SHEET1 has DATE in a consistent format.  Either all DATEs or all character strings that look to you like dates.

SHEET2 has mixed numeric (dates are numeric) and character values in DATE.  What SAS does when the column has mixed types is create a character variable.  The character strings are copied as is. The numeric (date) values are converted into strings of digits that look like the number that EXCEL uses to store dates.  So your 12/06/2016 example will be either the string '42710' or the string '42533' depending on whether it is December 6th of 2016  or the 12th of June of 2016.

 

To fix this after you have the data in SAS already you will need to convert the strings back to a number using INPUT.

For the ones that look like general numbers just use the normal numeric informat and then adjust for the different base dates used by Excel and SAS.

real_date=input(DATE,32.)+'30DEC1899'd ;

For the ones that look like dates then use an informat that matches what they look like. So if the values are in MM/DD/YYYY style then use the MMDDYY10. informat.

real_date=input(DATE,mmddyy10.);

You can then attach one of the many formats that SAS has to display dates to the new numeric variable REAL_DATE.

 

You could use either the MMDDYY10. or DDMMYY10. format to display the dates like your example date. Which one to use depends on whether your value of '12/06/2016' meant the 12 of June or December 6th.  Personally I would use either DATE9. or YYMMDD10. to avoid that confusion.

 

 

jpprovost
Quartz | Level 8

I am curious to know why you add this to the date?
+'30DEC1899'd

After doing it, it goes (example) from 43004 to 26SEP2017.

Here's the code I type :

DATA test1;
SET test;
real_date=input(DATE,32.)+'30DEC1899'd ;
format real_date date9.;
run;

Also, this is what I did this morning as a test :

 

DATA test1;
SET test;
real_date=input(DATE,yymmdd10.) ;
run;

And it gives me a lot of "." (missing values). I struggle to understand why?

Kurt_Bremser
Super User

Excel stores dates as counts of days, starting with day 1 on 1900-01-01. But Lotus 1-2-3 (from which Excel was mostly copied) made the mistake of thinking that 1900 was a leap year, which it wasn't. So all dates in Excel (as MS happily perpetuates the error) between 1900-01-01 and 1900-03-01 are calculated incorrectly.

So we have to assume day zero not as the last day of 1899, but the next-to-last day when we get data from Excel.

 

Note that LibreOffice Calc and other spreadsheet programs have corrected the mistake regarding the leapyear and do also use 1899-12-30 as day 0. That's why they also allow dates before that, as they can correctly calculate them.

jpprovost
Quartz | Level 8

@Kurt_Bremser wrote:

Excel stores dates as counts of days, starting with day 1 on 1900-01-01. But Lotus 1-2-3 (from which Excel was mostly copied) made the mistake of thinking that 1900 was a leap year, which it wasn't. So all dates in Excel (as MS happily perpetuates the error) between 1900-01-01 and 1900-03-01 are calculated incorrectly.

So we have to assume day zero not as the last day of 1899, but the next-to-last day when we get data from Excel.

 

Note that LibreOffice Calc and other spreadsheet programs have corrected the mistake regarding the leapyear and do also use 1899-12-30 as day 0. That's why they also allow dates before that, as they can correctly calculate them.


Sir,

 

This is a very very useful info that I just learn from you today.

 

If I convert my Excel file into a CSV file, will I have the same struggle with the date? (Just curious)

ballardw
Super User

@jpprovost wrote:

@Kurt_Bremser wrote:

Excel stores dates as counts of days, starting with day 1 on 1900-01-01. But Lotus 1-2-3 (from which Excel was mostly copied) made the mistake of thinking that 1900 was a leap year, which it wasn't. So all dates in Excel (as MS happily perpetuates the error) between 1900-01-01 and 1900-03-01 are calculated incorrectly.

So we have to assume day zero not as the last day of 1899, but the next-to-last day when we get data from Excel.

 

Note that LibreOffice Calc and other spreadsheet programs have corrected the mistake regarding the leapyear and do also use 1899-12-30 as day 0. That's why they also allow dates before that, as they can correctly calculate them.


Sir,

 

This is a very very useful info that I just learn from you today.

 

If I convert my Excel file into a CSV file, will I have the same struggle with the date? (Just curious)


With a CSV file when using proc import you can specify an option GUESSINGROWS to look at more rows of the data before guessing the appropriate variable type and informat. If the values of an apparent date column vary SAS may use the ANYDTDTE, any date, format which will read some mixes of date such as 01JAN2012, 01012012 and 1/1/2012 as the same date value. Or if you encounter such differences you can specify the Informat in the code you write.

Tom
Super User Tom
Super User

@jpprovost wrote:

I am curious to know why you add this to the date?
+'30DEC1899'd

After doing it, it goes (example) from 43004 to 26SEP2017.

Here's the code I type :

DATA test1;
SET test;
real_date=input(DATE,32.)+'30DEC1899'd ;
format real_date date9.;
run;

Also, this is what I did this morning as a test :

 

DATA test1;
SET test;
real_date=input(DATE,yymmdd10.) ;
run;

And it gives me a lot of "." (missing values). I struggle to understand why?


SAS and Excel have different ideas about whether to start counting from 1 or 0.  Also Excel thinks 1900 was a leap year.  That is why you should use 30DEC1899 instead of 01JAN1900.

 

To your second question, you told SAS to interpret DATE as if it had strings that look like '2016-12-30' (or '2016/12/30' or '20161230' etc. anything that the YYMMDD informat can recognize.).  But before you said it had strings that look like '30-12-2016' or '12-30-2016' or '43004'.  None of those are going to work with the YYMMDD informat.  If you are getting errors in the log then you should also see in the log some of the values of DATE that generated the error.

 

Try this:

data test1;
  set test;
  if length(date)=5 then real_date=input(DATE,32.)+'30DEC1899'd ;
  else real_date=input(DATE,mmddyy10.) ;
  format real_date date9.;
run;
jpprovost
Quartz | Level 8

@Tom wrote:

@jpprovost wrote:

I am curious to know why you add this to the date?
+'30DEC1899'd

After doing it, it goes (example) from 43004 to 26SEP2017.

Here's the code I type :

DATA test1;
SET test;
real_date=input(DATE,32.)+'30DEC1899'd ;
format real_date date9.;
run;

Also, this is what I did this morning as a test :

 

DATA test1;
SET test;
real_date=input(DATE,yymmdd10.) ;
run;

And it gives me a lot of "." (missing values). I struggle to understand why?


SAS and Excel have different ideas about whether to start counting from 1 or 0.  Also Excel thinks 1900 was a leap year.  That is why you should use 30DEC1899 instead of 01JAN1900.

 

To your second question, you told SAS to interpret DATE as if it had strings that look like '2016-12-30' (or '2016/12/30' or '20161230' etc. anything that the YYMMDD informat can recognize.).  But before you said it had strings that look like '30-12-2016' or '12-30-2016' or '43004'.  None of those are going to work with the YYMMDD informat.  If you are getting errors in the log then you should also see in the log some of the values of DATE that generated the error.

 

Try this:

data test1;
  set test;
  if length(date)=5 then real_date=input(DATE,32.)+'30DEC1899'd ;
  else real_date=input(DATE,mmddyy10.) ;
  format real_date date9.;
run;

Oh gosh, it makes so much sense now.

I understand that if I use put(source, format), the second argument will be the "output" format.

If I use input(source, informat), the second argument is the "input" (or read) format.

Am I wrong?

Tom
Super User Tom
Super User

Formats convert values to text. Informats convert text to values.

jpprovost
Quartz | Level 8
But the informat it's not what we want as an output, but it tells SAS to read the text from a specific INFORMAT and to convert it into value. Right?
Tom
Super User Tom
Super User

@jpprovost wrote:
But the informat it's not what we want as an output, but it tells SAS to read the text from a specific INFORMAT and to convert it into value. Right?

Yes. The type of the informat (numeric or character) will determine if it returns a number of a character string. 

 

What actual value it returns for the given input is the whole purpose of using an informat.  So a string like '20161230' could be read using YYMMDD8. informat and return the numbers of days since 1960 for the date december 30th of 2016.  Or when read with the informat 8. it would return the number 20,161,230 instead.  Or if read with the informat $8 it would just return the value unchanged.  But if read with the informat $4 it would return just '2016'.

Dumi1
Fluorite | Level 6
Thank you, I was struggling as well about this. This really helped
Kurt_Bremser
Super User

Welcome to the world of using Excel files for data transport, aka as the world of pain.

Save your sheets to csv files and import those with the same data step, and you get identical structures. A little more work up front, a lot less trouble later.

See Maxims 22, 27, 31.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1525 views
  • 9 likes
  • 8 in conversation