BookmarkSubscribeRSS Feed
MRDM
Obsidian | Level 7

I'm trying to read in some dates from excel and having a bit of a nightmare. I think I need to read it in as Character as the data is pretty poor so lots of it doesn't have dates in so needs some clean up. When I do this though SAS is reading in the wrong formats.

 

 

libname xl Excel "c:/myexcelfile.xlsm";
data test;
set xl.'OUTPUT$'n  (dbsastype = 
(DOB ='Char(10)',
OnboardingDate= 'Char10'));
run;
libname xl CLEAR;

In excel the date is 30/01/1983, SAS is outputting it as 1/30/1983 which is switching it to US Format and dropping the leading zero.

 

I've tried:

libname xl Excel "c:/myexcelfile.xlsm";
data test;
set xl.'OUTPUT$'n  (dbsastype = 
(DOB ='date9.',
OnboardingDate= 'date9.',));
run;
libname xl CLEAR;

Which outputs 30JAN83 which is the correct UK format but then any 'dirty' entries trigger either a warning or an error depending on the severity. Also I'm still investigating but it seems a significant amount aren't read in at all, even when they should be acceptable dates in excel.

 

What's the best way to handle this, other than fixing the people producing the files. If it matters it'll be used in user written code in DIS but I'm testing in EG as the code interface is nicer.

Thanks

8 REPLIES 8
ballardw
Super User

@MRDM wrote:

I'm trying to read in some dates from excel and having a bit of a nightmare. I think I need to read it in as Character as the data is pretty poor so lots of it doesn't have dates in so needs some clean up. When I do this though SAS is reading in the wrong formats.

 

 

libname xl Excel "c:/myexcelfile.xlsm";
data test;
set xl.'OUTPUT$'n  (dbsastype = 
(DOB ='Char(10)',
OnboardingDate= 'Char10'));
run;
libname xl CLEAR;

In excel the date is 30/01/1983, SAS is outputting it as 1/30/1983 which is switching it to US Format and dropping the leading zero.

 

I've tried:

libname xl Excel "c:/myexcelfile.xlsm";
data test;
set xl.'OUTPUT$'n  (dbsastype = 
(DOB ='date9.',
OnboardingDate= 'date9.',));
run;
libname xl CLEAR;

Which outputs 30JAN83 which is the correct UK format but then any 'dirty' entries trigger either a warning or an error depending on the severity. Also I'm still investigating but it seems a significant amount aren't read in at all, even when they should be acceptable dates in excel.

 

What's the best way to handle this, other than fixing the people producing the files. If it matters it'll be used in user written code in DIS but I'm testing in EG as the code interface is nicer.

Thanks


Just how are you checking for this "SAS is outputting it as 1/30/1983 which is switching it to US Format and dropping the leading zero." If the variable is character then SAS will not change any order, so most likely the value you are looking at is an actual date with a date format applied. Generally the SAS date formats use the leading zeroes.

34   data _null_;
35      date= '30Jan1983'd;
36      put "mmddyy10 format" +1 date= mmddyy10.;
37      put "mmddyy8 format" +1 date= mmddyy8.;
38      put "date9. format" +1 date= date9.;
39      put "ddmmyy format" +1 date= ddmmyy.;
40      put "yymmdd format" +1 date= yymmdd.;
41   run;

mmddyy10 format date=01/30/1983
mmddyy8 format date=01/30/83
date9. format date=30JAN1983
ddmmyy format date=30/01/83
yymmdd format date=83-01-30

All of these standard formats will display the leading 0 in days, or for numeric months < 10.

 

If you are looking in something sent back to Excel and reading it in Excel then it is almost certainly Excel dropping the leading 0 if this is a date value.

 

Unfortunately since Excel does not enforce any type of content for cells if you have "pretty poor" data because of people entering things you may have to try 1) setting the entire column in EXCEL to a single cell content, such as a specific date format and then 2) save the Excel file to a CSV format. Then use a data step to read the data where you can handle some of the "poor" data. This may mean using an informat like Anydtdte for dates in mixed formats or parsing text and processing it conditionally.

 

You should take responsibility for checking on Informats used and setting the desired formats for your date variables. SAS only supplies about 40 or so "standard" formats related to dates and with Proc Format you can make your own.

MRDM
Obsidian | Level 7

Thanks, but as it's an automated system handing hundreds of submitted files a week we can't manually change columns (besides it's already set as a date) or convert each to a csv. If the date is displaying correctly in Excel with the zero, it doesn't seem excel is dropping it, so SAS reading it as a character string should surely capture as is, I'm not sure why it's dropping a zero out a character field.

Excel already has it set as a date so I need a way for either character to keep the format and not drop the zero (I assumed character should just get the string). Or a way to drop any none dates from it as I read it in, instead of it either warning or out right failing.

 

sas date excel issue.png

I'm seeing it by running my code and checking the output data tab in EG, it's definitely a character field and has dropped the zero.

ballardw
Super User

I) You need to check EVERY cell of the first 10 or so lines of the Excel file.

If you have more than one header row then the text in the second header would make SAS think the entire column is text.

 

Run proc contents on that data set and show us the results.

If you can share a file that this occurs with that would be better. But I for one do not open Excel files from sources I do not know.

 

I am having a very hard time believing that this "In excel the date is 30/01/1983, SAS is outputting it as 1/30/1983 " occurs for a variable SAS treats as character.

Is there anything in the log about character to numeric conversions taking place?

 

And how are you writing that SAS value into Excel?

MRDM
Obsidian | Level 7

The excel files aren't being generated by SAS, they're my source data. It's hundreds of files a week so there's a lot of variations hence me trying to read as a character string so I can clean them up a bit (basically compress out all letters). As you can see below it IS transforming the data when reading in as a character.


Excel:

excel-ox.PNG

 

SAS (using the Char(10) code I put above:

sas-ox.PNG

SAS (using Date9. I put above, but this fails for entries with characters):

sas1-ox.PNG

 

I can't provide a sample file due to the nature of the data and we can't change the submission files as it's a live national system with hundreds of users. As much as we prefer csv etc. it isn't user friendly enough for the users in the field.

Basically I just want a raw import of the data as it is in Excel without SAS do anything to it.

Tom
Super User Tom
Super User

It looks to me like you are trying to ask EXCEL to convert the dates into strings by using the DBSASTYPE dataset option. Perhaps Excel has decided that you prefer dates in MDY order instead of DMY?

Where is the copy of Excel that your SAS conversion program accessing running?  What is it default language/region settings?  

MRDM
Obsidian | Level 7

It's a full SAS Grid setup so on the Windows Server Grid servers I guess, they're UK Locality. I guess I still don't get why it's trying to convert them when it's reading as a character string, they're correct when reading as Date9.

 

dbsastype should override the default format (probably date) and force it to use what I specify which is character, so why is it converting it incorrecly and dropping characters.

Tom
Super User Tom
Super User

DBSASTYPE is for telling SAS what TYPE of variable to create, not what display format to attach to it.

So if you tell SAS you want to create a DATE variable then it should convert the number of days stored in the Excel file into the number of days using SAS's method of counting in the numeric variable it creates for you in SAS.  You can then attach whatever format you want to the numeric variable that contains number of days to have it displayed in they way you want.  But as you say if the Excel file has some cells in the column that don't have date values then SAS will be forced to set the value to missing.

MRDM
Obsidian | Level 7
Right, but I'm telling it to do it as a character, so why is it converting it to US format and dropping leading zeroes.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1658 views
  • 3 likes
  • 3 in conversation