BookmarkSubscribeRSS Feed
analyticalepi
Calcite | Level 5

I am importing very messy data in xls. or xlsx. files. One file may have multiple date formats, such as mm/yyyy, d-mmm, mmm-yy, etc. I want to get them all into the same date format, preferably mmddyy10., but if a different standardized format would work better I will take it. Using conditional logic would be easy enough to parse through the different date formats to assign the standard format. Is there a crosswalk of all excel data formats to SAS date formats/informats? For example, the following code set the date_want variable to the mmddyy10. format if the excel date format was MMM-YY or MM/YYY:

data want;
format date_want mmddyy10.;
set have;
date_want = input(original_date, anydtdte.);
run;

But, this didn't work on the D-MMM formats. How do I recode this format or any other custom excel date formats I may come across?

6 REPLIES 6
Kurt_Bremser
Super User

As long as you have Excel date values (count of days, day zero = 1899-12-30) throughout a column, setting a format for the whole column in Excel before importing into SAS should do the trick.

If you actually have text values in the spreadsheet, return to sender and ask for correction. It's not your job to make guesses.

analyticalepi
Calcite | Level 5
Thanks for your response Kurt. Looking at the excel files, the date is complete regardless of the format (ex. 12/01/1981 is formatted as 1-Dec). Unfortunately changing the format in the excel file is not an option for my client wants all edits to be made in SAS. Any thoughts, aside from manually recoding these dates in custom excel formats?
Kurt_Bremser
Super User

If you end up with a string containing something like "1-Dec" after the import, then that's the end of it. You don't know the year, so there's nothing you can do in SAS. Correct the Excel spreadsheet.

If you end up with the correct date (days from 1960-01-01) in another format, that's fine. A date is a date.

If you get numbers (the internal counts of Excel), you can convert those by adding '30dec1899'd and assigning a date format to the variable.

 

So you need to show us what exactly you get in SAS after importing.

Example values, the type of variable, and the format assigned to it (run PROC CONTENTS for this).

ballardw
Super User

@analyticalepi wrote:
Thanks for your response Kurt. Looking at the excel files, the date is complete regardless of the format (ex. 12/01/1981 is formatted as 1-Dec). Unfortunately changing the format in the excel file is not an option for my client wants all edits to be made in SAS. Any thoughts, aside from manually recoding these dates in custom excel formats?

So do you have data that was actually entered in 1981???

If not then someone has done something as the 1-Dec should have the year not display only if it is the current year when the value is entered. (REALLY Dislike this "feature" in Excel).

 

If you highlight a column in Excel and change the format I do not see where any EDIT was made. Edit usually would refer to change of VALUE, not appearance.

 

Or consider it 1) Job security and 2) a chance to pad billing hours.

I have found that sometimes going to the source of the "must do this/not do this" with a cost, either in actual money or resources like man-hours (your work time to manually recode in SAS) will quite often lead to an attitude adjustment.

 

I have one source that sends me garbage with date fields formatted as currency, and personal identifiers and currency formatted as dates sometime for only some rows of data. It is much easier to fix in Excel.

 

Setting a column to a standard format often reveals the values that have been entered as text and may need actual edits.

Reeza
Super User
SAS usually reads that fine, as it tries to read in the underlying data not the partially shown data IME.
The files are XLSX? And you're importing them using PROC IMPORT I assume or an ODBC connection???
Reeza
Super User

SAS requires all dates to have a day, month and year portion and your format of " D-MMM" is missing that. You should handle this case separately. 

 

If you have multiple date formats in your data set though how are you detecting the difference between MM/DD/YYYY and DD/MM/YYYY? SAS will use only one and your field sounds too messy to be able to be handled cleanly. 

 

I wouldn't trust ANYDTDTE in this case and would look at all the possible cases you have and then manually code them in. Data cleaning is painful but required for good analysis.

 

FYI - whoever is requesting that all transformation happens in SAS is correct as this gives you a reproducible and traceable method of fixing the data that can be applied to other files. Doing it manually doesn't have that benefit. 

 


@analyticalepi wrote:

I am importing very messy data in xls. or xlsx. files. One file may have multiple date formats, such as mm/yyyy, d-mmm, mmm-yy, etc. I want to get them all into the same date format, preferably mmddyy10., but if a different standardized format would work better I will take it. Using conditional logic would be easy enough to parse through the different date formats to assign the standard format. Is there a crosswalk of all excel data formats to SAS date formats/informats? For example, the following code set the date_want variable to the mmddyy10. format if the excel date format was MMM-YY or MM/YYY:

data want;
format date_want mmddyy10.;
set have;
date_want = input(original_date, anydtdte.);
run;

But, this didn't work on the D-MMM formats. How do I recode this format or any other custom excel date formats I may come across?


 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 504 views
  • 0 likes
  • 4 in conversation