- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Recently we are hitting a problem in date format when we read data from external file using proc import and we are finally exporting into a csv file. The dates are exporting into csv as US format ( not UK ).
Eg: The excel_date in excel_test.xlxs file is 12/01/2024 ( dd/mm/yyyy )
proc import;
out=test_import ( keep = excel_date )
Datafile= excel_test.xlxs
dbms=xlsx;
run;
data test1;
format new_date date9,;
set test_import;
new_date = excel_date;
run;
proc export data = test1
outfile="new_date.csv"
dbms=csv;
run;
When we run the code, output of sas dataset (test1) as
new_date excel_date
12JAN2024 01/12/2024 ( getting changed here)
While exporting into csv ( new_date.csv ) from test1 sasdataset ,
new_date is exported as 12JAN2024 and excel_date as 01/12/2024 which matches the dataset value. But the actual value for excel_date value in the excel as 12/01/2024 but the value gets changed in csv for that column.
If the value of date in excel is 27/01/2024 , then the csv export dont have any problem as it export the value correctly to csv. I suspect the problem is when the value in the excel file has date value less than or equal 12. If the date is greater than 12 , sas not able to convert as we dont have month greater than 12.
I was running below query to check anything got changed, but no luck
proc options option=(DATESTYLE LOCALE) value;run;
Option Value information for SAS Option Datastyle
Value : MDY
Option Value information for SAS Option Locale
Value: EN_GB
Only recently we are getting this problem, until last month there were no issue. Even in linux OS where SAS is installed ,we do have locale setting as en_GB.
It seems the excel date values are stored in sas dataset as US format.
Please let me know if there is a fix to resolve the issue. We cant change sas code as there are lot of sas code has date values and as I said , this code was working for years.
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Look at the raw number stored in the Excel file. Format the cell as a simple number, and show us what you get.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
A CSV file has no "format", only strings. Are you saying the date strings in the CSV file are in MDY order and not DMY order?
If you use GUESSINGROWS=MAX and there is at least one observation that has a day of the month larger then 12 then PROC IMPORT should pick the right INFORMAT for the column.
But if all of the day of month values are 12 or less then there is NO WAY for PROC IMPORT to KNOW what was intended by the strings in the file.
But why are you using PROC IMPORT to read CSV files? Since you seem to know what is in them just write the data step to read them yourself. Then you can decide what informat to use. In general I find it is always better and frequently easier to write the data step myself.
Note that the DATESTYLE option effects the decisions made by the ANY... series of informats. But PROC IMPORT will only use that informat if the text values it sees in the column are inconsistent. If they all look like MDY date strings it will use the MMDDYY informat. If the all look like DMY date strings it will use the DDMMYY informat.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @freshstarter
From what that has been posted it is difficult to ppin point the cause.
However, a small change in the format statement "format new_date date9," to " "format new_date ddmmyys10.;" should solve your problem.
This can be verified from the code below.
option locale="EN_GB";
data test;
format dt1 ddmmyys10.;
dt1="12JAN2024"d.;
run;
proc export data=test outfile="/full_path/test.csv";
run;
The csv file had the correct format "12/01/2024" irrespective of where you open it (notepad or Excel).
I have verified with both 12th Jan and 31st Jan. It works as expected.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I want to let you know that there are two environments we do have .. Prod and Dev.. only in prod environment, we are facing this issue not in Dev.. I will answer for your queries soon
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It sounds like there has been a recent change to your Prod environment.
Have you already tried what happens if you set at the beginning of your job below two options.
options datestyle=dmy locale=en_gb;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@freshstarter wrote:
Hello all,
I want to let you know that there are two environments we do have .. Prod and Dev.. only in prod environment, we are facing this issue not in Dev.. I will answer for your queries soon
What is different between the two environments? Are they using different computer systems? Different SAS installations? Different versions of SAS? Different user running the program?
But the key difference is probably that they are reading different CSV files. PROC IMPORT has to GUESS how to define the dataset based on just the information in the single file it is looking at. Give it a slightly different input file and you can get different variable names, variable types, variable storage lengths, different informats used, different formats attached.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In case this issue is occurring only in the production environment, please work with your SAS Administrator and investigate the cause. Do look into the program and server logs.
If possible, enable the workspace server log to capture the logs for this program.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
And, indeed, in the SAS datasets, there is no formatting, just a numeric value. Formats are used to read and write those values. In short, in short, just a representation of a natural number.
Please give a look to the documentation:
https://documentation.sas.com/doc/en/vdmmlcdc/8.1/ds2pg/p0bz5detpfj01qn1kz2in7xymkdl.htm
"...The following table displays the results of formatting the date March 17, 2012..."
DATE9. :19069 —> 17MAR2012
In short, if you use date9. the value 19069 in the dataset will be formated as 17MAR2012, which is what you are experiencing and not desired.
When you want to keep the format 01/12/2024, you need to use mmddyy10. I think this was indicated earlier.
MMDDYY10. : 19069 —> 03/17/2012
This should resolve any issue, regardless of any undesired difference between the Dev and Prod environments.
I suggest to work on the solution you can be in control of (your code), and raise the fact the 2 environments are different to your administrators, without you needing to wait for them.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So if you started with an XSLX file note that the LOCALE used in EXCEL can change how EXCEL displays the exact same values when formatted as "date".
In this XLSX file I typed the same 12/31/2024 string into the cells A2 to A4. I then use the menus to leave the formatting as DATE but change the LOCALE used for each of them.
When I use PROC IMPORT to make a SAS dataset the result is a variable with the MMDDYY10. format attached to it.
If I ask SAS to read each line separately
proc import dbms=xlsx replace
file='c:\downloads\date_regions.xlsx'
out=us
;
range='$A2:C2';
getnames=no;
run;
proc print;
title "US line";
run;
proc import dbms=xlsx replace
file='c:\downloads\date_regions.xlsx'
out=uk
;
range='$A3:C3';
getnames=no;
run;
proc print;
title "UK line";
run;
proc import dbms=xlsx replace
file='c:\downloads\date_regions.xlsx'
out=estonian
;
range='$A4:C4';
getnames=no;
run;
proc print;
title "Estonian line";
run;
then I get formats attached the match the display used by the XLSX file.
But in addition to the LOCALE setting that I modified on those individual files I suspect that the XLSX and/or the EXCEL version you are running might have their own settings for LOCALE.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello All,
First thank you for all your suggestions. While digging more into the environments, I have found different sort of issue and the date output issues are same across the mulitple environments. Initially I have considered wrong examples and thats the reason I have mentioned the output are different across enviroment. Sorry for that.
Now we have come acrosss that datestyle output is different format in proc options;run; from workspace server and we have changed it 'DMY' and tried running the query. It has fixed the issue while reading the data from .csv file ( infile statement ) but not from excel ( using proc import ).
Eg: Reading from excel
----------------------
proc import
out = input_files ( keep = start_Date )
Datafile = "xxxx/input_excelfile.xlxs"
dbms=xlxs Replace;
getnames=yes;
run;
data temp;
set input_files;
format newdate date9;
newdate=start_Date;
run;
THe date available in excel as 02/08/2023 and If I run the above code , I'm getting as below
start_date newdate
08/02/2023 02AUG2023
Datestyle setting in workspace server is DMY and the locale setting in SAS config / Linux OS and excel sheet are in United Kingdom. I dont know why the conversion is happening while performing the proc import which is causing a problem. Please suggest me with your ideas.
Thanks once again for all your comments.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@freshstarter In your Excel are these cells with date defined as character or date or something else.
What would help:
- Create a test Excel with just a date column and a few dates that creates this issue in your environment
- Share with us this Excel here as an attachment and tell us what the issue is you observe and what the expected result should be
- Please share with us the exact Excel you've used for your testing
Such an Excel would allow us to do some testing to see if we can replicate the issue you describe.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Attached here the input file.
The date in the excel file is 02/08/2023 ( which is 02nd Aug 2023 ) . Our final result expected to be the same in another csv file. This is how our workflow created
1. Proc import from xlsx file
2. Do sas processing
3. then finally Proc export to csv file.
While exporting to csv file, the dates are loaded as 08/02/2023 which is wrong.
Another strange behvaiour , I have noticed now is that as per below workflow
1. Proc import from xlsx file
2. Do sas processing
3. then finally Proc export to another xlsx file ( not csv ).
With the same excel sheet , the dates are imported into sas dataset as 08/02/2023 but while exporting the dataset into another new xlxs file, its exported as 02/08/2023 ( 02nd Aug ) which is correct.
Now I think the problem arises, when we importing and exporting to different formats ( i.e. xl to csv ) which is very strange. This change in behaviour has happended very recently.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you want the DATE variables to use the DDMMYY format then tell SAS that.
If you KNOW the names of the variables just add a FORMAT statement at some point in your process.
If you have no idea what date variables you will get then (does not sound like a repeatable process) then first ask SAS to make a list of the names of the DATE variables into a macro variable and use that to write the FORMAT statement.
Let's say you imported the XSLX sheet into a dataset named WORK.TEST using something like this:
proc import dbms=xlsx out=test replace
file="C:\downloads\input_excelfile.xlsx"
;
run;
You can then do something like this to change the format attached to ALL of the date variables.
proc contents data=test noprint out=contents;
run;
proc sql noprint;
select nliteral(name) into :datevars separated by ' '
from contents
where fmtinfo(format,'cat')='date'
;
quit;
proc datasets lib=WORK nolist;
modify test;
format &datevars ddmmyy10.;
run;
quit;
As to what changed I suspect that somehow your default location has changed from a country like the United Kingdom that prefers DMY order to a country like the United States that prefers MDY order and either SAS or EXCEL is honoring those changed settings.
To avoid this issue in the future I suggest that switch to use YMD order for you DATE values. That will not get confused.