Hi,
I have a macro that loops through a dataset that contains the file paths of around 200 CSV's and imports them.
The CSV files do not all have the same columns, however, one column that every CSV does have is a date column, in ddmm10 format. It will always be in the 2nd column, the 1st column will also be the same across all CSV's.
Where the first few rows in the CSV have dates >=12th of the month, SAS correctly assigns ddmmyy10. informat. But when this is not the case, it incorrectly assigns informat mmdd10., then when it comes to a date >=12th of the month, it fails to import that piece of data.
I have tried adding a GUESSINGROWS=500 to force SAS to evaluate more rows before assigning an informat, but this does not resolve the issue.
I believe because of the varying columns across the CSV's, an 'infile' statement is out of the question. I have also seen some ideas to suggest copying the log where the proc import has generated datasteps into a program and remove any non SAS code, however, due to the number of CSV's involved this would be a long and tedious exercise.
Could anyone suggest a way round this and force SAS to assign a ddmmyy10. informat to date fields in the proc import?
Kind Regards,
cxkev
I would create files with the first two columns using ‘infile’ statement, then combine these files
with files created by proc import:
Data want;
merge cvs1_part1 cvs1_part2(drop=column_1 column_2);
Run;
Hi,
I assume you have put: mixed=yes; in your proc import?
You can also have some code in between imports to fix it if nothing else works:
data have1;
attrib mytext format=$20. mydate format=mmddyy10.;
mytext="Hello";
mydate='12jan14'd;
output;
mytext="World";
mydate='13Nov13'd;
output;
run;
proc sql;
create table TMP as
select distinct NAME
from DICTIONARY.COLUMNS
where LIBNAME="WORK"
and MEMNAME="HAVE1"
and FORMAT="MMDDYY10.";
quit;
data _null_;
set TMP;
call execute('data have1;
set have1 (rename=('||strip(name)||'=tmp_var));
attrib '||strip(name)||' format=ddmmyy10.;
'||strip(name)||'=tmp_var;
run;');
run;
RW9 wrote:
You can also have some code in between imports to fix it if nothing else works:
data have1;
attrib mytext format=$20. mydate format=mmddyy10.;
mytext="Hello";
mydate='12jan14'd;
output;
mytext="World";
mydate='13Nov13'd;
output;
run;proc sql;
create table TMP as
select distinct NAME
from DICTIONARY.COLUMNS
where LIBNAME="WORK"
and MEMNAME="HAVE1"
and FORMAT="MMDDYY10.";
quit;data _null_;
set TMP;
call execute('data have1;
set have1 (rename=('||strip(name)||'=tmp_var));
attrib '||strip(name)||' format=ddmmyy10.;
'||strip(name)||'=tmp_var;
run;');
run;
The results of the proc import, where the date informat incorrectly assigns a mmddyy means that anything >=12th of month, is imported into SAS with a . value, so there is no value there to change
I have tried mixed=yes, however, results in an error. I have researched this after our suggestion as it was the first time I'd heard of this option. Someone had wrote that it is only compatible with dbms=xls.
I wonder if changing the datestyle system option would have any effect?
data_null_; wrote:
I wonder if changing the datestyle system option would have any effect?
Thanks for the suggestion, unfortunately, this has not solved the issue.
Did you change the datestyle option or just run the code data _null_ provided?
proc options option=datestyle;
run;
options datestyle=dmy;
proc options option=datestyle;
run;
*Change back;
*options datestyle=mdy;
Reeza wrote:
Did you change the datestyle option or just run the code data _null_ provided?
proc options option=datestyle;
run;
options datestyle=dmy;
proc options option=datestyle;
run;
*Change back;
*options datestyle=mdy;
Hi Reeza,
I added a options datestyle=dmy; to the start of my program.
I would create files with the first two columns using ‘infile’ statement, then combine these files
with files created by proc import:
Data want;
merge cvs1_part1 cvs1_part2(drop=column_1 column_2);
Run;
Linlin wrote:
I would create files with the first two columns using ‘infile’ statement, then combine these files
with files created by proc import:Data want;
merge cvs1_part1 cvs1_part2(drop=column_1 column_2);
Run;
Thanks very much, that worked! Simple when you know how .
Thanks to everyone for your help.
If there were rules indicating the column structure from the file names
or a consistent informat for each column with known column header
then you would not be constrained by the "flexibility" offered by proc import.
you would have ways to pre-determine the informats you need and data steps could be provided with suitable informats and infile options in a programmatic way.
good luck
If the many csv files are all the same format then it is not a lot of code. You only change the input file reference and the outpu data set name once it is working.
You didn't say how many records your files have, you might try a much larger value than 500 for guessingrows, especially if your data is sorted by the date.
Personally I only use proc import on CSV files to build a program to modify especially if there are more than one input file of the same layout to process. Generally I set text fields such as names, addresses, account identifiers or other to about 25% longer than found in the first file because such things tend to vary from file to file and if I guess right I have all of the data with the same lengths which can be important if the files ever get combined (since they are the same layout the likelihood is pretty large).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.