BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cxkev182
Fluorite | Level 6

Hi,

I have a macro that loops through a dataset that contains the file paths of around 200 CSV's and imports them.

The CSV files do not all have the same columns, however, one column that every CSV does have is a date column, in ddmm10 format. It will always be in the 2nd column, the 1st column will also be the same across all CSV's.

Where the first few rows in the CSV have dates >=12th of the month, SAS correctly assigns ddmmyy10. informat. But when this is not the case, it incorrectly assigns informat mmdd10., then when it comes to a date >=12th of the month, it fails to import that piece of data.

I have tried adding a GUESSINGROWS=500 to force SAS to evaluate more rows before assigning an informat, but this does not resolve the issue.

I believe because of the varying columns across the CSV's, an 'infile' statement is out of the question. I have also seen some ideas to suggest copying the log where the proc import has generated datasteps into a program and remove any non SAS code, however, due to the number of CSV's involved this would be a long and tedious exercise.

Could anyone suggest a way round this and force SAS to assign a ddmmyy10. informat to date fields in the proc import?

Kind Regards,

cxkev

1 ACCEPTED SOLUTION

Accepted Solutions
Linlin
Lapis Lazuli | Level 10

I would create files with the first two columns using  ‘infile’ statement, then combine these files
with files created by proc import:

Data want;

  merge cvs1_part1  cvs1_part2(drop=column_1 column_2);

  Run;

View solution in original post

12 REPLIES 12
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

I assume you have put: mixed=yes; in your proc import?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You can also have some code in between imports to fix it if nothing else works:

data have1;
  attrib mytext format=$20. mydate format=mmddyy10.;
  mytext="Hello";
  mydate='12jan14'd;
  output;
  mytext="World";
  mydate='13Nov13'd;
  output;
run;

proc sql;
  create table TMP as
  select  distinct NAME
  from    DICTIONARY.COLUMNS
  where   LIBNAME="WORK"
    and   MEMNAME="HAVE1"
    and   FORMAT="MMDDYY10.";
quit;

data _null_;
  set TMP;
  call execute('data have1;
                  set have1 (rename=('||strip(name)||'=tmp_var));
                  attrib '||strip(name)||' format=ddmmyy10.;
                  '||strip(name)||'=tmp_var;
                run;');
run;

cxkev182
Fluorite | Level 6

RW9 wrote:

You can also have some code in between imports to fix it if nothing else works:

data have1;
  attrib mytext format=$20. mydate format=mmddyy10.;
  mytext="Hello";
  mydate='12jan14'd;
  output;
  mytext="World";
  mydate='13Nov13'd;
  output;
run;

proc sql;
  create table TMP as
  select  distinct NAME
  from    DICTIONARY.COLUMNS
  where   LIBNAME="WORK"
    and   MEMNAME="HAVE1"
    and   FORMAT="MMDDYY10.";
quit;

data _null_;
  set TMP;
  call execute('data have1;
                  set have1 (rename=('||strip(name)||'=tmp_var));
                  attrib '||strip(name)||' format=ddmmyy10.;
                  '||strip(name)||'=tmp_var;
                run;');
run;

The results of the proc import, where the date informat incorrectly assigns a mmddyy means that anything >=12th of month, is imported into SAS with a . value, so there is no value there to change

cxkev182
Fluorite | Level 6

I have tried mixed=yes, however, results in an error. I have researched this after our suggestion as it was the first time I'd heard of this option. Someone had wrote that it is only compatible with dbms=xls.

data_null__
Jade | Level 19

I wonder if changing the datestyle system option would have any effect?

proc options option=datestyle;
   run;


15         proc options option=datestyle;
16            run;

    SAS (r) Proprietary Software Release
9.3  TS1M2

DATESTYLE=MDY     Identify sequence of month, day and year when ANYDATE informat data is ambiguous
cxkev182
Fluorite | Level 6

data_null_; wrote:

I wonder if changing the datestyle system option would have any effect?

proc options option=datestyle;
   run;


15         proc options option=datestyle;
16            run;

    SAS (r) Proprietary Software Release
9.3  TS1M2

DATESTYLE=MDY     Identify sequence of month, day and year when ANYDATE informat data is ambiguous

Thanks for the suggestion, unfortunately, this has not solved the issue.

Reeza
Super User

Did you change the datestyle option or just run the code data _null_ provided?

proc options option=datestyle;

   run;

options datestyle=dmy;

proc options option=datestyle;

   run;

*Change back;

*options datestyle=mdy;

cxkev182
Fluorite | Level 6

Reeza wrote:

Did you change the datestyle option or just run the code data _null_ provided?

proc options option=datestyle;

   run;

options datestyle=dmy;

proc options option=datestyle;

   run;

*Change back;

*options datestyle=mdy;

Hi Reeza,

I added a options datestyle=dmy; to the start of my program.

Linlin
Lapis Lazuli | Level 10

I would create files with the first two columns using  ‘infile’ statement, then combine these files
with files created by proc import:

Data want;

  merge cvs1_part1  cvs1_part2(drop=column_1 column_2);

  Run;

cxkev182
Fluorite | Level 6

Linlin wrote:

I would create files with the first two columns using  ‘infile’ statement, then combine these files
with files created by proc import:

Data want;

  merge cvs1_part1  cvs1_part2(drop=column_1 column_2);

  Run;

Thanks very much, that worked! Simple when you know how Smiley Happy.

Thanks to everyone for your help.

Peter_C
Rhodochrosite | Level 12

If there were rules indicating the column structure from the file names   

or a consistent informat for each column with known column header 

then you would not be constrained by the "flexibility" offered by proc import. 

you would have ways to pre-determine the informats you need and data steps could be provided with suitable informats and infile  options in a programmatic way.

good luck

ballardw
Super User

If the many csv files are all the same format then it is not a lot of code. You only change the input file reference and the outpu data set name once it is working.

You didn't say how many records your files have, you might try a much larger value than 500 for guessingrows, especially if your data is sorted by the date.

Personally I only use proc import on CSV files to build a program to modify especially if there are more than one input file of the same layout to process. Generally I set text fields such as names, addresses, account identifiers or other to about 25% longer than found in the first file because such things tend to vary from file to file and if I guess right I have all of the data with the same lengths which can be important if the files ever get combined (since they are the same layout the likelihood is pretty large).

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 2976 views
  • 7 likes
  • 7 in conversation