BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jerry898969
Pyrite | Level 9

Hello,

 

I'm tasked with creating a process that will import many csv files with state addresses.  For example the Iowa directory has 4 csv files. 

All the tables have the same columns, but different data types and length.

 

I created a macro to import the csv files, but now when I try and stack them I get the types doesn't match error.

 

I created a empty table and using char(32767) so it will fit all rows, but now the table is very large.

 

What would be the best approach for this situation?

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

The INFILE option FILENAME will capture the input file information. Since the automatic variable will not be saved you need to add an instruction to do that explicitly and since the variable will be character you should specify a value long enough to store the expected results. A skelton modifying the previous post would look like:

data want;
   length FileKeep FileTemp $ 200;
   infile "path\*.CSV" lrecl=32767 
      firstobs=2 dsd eov=skip missover
      Filename=FileTemp;
   /* informat and format statements go
      here when I write code
   */
   input @;
   FileKeep = FileTemp;
   if SKIP then SKIP=0;
   Else do;
    /* input and any manipulation code goes here*/
   end;
run;

The length needs to be set Before the variable is created else you only capture 8 characters by default. I use 200 as an example. If you KNOW that the length of your file path plus name will be less or more then adjust.

 

Note the assignment FileKeep= FileTemp is AFTER and input statement. Until input executes the temporary variable isn't set. And is the case will all these infile variables the Filetemp will not be in the output.

View solution in original post

7 REPLIES 7
Reeza
Super User

Fix it upstream. Fix your import so the types and lengths match. 

Basically read the file in with a data step rather than proc import. 

ballardw
Super User

In data terms this statement:

All the tables have the same columns, but different data types and length.

is null content. If they have the "same columns" then they should have the same data type.

 

Or if your "different" data types mean that one has stuff that looks like numbers but isn't actually such as Zipcodes, account numbers or product id's then read them all as text. Set the length to the largest you expect (and I usually add 5 to 10 more characters) just in case.

 

Then read ALL of the files with the same program. If the object is to read many files into a single data set then you can use some options in the data step code.

The option EOV can get you through this quite easily. Here is a skeleton of code to demostrate.

data want;
   infile "path\*.CSV" lrecl=32767 
      firstobs=2 dsd eov=skip missover;
   /* informat and format statements go
      here when I write code
   */
input @; if SKIP then SKIP=0; Else do; /* input and any manipulation code goes here*/ end; run;

I am assuming that your data has a header row, the firstobs=2 skips that. The first input holds the line to see if it is the first of a new file. The option EOV sets the variable named SKIP to 1 if the first line of the next file. If so reset the value of the varaible and go to next line else read using your actual input statement.

 

Note that the variable SKIP will not be written into the output set.

The variable SKIP is created by

jerry898969
Pyrite | Level 9
Thank you both for your replies.
The address for example has address number. On some of the files all the numbers are numeric ex. 10011 as the address number. Some other file has that column as character because there is an address number that is 322W10.

I'm using a macro to look through the list of csv files and then saving them into a table after the data is imported. Is that what I should do but use a data step to bring the data in?
ballardw
Super User

ADDRESS "numbers" are not numbers. If you are not going to do math with them, such as SUM or MEAN, set them as character.

And yes read them with a data step as that is basically the only way you are going to avoid the mismatched type and length issue.

 

When I read address data if I am lucky enough to have a separate "number" field I set it to be $25 because I know I'm going to get stuff like: Rural Route 23 or "Trailer Space #35".

 

And the example code I provided with suitable INFormat and INPUT statements will read ALL of the CSV files in the infile at one time.

jerry898969
Pyrite | Level 9

Thank you so much for all your help ballardw.

Is there a way to add the file name to each row or is there a way to know which row came from which csv file?

 

I'm going to update my code with your suggestions. I will update back on my progress.

ballardw
Super User

The INFILE option FILENAME will capture the input file information. Since the automatic variable will not be saved you need to add an instruction to do that explicitly and since the variable will be character you should specify a value long enough to store the expected results. A skelton modifying the previous post would look like:

data want;
   length FileKeep FileTemp $ 200;
   infile "path\*.CSV" lrecl=32767 
      firstobs=2 dsd eov=skip missover
      Filename=FileTemp;
   /* informat and format statements go
      here when I write code
   */
   input @;
   FileKeep = FileTemp;
   if SKIP then SKIP=0;
   Else do;
    /* input and any manipulation code goes here*/
   end;
run;

The length needs to be set Before the variable is created else you only capture 8 characters by default. I use 200 as an example. If you KNOW that the length of your file path plus name will be less or more then adjust.

 

Note the assignment FileKeep= FileTemp is AFTER and input statement. Until input executes the temporary variable isn't set. And is the case will all these infile variables the Filetemp will not be in the output.

jerry898969
Pyrite | Level 9

Thank you so much for your help. 

 

That code has stacked the csv files and they are all the same type and length and it's much faster then proc import.

 

I appreciate all your help.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2324 views
  • 0 likes
  • 3 in conversation