Solved: Re: Stacking multiple tables with same columns but different types and...

jerry898969 · Posted 10-06-2016 01:32 PM

Hello,

I'm tasked with creating a process that will import many csv files with state addresses. For example the Iowa directory has 4 csv files.

All the tables have the same columns, but different data types and length.

I created a macro to import the csv files, but now when I try and stack them I get the types doesn't match error.

I created a empty table and using char(32767) so it will fit all rows, but now the table is very large.

What would be the best approach for this situation?

Thank you

ballardw · Posted 10-06-2016 02:52 PM

The INFILE option FILENAME will capture the input file information. Since the automatic variable will not be saved you need to add an instruction to do that explicitly and since the variable will be character you should specify a value long enough to store the expected results. A skelton modifying the previous post would look like:

data want;
   length FileKeep FileTemp $ 200;
   infile "path\*.CSV" lrecl=32767 
      firstobs=2 dsd eov=skip missover
      Filename=FileTemp;
   /* informat and format statements go
      here when I write code
   */
   input @;
   FileKeep = FileTemp;
   if SKIP then SKIP=0;
   Else do;
    /* input and any manipulation code goes here*/
   end;
run;

The length needs to be set Before the variable is created else you only capture 8 characters by default. I use 200 as an example. If you KNOW that the length of your file path plus name will be less or more then adjust.

Note the assignment FileKeep= FileTemp is AFTER and input statement. Until input executes the temporary variable isn't set. And is the case will all these infile variables the Filetemp will not be in the output.

View solution in original post

Reeza · Posted 10-06-2016 01:48 PM

Fix it upstream. Fix your import so the types and lengths match.

Basically read the file in with a data step rather than proc import.

ballardw · Posted 10-06-2016 01:54 PM

In data terms this statement:

All the tables have the same columns, but different data types and length.

is null content. If they have the "same columns" then they should have the same data type.

Or if your "different" data types mean that one has stuff that looks like numbers but isn't actually such as Zipcodes, account numbers or product id's then read them all as text. Set the length to the largest you expect (and I usually add 5 to 10 more characters) just in case.

Then read ALL of the files with the same program. If the object is to read many files into a single data set then you can use some options in the data step code.

The option EOV can get you through this quite easily. Here is a skeleton of code to demostrate.

data want;
   infile "path\*.CSV" lrecl=32767 
      firstobs=2 dsd eov=skip missover;
   /* informat and format statements go
      here when I write code
   */
   input @;
   if SKIP then SKIP=0;
   Else do;
    /* input and any manipulation code goes here*/
   end;
run;

I am assuming that your data has a header row, the firstobs=2 skips that. The first input holds the line to see if it is the first of a new file. The option EOV sets the variable named SKIP to 1 if the first line of the next file. If so reset the value of the varaible and go to next line else read using your actual input statement.

Note that the variable SKIP will not be written into the output set.

The variable SKIP is created by

jerry898969 · Posted 10-06-2016 02:09 PM

Thank you both for your replies.
The address for example has address number. On some of the files all the numbers are numeric ex. 10011 as the address number. Some other file has that column as character because there is an address number that is 322W10.

I'm using a macro to look through the list of csv files and then saving them into a table after the data is imported. Is that what I should do but use a data step to bring the data in?

ballardw · Posted 10-06-2016 02:32 PM

ADDRESS "numbers" are not numbers. If you are not going to do math with them, such as SUM or MEAN, set them as character.

And yes read them with a data step as that is basically the only way you are going to avoid the mismatched type and length issue.

When I read address data if I am lucky enough to have a separate "number" field I set it to be $25 because I know I'm going to get stuff like: Rural Route 23 or "Trailer Space #35".

And the example code I provided with suitable INFormat and INPUT statements will read ALL of the CSV files in the infile at one time.

jerry898969 · Posted 10-06-2016 02:35 PM

Thank you so much for all your help ballardw.

Is there a way to add the file name to each row or is there a way to know which row came from which csv file?

I'm going to update my code with your suggestions. I will update back on my progress.

ballardw · Posted 10-06-2016 02:52 PM

The INFILE option FILENAME will capture the input file information. Since the automatic variable will not be saved you need to add an instruction to do that explicitly and since the variable will be character you should specify a value long enough to store the expected results. A skelton modifying the previous post would look like:

data want;
   length FileKeep FileTemp $ 200;
   infile "path\*.CSV" lrecl=32767 
      firstobs=2 dsd eov=skip missover
      Filename=FileTemp;
   /* informat and format statements go
      here when I write code
   */
   input @;
   FileKeep = FileTemp;
   if SKIP then SKIP=0;
   Else do;
    /* input and any manipulation code goes here*/
   end;
run;

The length needs to be set Before the variable is created else you only capture 8 characters by default. I use 200 as an example. If you KNOW that the length of your file path plus name will be less or more then adjust.

Note the assignment FileKeep= FileTemp is AFTER and input statement. Until input executes the temporary variable isn't set. And is the case will all these infile variables the Filetemp will not be in the output.

jerry898969 · Posted 10-06-2016 03:01 PM

Thank you so much for your help.

That code has stacked the csv files and they are all the same type and length and it's much faster then proc import.

I appreciate all your help.

Stacking multiple tables with same columns but different types and sizes

Re: Stacking multiple tables with same columns but different types and sizes

Re: Stacking multiple tables with same columns but different types and sizes

Re: Stacking multiple tables with same columns but different types and sizes

Re: Stacking multiple tables with same columns but different types and sizes

Re: Stacking multiple tables with same columns but different types and sizes

Re: Stacking multiple tables with same columns but different types and sizes

Re: Stacking multiple tables with same columns but different types and sizes

Re: Stacking multiple tables with same columns but different types and sizes

SAS Innovate 2025: Call for Content

Classroom Training Available!