DATA Step, Macro, Functions and more

Stacking multiple tables with same columns but different types and sizes

Accepted Solution Solved
Reply
Super Contributor
Posts: 400
Accepted Solution

Stacking multiple tables with same columns but different types and sizes

Hello,

 

I'm tasked with creating a process that will import many csv files with state addresses.  For example the Iowa directory has 4 csv files. 

All the tables have the same columns, but different data types and length.

 

I created a macro to import the csv files, but now when I try and stack them I get the types doesn't match error.

 

I created a empty table and using char(32767) so it will fit all rows, but now the table is very large.

 

What would be the best approach for this situation?

 

Thank you


Accepted Solutions
Solution
‎10-06-2016 03:01 PM
Super User
Posts: 11,343

Re: Stacking multiple tables with same columns but different types and sizes

Posted in reply to jerry898969

The INFILE option FILENAME will capture the input file information. Since the automatic variable will not be saved you need to add an instruction to do that explicitly and since the variable will be character you should specify a value long enough to store the expected results. A skelton modifying the previous post would look like:

data want;
   length FileKeep FileTemp $ 200;
   infile "path\*.CSV" lrecl=32767 
      firstobs=2 dsd eov=skip missover
      Filename=FileTemp;
   /* informat and format statements go
      here when I write code
   */
   input @;
   FileKeep = FileTemp;
   if SKIP then SKIP=0;
   Else do;
    /* input and any manipulation code goes here*/
   end;
run;

The length needs to be set Before the variable is created else you only capture 8 characters by default. I use 200 as an example. If you KNOW that the length of your file path plus name will be less or more then adjust.

 

Note the assignment FileKeep= FileTemp is AFTER and input statement. Until input executes the temporary variable isn't set. And is the case will all these infile variables the Filetemp will not be in the output.

View solution in original post


All Replies
Super User
Posts: 19,770

Re: Stacking multiple tables with same columns but different types and sizes

Posted in reply to jerry898969

Fix it upstream. Fix your import so the types and lengths match. 

Basically read the file in with a data step rather than proc import. 

Super User
Posts: 11,343

Re: Stacking multiple tables with same columns but different types and sizes

Posted in reply to jerry898969

In data terms this statement:

All the tables have the same columns, but different data types and length.

is null content. If they have the "same columns" then they should have the same data type.

 

Or if your "different" data types mean that one has stuff that looks like numbers but isn't actually such as Zipcodes, account numbers or product id's then read them all as text. Set the length to the largest you expect (and I usually add 5 to 10 more characters) just in case.

 

Then read ALL of the files with the same program. If the object is to read many files into a single data set then you can use some options in the data step code.

The option EOV can get you through this quite easily. Here is a skeleton of code to demostrate.

data want;
   infile "path\*.CSV" lrecl=32767 
      firstobs=2 dsd eov=skip missover;
   /* informat and format statements go
      here when I write code
   */
input @; if SKIP then SKIP=0; Else do; /* input and any manipulation code goes here*/ end; run;

I am assuming that your data has a header row, the firstobs=2 skips that. The first input holds the line to see if it is the first of a new file. The option EOV sets the variable named SKIP to 1 if the first line of the next file. If so reset the value of the varaible and go to next line else read using your actual input statement.

 

Note that the variable SKIP will not be written into the output set.

The variable SKIP is created by

Super Contributor
Posts: 400

Re: Stacking multiple tables with same columns but different types and sizes

Thank you both for your replies.
The address for example has address number. On some of the files all the numbers are numeric ex. 10011 as the address number. Some other file has that column as character because there is an address number that is 322W10.

I'm using a macro to look through the list of csv files and then saving them into a table after the data is imported. Is that what I should do but use a data step to bring the data in?
Super User
Posts: 11,343

Re: Stacking multiple tables with same columns but different types and sizes

Posted in reply to jerry898969

ADDRESS "numbers" are not numbers. If you are not going to do math with them, such as SUM or MEAN, set them as character.

And yes read them with a data step as that is basically the only way you are going to avoid the mismatched type and length issue.

 

When I read address data if I am lucky enough to have a separate "number" field I set it to be $25 because I know I'm going to get stuff like: Rural Route 23 or "Trailer Space #35".

 

And the example code I provided with suitable INFormat and INPUT statements will read ALL of the CSV files in the infile at one time.

Super Contributor
Posts: 400

Re: Stacking multiple tables with same columns but different types and sizes

[ Edited ]

Thank you so much for all your help ballardw.

Is there a way to add the file name to each row or is there a way to know which row came from which csv file?

 

I'm going to update my code with your suggestions. I will update back on my progress.

Solution
‎10-06-2016 03:01 PM
Super User
Posts: 11,343

Re: Stacking multiple tables with same columns but different types and sizes

Posted in reply to jerry898969

The INFILE option FILENAME will capture the input file information. Since the automatic variable will not be saved you need to add an instruction to do that explicitly and since the variable will be character you should specify a value long enough to store the expected results. A skelton modifying the previous post would look like:

data want;
   length FileKeep FileTemp $ 200;
   infile "path\*.CSV" lrecl=32767 
      firstobs=2 dsd eov=skip missover
      Filename=FileTemp;
   /* informat and format statements go
      here when I write code
   */
   input @;
   FileKeep = FileTemp;
   if SKIP then SKIP=0;
   Else do;
    /* input and any manipulation code goes here*/
   end;
run;

The length needs to be set Before the variable is created else you only capture 8 characters by default. I use 200 as an example. If you KNOW that the length of your file path plus name will be less or more then adjust.

 

Note the assignment FileKeep= FileTemp is AFTER and input statement. Until input executes the temporary variable isn't set. And is the case will all these infile variables the Filetemp will not be in the output.

Super Contributor
Posts: 400

Re: Stacking multiple tables with same columns but different types and sizes

Thank you so much for your help. 

 

That code has stacked the csv files and they are all the same type and length and it's much faster then proc import.

 

I appreciate all your help.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 422 views
  • 0 likes
  • 3 in conversation