DATA Step, Macro, Functions and more

Multiple csv files in one SAS data set instead of multiple data sets

Reply
Occasional Contributor
Posts: 10

Multiple csv files in one SAS data set instead of multiple data sets

Hi All,

 

I have a macro that imports multiple csv files in different SAS data tables. If I wish to have only one large data table instead of different tables, what should I change in my code? I tried a few things but didnt work.  Please advise. 

 

%let path=D:\My_SAS_Projects;
filename folder "&path\Data.";
libname c 'C:\Temp';
options validmemname=extend;

data FilesInFolder;
length Line 8 File $300;
List = dopen('folder');
do Line = 1 to dnum(List);
File = trim(dread(List,Line));
output;
end;
drop list line;
run;

data _NULL_;
set FilesInFolder end=final;
call symput(cats('File', _N_), trim(File));
call symput(cats('Name', _N_), trim(nliteral(substr(File,1,min(32, length(File)-4)))));
if final then call symputx(trim('Total'), _N_);
run;

 

%macro Iteration;
%do i = 1 %to &Total;
proc import datafile="&path\Data\&&File&i"
out= c.&&name&i
dbms=csv
replace;
getnames=yes;
run;
%end;
%mend Iteration;

%Iteration

Super User
Posts: 19,789

Re: Multiple csv files in one SAS data set instead of multiple data sets

Posted in reply to akhileshJoshi
Occasional Contributor
Posts: 10

Re: Multiple csv files in one SAS data set instead of multiple data sets

Hi Reeza,

 

I had already seen this thread. But I am already using the Macro I pasted above in my code. So, wanted to know if the macro itself could be modified instead of writing a new logic.

Super User
Posts: 19,789

Re: Multiple csv files in one SAS data set instead of multiple data sets

Posted in reply to akhileshJoshi

Yes, but you're likely to run into errors with that method. PROC IMPORT is a guessing procedure, it guesses at types. At some point it will guess incorrectly and read a variable in with incorrect type/format. When you try to append you'll generate an error.

 

 

If you want to modify your macro, you need at least two extra steps:

 

1. Add a PROC APPEND after the import and append the data. Note that the base data does not need to pre-exist for first iteration. 

2. Drop the table after you append it so it's not laying around. 

3 (optional). Check if master table already exists and if it does either empty or delete the table. 

Occasional Contributor
Posts: 10

Re: Multiple csv files in one SAS data set instead of multiple data sets

Hi Reeza,

 

What is expected in *Place input code here? I entered a few logic but it says expecting a @. 

 

Thanks and regards,

Akhilesh

 

data import_all;
 
*make sure variables to store file name are long enough;
length filename txt_file_name $256;
 
*keep file name from record to record;
retain txt_file_name;
 
*Use wildcard in input;
infile "Path\*.txt" eov=eov filename=filename truncover;
 
*Input first record and hold line;
input@;
 
*Check if this is the first record or the first record in a new file;
*If it is, replace the filename with the new file name and move to next line;
if _n_ eq 1 or eov then do;
txt_file_name = scan(filename, -1, "\");
eov=0;
end;
 
*Otherwise  go to the import step and read the files;
else input
 
*Place input code here;
 
;
run;

 

Super User
Posts: 19,789

Re: Multiple csv files in one SAS data set instead of multiple data sets

Posted in reply to akhileshJoshi

Post the data step code to read a single file correctly and I'll show you how to map it to the data step solution. 

 

If you don't have the code, run a single proc import and get the code from the log. 

Occasional Contributor
Posts: 10

Re: Multiple csv files in one SAS data set instead of multiple data sets

Hi Reeza,

 

Please find below the data step code to read a single file correctly. Please guide how I can use it to map to the data step solution for multiple csv's.

 

data AK.DB4;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'D:\Raw_Data\RMS_Transposed_Canvass_History\RMS_CompetitorRate_03242017.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2;

informat PickupLocationCode $3.;

informat PickupLocationCode $3.;

informat CarClass $1. ;

informat PickupDate anydtdtm40. ;

informat ReturnDate anydtdtm40. ;

informat PricingSegmentId $3. ;

informat ShopDate anydtdtm40. ;

informat LengthOfRent best32. ;


informat PointOfSaleCountryCode $2. ;

informat CategoryCode $1. ;

informat RealTimeCanvasCode $1. ;

informat Rate_Type $4. ;

informat AC $1. ;

informat AD $1. ;

informat AL $11. ;

informat EP $1. ;

informat ET $11. ;

informat EY $1. ;

informat EZ $1. ;

informat FX $1. ;

informat MW $1. ;

informat NC $1. ;

informat NE $1. ;

informat NU $1. ;

informat PC $1. ;

informat PR $1. ;

informat RW $1. ;

informat SC $1. ;

informat SV $1. ;

informat SX $1. ;

informat ZA $1. ;

informat ZD $11. ;

informat ZE $11. ;

informat ZI $11. ;

informat ZL $11. ;

informat ZR $11. ;

informat ZT $11. ;

informat VR $1. ;

informat RE $1. ;

informat MX $1. ;

informat FF $1. ;

informat XX $1. ;

informat WH $1. ;

informat GM $1. ;

informat AX $1. ;

informat SL $1. ;

informat MV $1. ;

informat DS $1. ;

informat GI $1. ;

informat AC_EstTotal $1. ;

informat AD_EstTotal $1. ;

informat AL_EstTotal $11. ;

informat EP_EstTotal $1. ;

informat ET_EstTotal $11. ;

informat EY_EstTotal $1. ;

informat EZ_EstTotal $1. ;

informat FX_EstTotal $1. ;

informat MW_EstTotal $1. ;

informat NC_EstTotal $1. ;

informat NE_EstTotal $1. ;

informat NU_EstTotal $1. ;

informat PC_EstTotal $1. ;

informat PR_EstTotal $1. ;

informat RW_EstTotal $1. ;

informat SC_EstTotal $1. ;

informat SV_EstTotal $1. ;

informat SX_EstTotal $1. ;

informat ZA_EstTotal $1. ;

informat ZD_EstTotal $11. ;

informat ZE_EstTotal $11. ;


informat ZI_EstTotal $11. ;

informat ZL_EstTotal $11. ;

informat ZR_EstTotal $11. ;

informat ZT_EstTotal $11. ;

format PickupLocationCode $3. ;

format ReturnLocationCode $3. ;

format CarClass $1. ;

format PickupDate datetime. ;

format ReturnDate datetime. ;

format PricingSegmentId $3. ;

format ShopDate datetime. ;

format LengthOfRent best12. ;

format PlanCode $1. ;

format PointOfSaleCountryCode $2. ;

format CategoryCode $1. ;

format RealTimeCanvasCode $1. ;

format Rate_Type $4. ;

format AC $1. ;

format AD $1. ;

format AL $11. ;

format EP $1. ;

format ET $11. ;

format EY $1. ;

format EZ $1. ;

format FX $1. ;

format MW $1. ;

format NC $1. ;

format NE $1. ;

format NU $1. ;

format PC $1. ;

format PR $1. ;

format RW $1. ;

format SC $1. ;

format SV $1. ;

format SX $1. ;

format ZA $1. ;

format ZD $11. ;

format ZE $11. ;

format ZI $11. ;

format ZL $11. ;

format ZR $11. ;

format ZT $11. ;

format VR $1. ;

format RE $1. ;

format MX $1. ;

format FF $1. ;

format XX $1. ;

format WH $1. ;

format GM $1. ;

format AX $1. ;

format SL $1. ;

format MV $1. ;

format DS $1. ;

format GI $1. ;

format AC_EstTotal $1. ;

format AD_EstTotal $1. ;

format AL_EstTotal $11. ;

format EP_EstTotal $1. ;

format ET_EstTotal $11. ;

format EY_EstTotal $1. ;

format EZ_EstTotal $1. ;

format FX_EstTotal $1. ;

format MW_EstTotal $1. ;

format NC_EstTotal $1. ;

format NE_EstTotal $1. ;

format NU_EstTotal $1. ;

format PC_EstTotal $1. ;

format PR_EstTotal $1. ;

format RW_EstTotal $1. ;

format SC_EstTotal $1. ;

format SV_EstTotal $1. ;

format SX_EstTotal $1. ;

format ZA_EstTotal $1. ;

format ZD_EstTotal $11. ;

format ZE_EstTotal $11. ;

format ZI_EstTotal $11. ;

format ZL_EstTotal $11. ;

format ZR_EstTotal $11. ;

format ZT_EstTotal $11. ;

input

PickupLocationCode $

ReturnLocationCode $

CarClass $

PickupDate

ReturnDate

PricingSegmentId $

ShopDate

LengthOfRent

PlanCode $

PointOfSaleCountryCode $

CategoryCode $

RealTimeCanvasCode $

Rate_Type $

AC $

AD $

AL $

EP $

ET $

EY $

EZ $

FX $

MW $

NC $

NE $

NU $

PC $

PR $

RW $

SC $

SV $

SX $

ZA $

ZD $

ZI $

ZL $

ZR $

ZT $

VR $

RE $

MX $

FF $

XX $

WH $

GM $

AX $

SL $

MV $

DS $

GI $

AC_EstTotal $

AD_EstTotal $

AL_EstTotal $

EP_EstTotal $

ET_EstTotal $

EY_EstTotal $

EZ_EstTotal $

FX_EstTotal $

MW_EstTotal $

NC_EstTotal $

NE_EstTotal $

NU_EstTotal $

PC_EstTotal $

PR_EstTotal $

RW_EstTotal $

SC_EstTotal $

SV_EstTotal $

SX_EstTotal $

ZA_EstTotal $

ZD_EstTotal $

ZE_EstTotal $

ZI_EstTotal $

ZL_EstTotal $

ZR_EstTotal $

ZT_EstTotal $;

if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */

run;

 

Super User
Posts: 11,343

Re: Multiple csv files in one SAS data set instead of multiple data sets

Posted in reply to akhileshJoshi

You may want to go back to what ever file description you have for what your data should contain. You have a large number of your variables with and Informat of $1. That often means that the field was actually missing any values. When I see variables whose names end in Total I kind of expect them to be numeric as we seldom sum or "total" character values.

 

When I see this:

informat AC $1. ;
informat AD $1. ;
informat AL $11. ;
informat EP $1. ;
informat ET $11. ;

informat AC_EstTotal $1. ;
informat AD_EstTotal $1. ;
informat AL_EstTotal $11. ;
informat EP_EstTotal $1. ;
informat ET_EstTotal $11. ;

I would expect that AC AD AL EP and ET are numeric and then you have totals. Would it be a far stretch to say that your orginal data file has many currency values such a $12,456.89 ? If you expect to do arithmetic later (ie SUM) with them then you will want to address that issue.

 

 

The main purpose of this part of the exercise is to get a file definition that will work for all of your expected data so you want to make sure that the informats are correct for your values. When you do get another file values, AC for example, then this code would truncate the value to the first character. Which if my guess is correct would be "$" or your currency symbol. So as a minimum you likely want all of those AC, AD etc variables to have a wider informat to read the longest expected value.

 

The conversion to the code listed in @Reeza's link is very simple.

The link includes:

 
*Use wildcard in input;
infile "Path\*.txt" eov=eov filename=filename truncover;
 

so using your example code:

 

infile 'D:\Raw_Data\RMS_Transposed_Canvass_History\*.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2
    eov=eov filename=filename truncover;

 

Then just before the INPUT statement in the code you show add the

*Input first record and hold line;
input@;
 
*Check if this is the first record or the first record in a new file;
*If it is, replace the filename with the new file name and move to next line;
if _n_ eq 1 or eov then do;
txt_file_name = scan(filename, -1, "\");
eov=0;
end;
 
*Otherwise  go to the import step and read the files;
else input  /*<= this is where the generated INPUT you show in your code goes*/

 

Make sure that all of the INFORMAT statements appear before the line with INPUT @;

 

Please post example code using the forum menu icon {i}. The forum otherwise tends to reformat code somewhat making it hard to read and inserting line breaks and such. This reformatting will sometimes result in code that does not run, always makes the code hard to read and just plain takes up too much screen space.

Ask a Question
Discussion stats
  • 7 replies
  • 157 views
  • 0 likes
  • 3 in conversation