SAS Programming

SamVirgil96 · Posted 11-11-2020 10:30 PM

Hi,

I'm trying to use data step to import & merge many similarly structured CSV files into one, where their first few lines are like:

Title

Retrieved on xxx

.

Var1,Var2,Var3

1,2,3...

Apparently the actual data begins at row 5, so I tried first:

data Want;infile '&path:\*.csv' dlm=',' missover dsd firstobs=5;input var1 var2 var3;run;

But the outcome turned out to be that the entering four rows were skipped for the first file, but not for the rest. Namely I had:

Var1,Var2,Var3

1,2,3

Title,.,.

Retrieved on xxx,.,.

.,.,.

4,5,6...

I then tried the infile option evo= to skip lines for every files:

data Want;infile '&path:\*.csv' dlm=',' missover dsd evo=evo firstobs=5;input @;if evo then input;input var1 var2 var3;evo=0;run;

But it didn't work. I have concrete confidence that the above would work for files with row 1 being its header and data beginning at row 2 when you set firstobs=2; However it seemed the same would not work for my scenario. Any help would be appreciated!

mkeintz · Posted 11-11-2020 11:54 PM

Drop the "firstobs=" approach.

Instead you can use the FILENAME= option of the infile statement. Comparing the current value of the filename= variable to its lagged value will let you know when you're starting a new csv file. Then, whenever that is the case, skip 4 records using "input #4".

data want;
  length fnam $30;
  infile 'c:\temp\*.csv' dlm=',' filename=fnam;
  input @;                          /* Set pointer to current record */
  if fnam^=lag(fnam) then input #4; /* New file?, then skip 4 records */
  input v1 v2 ;
run;

In your example, you have

infile '&path:\*.csv' ...

which would not work, since the macro processor would not resolve the macrovar &path, because it is in single quotes. And I guess since you are appending a ":" to &path, that macrovar path is nothing more than a drive letter.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SamVirgil96 · Posted 11-14-2020 01:15 AM

For some reason it's not working on my end. My environment is SAS Enterprise Guide 7.1 running on a Window 10. In fact, the effect of your code is exactly the same as running:

data want;
infile 'c:\temp\*.csv' dlm=',';
input v1 v2 ;
run;

So I guess it's the examining the current file name with the lagged file name part that is not working. Thanks for your reply anyway.

ChrisNZ · Posted 11-12-2020 12:10 AM

It seems to work for me. Simplify your problem until you can pinpoint the issue.

data _null_;
  file "&wdir\file1.txt";
  put 'Title';
  put 'Retrieved on xxx ';
  put '.               ';
  put 'Var1,Var2,Var3  ';
  put '1,2,3       ';
  file "&wdir\file2.txt";
  put 'Title';
  put 'Retrieved on xxx ';
  put '.               ';
  put 'Var1,Var2,Var3  ';
  put '4,5,6          ';
run;      

data _null_;
  do I=1,2;
    FILEVAR=cats("&wdir\file",I,".txt");
    infile DUMMY filevar=FILEVAR firstobs=5;
    input ;
    putlog _infile_;
  end;
  stop;
run;

1,2,3

4,5,6

High-Performance SAS Coding - Third Edition

SamVirgil96 · Posted 11-14-2020 02:07 AM

Hi! Your solution seems to work, but why is it only taking the first row of each file? The result is like:

filename Var1 Var2 Var3

file1 file1(1,1) file1(1,2) file1(1,3)

file2 file2(1,1) file2(1,2) file2(1,3)

file3 file3(1,1) file3(1,2) file3(1,3)

I hope you can get it, and thank you for your time.

andreas_lds · Posted 11-12-2020 01:08 AM

This is an enhanced version of the solution posted by @ChrisNZ automatically processing all files of interest. Variables most likely not relevant for the created dataset are prefixed with an underscore.

data want;
   length 
      v1-v3 8
      _filename $ 200
   ;
   keep v1-v3;
   
   _rc = filename("mydir", "YOUR_PATH"); 
   _did = dopen("mydir"); 
   
   do _i = 1 to dnum(_did);
      _filename = dread(_did, _i);
      if prxmatch("/.*\.txt/i", trim(_filename)) then do;
         _fileVar = cats("YOUR_PATH\", _filename);
         
         infile dummy delimiter=',' filevar= _fileVar firstobs= 5;
         input v1--v3;
         output;
      end;
   end;
   
   _rc = dclose(_did);
   stop;
run;

SamVirgil96 · Posted 11-14-2020 01:19 AM

In an effort to be more specific about my question, the following are the data sets I use:

SAS Programming

How to import multiple CSV files while skipping their first X rows?

Re: How to import multiple CSV files while skipping their first X rows?

Re: How to import multiple CSV files while skipping their first X rows?

Re: How to import multiple CSV files while skipping their first X rows?

Re: How to import multiple CSV files while skipping their first X rows?

Re: How to import multiple CSV files while skipping their first X rows?

Re: How to import multiple CSV files while skipping their first X rows?

Follow Us

What is...

SAS Programming

How to import multiple CSV files while skipping their first X rows?

Re: How to import multiple CSV files while skipping their first X rows?

Re: How to import multiple CSV files while skipping their first X rows?

Re: How to import multiple CSV files while skipping their first X rows?

Re: How to import multiple CSV files while skipping their first X rows?

Re: How to import multiple CSV files while skipping their first X rows?

Re: How to import multiple CSV files while skipping their first X rows?

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...