BookmarkSubscribeRSS Feed
SamVirgil96
Fluorite | Level 6

Hi,

 

I'm trying to use data step to import & merge many similarly structured CSV files into one, where their first few lines are like:

 

Title

Retrieved on xxx

.

Var1,Var2,Var3

1,2,3...


Apparently the actual data begins at row 5, so I tried first:

 

data Want;
infile '&path:\*.csv' dlm=',' missover dsd firstobs=5;
input var1 var2 var3;
run; 

 

But the outcome turned out to be that the entering four rows were skipped for the first file, but not for the rest. Namely I had:

 

Var1,Var2,Var3

1,2,3

Title,.,.

Retrieved on xxx,.,.

.,.,.

4,5,6...

 

I then tried the infile option evo= to skip lines for every files:

data Want;
infile '&path:\*.csv' dlm=',' missover dsd evo=evo firstobs=5;
input @;
if evo then input;
input var1 var2 var3;
evo=0;
run; 

But it didn't work. I have concrete confidence that the above would work for files with row 1 being its header and data beginning at row 2 when you set firstobs=2; However it seemed the same would not work for my scenario. Any help would be appreciated!

6 REPLIES 6
mkeintz
PROC Star

Drop the "firstobs=" approach.

 

Instead you can use the FILENAME= option of the infile statement.  Comparing the current value of the filename= variable to its lagged value will let you know when you're starting a new csv file.   Then, whenever that is the case, skip 4 records using "input #4".

 

data want;
  length fnam $30;
  infile 'c:\temp\*.csv' dlm=',' filename=fnam;
  input @;                          /* Set pointer to current record */
  if fnam^=lag(fnam) then input #4; /* New file?, then skip 4 records */
  input v1 v2 ;
run;

In your example, you have

   infile '&path:\*.csv' ...

which would not work, since the macro processor would not resolve the macrovar &path, because it is in single quotes.  And I guess since you are appending a ":" to &path, that macrovar path is nothing more than a drive letter.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
SamVirgil96
Fluorite | Level 6

For some reason it's not working on my end. My environment is SAS Enterprise Guide 7.1 running on a Window 10. In fact, the effect of your code is exactly the same as running:

data want;
 
infile 'c:\temp\*.csv' dlm=',';
 
input v1 v2 ;
run;

So I guess it's the examining the current file name with the lagged file name part that is not working. Thanks for your reply anyway.

ChrisNZ
Tourmaline | Level 20

It seems to work for me. Simplify your problem until you can pinpoint the issue.

data _null_;
  file "&wdir\file1.txt";
  put 'Title';
  put 'Retrieved on xxx ';
  put '.               ';
  put 'Var1,Var2,Var3  ';
  put '1,2,3       ';
  file "&wdir\file2.txt";
  put 'Title';
  put 'Retrieved on xxx ';
  put '.               ';
  put 'Var1,Var2,Var3  ';
  put '4,5,6          ';
run;      

data _null_;
  do I=1,2;
    FILEVAR=cats("&wdir\file",I,".txt");
    infile DUMMY filevar=FILEVAR firstobs=5;
    input ;
    putlog _infile_;
  end;
  stop;
run;

1,2,3

4,5,6

 

SamVirgil96
Fluorite | Level 6

Hi! Your solution seems to work, but why is it only taking the first row of each file? The result is like:

 

filename Var1         Var2        Var3

file1        file1(1,1)  file1(1,2)  file1(1,3)

file2        file2(1,1)  file2(1,2)  file2(1,3)

file3        file3(1,1)  file3(1,2)  file3(1,3)

 

I hope you can get it, and thank you for your time.

andreas_lds
Jade | Level 19

This is an enhanced version of the solution posted by @ChrisNZ automatically processing all files of interest. Variables most likely not relevant for the created dataset are prefixed with an underscore.

data want;
   length 
      v1-v3 8
      _filename $ 200
   ;
   keep v1-v3;
   
   _rc = filename("mydir", "YOUR_PATH"); 
   _did = dopen("mydir"); 
   
   do _i = 1 to dnum(_did);
      _filename = dread(_did, _i);
      if prxmatch("/.*\.txt/i", trim(_filename)) then do;
         _fileVar = cats("YOUR_PATH\", _filename);
         
         infile dummy delimiter=',' filevar= _fileVar firstobs= 5;
         input v1--v3;
         output;
      end;
   end;
   
   _rc = dclose(_did);
   stop;
run;
SamVirgil96
Fluorite | Level 6

In an effort to be more specific about my question, the following are the data sets I use:

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 3594 views
  • 3 likes
  • 4 in conversation