SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
SamVirgil96
Fluorite | Level 6

Hi,

 

I'm trying to use data step to import & merge many similarly structured CSV files into one, where their first few lines are like:

 

Title

Retrieved on xxx

.

Var1,Var2,Var3

1,2,3...


Apparently the actual data begins at row 5, so I tried first:

 

data Want;infile '&path:\*.csv' dlm=',' missover dsd firstobs=5;input var1 var2 var3;run; 	

 

But the outcome turned out to be that the entering four rows were skipped for the first file, but not for the rest. Namely I had:

 

Var1,Var2,Var3

1,2,3

Title,.,.

Retrieved on xxx,.,.

.,.,.

4,5,6...

 

I then tried the infile option evo= to skip lines for every files:

data Want;infile '&path:\*.csv' dlm=',' missover dsd evo=evo firstobs=5;input @;if evo then input;input var1 var2 var3;evo=0;run; 

But it didn't work. I have concrete confidence that the above would work for files with row 1 being its header and data beginning at row 2 when you set firstobs=2; However it seemed the same would not work for my scenario. Any help would be appreciated!

6 REPLIES 6
mkeintz
PROC Star

Drop the "firstobs=" approach.

 

Instead you can use the FILENAME= option of the infile statement.  Comparing the current value of the filename= variable to its lagged value will let you know when you're starting a new csv file.   Then, whenever that is the case, skip 4 records using "input #4".

 

data want;
  length fnam $30;
  infile 'c:\temp\*.csv' dlm=',' filename=fnam;
  input @;                          /* Set pointer to current record */
  if fnam^=lag(fnam) then input #4; /* New file?, then skip 4 records */
  input v1 v2 ;
run;

In your example, you have

   infile '&path:\*.csv' ...

which would not work, since the macro processor would not resolve the macrovar &path, because it is in single quotes.  And I guess since you are appending a ":" to &path, that macrovar path is nothing more than a drive letter.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
SamVirgil96
Fluorite | Level 6

For some reason it's not working on my end. My environment is SAS Enterprise Guide 7.1 running on a Window 10. In fact, the effect of your code is exactly the same as running:

data want;
 
infile 'c:\temp\*.csv' dlm=',';
 
input v1 v2 ;
run;

So I guess it's the examining the current file name with the lagged file name part that is not working. Thanks for your reply anyway.

ChrisNZ
Tourmaline | Level 20

It seems to work for me. Simplify your problem until you can pinpoint the issue.

data _null_;
  file "&wdir\file1.txt";
  put 'Title';
  put 'Retrieved on xxx ';
  put '.               ';
  put 'Var1,Var2,Var3  ';
  put '1,2,3       ';
  file "&wdir\file2.txt";
  put 'Title';
  put 'Retrieved on xxx ';
  put '.               ';
  put 'Var1,Var2,Var3  ';
  put '4,5,6          ';
run;      

data _null_;
  do I=1,2;
    FILEVAR=cats("&wdir\file",I,".txt");
    infile DUMMY filevar=FILEVAR firstobs=5;
    input ;
    putlog _infile_;
  end;
  stop;
run;

1,2,3

4,5,6

 

SamVirgil96
Fluorite | Level 6

Hi! Your solution seems to work, but why is it only taking the first row of each file? The result is like:

 

filename Var1         Var2        Var3

file1        file1(1,1)  file1(1,2)  file1(1,3)

file2        file2(1,1)  file2(1,2)  file2(1,3)

file3        file3(1,1)  file3(1,2)  file3(1,3)

 

I hope you can get it, and thank you for your time.

andreas_lds
Jade | Level 19

This is an enhanced version of the solution posted by @ChrisNZ automatically processing all files of interest. Variables most likely not relevant for the created dataset are prefixed with an underscore.

data want;
   length 
      v1-v3 8
      _filename $ 200
   ;
   keep v1-v3;
   
   _rc = filename("mydir", "YOUR_PATH"); 
   _did = dopen("mydir"); 
   
   do _i = 1 to dnum(_did);
      _filename = dread(_did, _i);
      if prxmatch("/.*\.txt/i", trim(_filename)) then do;
         _fileVar = cats("YOUR_PATH\", _filename);
         
         infile dummy delimiter=',' filevar= _fileVar firstobs= 5;
         input v1--v3;
         output;
      end;
   end;
   
   _rc = dclose(_did);
   stop;
run;

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2749 views
  • 3 likes
  • 4 in conversation