BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shellp55
Quartz | Level 8

Hello

 

From a SAS tutorial I found code to import multiple .csv files at the same time and it works well (code is below).  I will admit that I don't fully understand the code but I am wondering what code I would use to:

1.  firstobs = 6 instead of the 2 that it is currently set at (I don't see in the code where this setting is)

2.  each file name is the same format i.e. facility_trend_888.csv, facility_trend_999.csv etc. and I would like to create a column in the dataset of "facility" to equal the last 3 digits i.e. 888 for EACH row of data in the dataset.

3.  Is there a way to exclude empty rows upon import or do I just do that after the fact?

4.  Is there code to then combine all datasets into one dataset?

 

Thanks in advance for any and all assistance. 

 

The code below works as it should by importing the datasets with each file being imported with the naming convention of dsn1, dsn2 etc. to equal the number of csv files in the identified folder.

 

%macro drive(dir,ext); 
   %local cnt filrf rc did memcnt name; 
   %let cnt=0;          

   %let filrf=mydir;    
   %let rc=%sysfunc(filename(filrf,&dir)); 
   %let did=%sysfunc(dopen(&filrf));
    %if &did ne 0 %then %do;   
   %let memcnt=%sysfunc(dnum(&did));    

    %do i=1 %to &memcnt;              
                       
      %let name=%qscan(%qsysfunc(dread(&did,&i)),-1,.);                    
                    
      %if %qupcase(%qsysfunc(dread(&did,&i))) ne %qupcase(&name) %then %do;
       %if %superq(ext) = %superq(name) %then %do;                         
          %let cnt=%eval(&cnt+1);       
          %put %qsysfunc(dread(&did,&i));  
          proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=dsn&cnt 
           dbms=csv replace;            
          run;          
       %end; 
      %end;  

    %end;
      %end;
  %else %put &dir cannot be open.;
  %let rc=%sysfunc(dclose(&did));      
             
 %mend drive;
 
%drive(E:\HIMS\TEST\Strokes\2019 20\HIT_Tool,csv) 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Do you not know what is in the CSV files?  Why are you using PROC IMPORT instead of just writing your own data step to read the file?

Once you know how to read one file you should be able to read them all using a wildcard in the input filename.  You can use the FILENAME= option on the INFILE statement to get access to which file you are reading.  You can use the change in input filename to know when you need to skip header rows.

data want;
  length fname $256 facility $3 ;
  infile 'E:\HIMS\TEST\Strokes\2019 20\HIT_Tool\facility_trend_*.csv'
         dsd filename=fname truncover 
  ;
  input @;
  if fname ne lag(fname) then row=0;
  row+1;
  if row<6 then delete;
  length var1 8 var2 $20 .... var_last $4 ;
  input var1--var_last ;
  facility=substr(fname,length(fname)-6,3);
run;
     

So to adjust that program to your data just change these two lines to reflect your actual variables.

  length var1 8 var2 $20 .... var_last $4 ;
  input var1--var_last ;

 

View solution in original post

8 REPLIES 8
ScottBass
Rhodochrosite | Level 12

This is how I'd do it:

 

* generate sample csv files ;
%macro create_csv(where);
   %let num=%eval(&num+123);  %* some random number ;
   filename csv "%sysfunc(pathname(work))\facility_trend_&num..csv";
   proc export data=sashelp.class(where=(&where)) file=csv dbms=csv replace;
   run;
%mend;
%let num=0;
%create_csv(%str(age=12))
%create_csv(%str(age=13))
%create_csv(%str(age=14))
%create_csv(%str(age=15))
%create_csv(%str(age=16))

* get a list of the files in the directory ;
%dirlist(dir=%sysfunc(pathname(work)),filter=basename=:'facility_trend_' and ext='csv')

* import each file, naming the dataset dsn#, where # is the row number from dirlist ;
%macro code;
%let fullname=%trim(&fullname);
%let num=&__iter__;
proc import file="&fullname" out=work.dsn&num dbms=csv replace;
   getnames=yes;
run;
%mend;
%loop_control(control=dirlist)

https://github.com/scottbass/SAS/blob/master/Macro/loop_control.sas

https://github.com/scottbass/SAS/blob/master/Macro/dirlist.sas

https://github.com/scottbass/SAS/blob/master/Macro/parmv.sas


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
ScottBass
Rhodochrosite | Level 12

So I read your requirements more carefully...

 

PROC IMPORT of a CSV does some data checking under the covers, determining the data type and length from the input data, then generates a data step.

 

This is problematic when you want to append the data.  Even though your CSV file may have the same column structure, the data itself can mean the datasets won't append due to different structure.

 

For example, this does not work:

 

* generate sample csv files ;
%macro create_csv(where);
   %let num=%eval(&num+123);  %* some random number ;
   filename csv "%sysfunc(pathname(work))\facility_trend_&num..csv";
   proc export data=sashelp.zipcode(where=(&where)) file=csv dbms=csv replace;
   run;
%mend;
%let num=0;
%create_csv(%str(statecode='CA'))
%create_csv(%str(statecode='NY'))
%create_csv(%str(statecode='FL'))
%create_csv(%str(statecode='NC'))
%create_csv(%str(statecode='AZ'))

* get a list of the files in the directory ;
%dirlist(dir=%sysfunc(pathname(work)),filter=basename=:'facility_trend_' and ext='csv')

* import each file, naming the dataset dsn#, where # is the row number from dirlist ;
%macro code;
%let fullname=%trim(&fullname);
%let num=&__iter__;
proc import file="&fullname" out=work.dsn&num dbms=csv replace;
   getnames=yes;
run;
data &syslast;
   set &syslast (firstobs=6);
   if missing(cats(of _all_)) then delete;
run;
proc append data=&syslast out=final;
run;
%mend;

proc delete data=final;
run;

%loop_control(control=dirlist)

I'd advise just letting PROC IMPORT do its thing for a single file, then cut-and-paste the code from the log and edit as required.

 

For example, this does work (I've purposely over-allocated the problem columns from above):

 

* generate sample csv files ;
%macro create_csv(where);
   %let num=%eval(&num+123);  %* some random number ;
   filename csv "%sysfunc(pathname(work))\facility_trend_&num..csv";
   proc export data=sashelp.zipcode(where=(&where)) file=csv dbms=csv replace;
   run;
%mend;
%let num=0;
%create_csv(%str(statecode='CA'))
%create_csv(%str(statecode='NY'))
%create_csv(%str(statecode='FL'))
%create_csv(%str(statecode='NC'))
%create_csv(%str(statecode='AZ'))

* get a list of the files in the directory ;
%dirlist(dir=%sysfunc(pathname(work)),filter=basename=:'facility_trend_' and ext='csv')

* import each file, naming the dataset dsn#, where # is the row number from dirlist ;
%macro code;
%let fullname=%trim(&fullname);
%let num=&__iter__;
data DSN&num;
infile "&fullname" delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=6;
   informat ZIP best32. ;
   informat Y best32. ;
   informat X best32. ;
   informat ZIP_CLASS $1. ;
   informat CITY $1000. ;
   informat STATE best32. ;
   informat STATECODE $2. ;
   informat STATENAME $1000. ;
   informat COUNTY best32. ;
   informat COUNTYNM $1000. ;
   informat MSA best32. ;
   informat AREACODE best32. ;
   informat AREACODES $1000. ;
   informat TIMEZONE $1000. ;
   informat GMTOFFSET best32. ;
   informat DST $1. ;
   informat PONAME $1000. ;
   informat ALIAS_CITY $1000. ;
   informat ALIAS_CITYN $1000. ;
input
            ZIP
            Y
            X
            ZIP_CLASS $
            CITY $
            STATE
            STATECODE $
            STATENAME $
            COUNTY
            COUNTYNM $
            MSA
            AREACODE
            AREACODES $
            TIMEZONE $
            GMTOFFSET
            DST $
            PONAME $
            ALIAS_CITY $
            ALIAS_CITYN $
;
if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
if missing(cats(of _all_)) then delete;
run;
proc append data=&syslast out=final;
run;
%mend;

proc datasets lib=work nowarn nolist;
   delete final;
quit;

%loop_control(control=dirlist)

 

 

 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
Kurt_Bremser
Super User

If you want read all files in a directory with the same defined structure, you do not need ANY macro coding at all, as the infile statement accepts wildcards:

data _null_;
set sashelp.class;
file '$HOME/classes/class1.txt' dlm=';' dsd;
if _n_ = 1
then do;
  put;
  put;
  put 'name;sex;age';
end;
put name sex age;
run;

data _null_;
set sashelp.class;
file '$HOME/classes/class2.txt' dlm=';' dsd;
if _n_ = 1
then do;
  put;
  put;
  put 'name;sex;age';
end;
put name sex age;
run;

data want;
length fname inf $200;
infile "$HOME/classes/*.txt" dlm=';' dsd filename=inf truncover;
informat
  name $8.
  sex $1.
;
fname = inf;
input name@;
if name not in ('','name');
input sex age;
run;

And if you need to be more selective, you can use a compound file reference:

filename in ("$HOME/classes/class1.txt" "$HOME/classes/class2.txt");

data want;
length fname inf $200;
infile in dlm=';' dsd filename=inf truncover;
informat
  name $8.
  sex $1.
;
fname = inf;
input name@;
if name not in ('','name');
input sex age;
run;

so all you have to do is build the list of filenames for the filename statement. The read is then done in one single step.

 

ScottBass
Rhodochrosite | Level 12

@Kurt_Bremser 's approach is better than mine.  In particular, it will perform better than my approach.

 

In another post yesterday, I said "It is just as important to know when NOT to use a macro".  I should follow my own advice.

 

However, the OP stated:

1.  firstobs = 6 instead of the 2 that it is currently set at (I don't see in the code where this setting is)

 

I've changed your code to this (since I'm on EG on Windows):

 

%let root=%sysfunc(pathname(work));
%macro create_file(filename);
   data _null_;
      set sashelp.class;
      file "&root\&filename" dlm=';' dsd;
      if _n_ = 1
      then do;
        put 'name;sex;age';
      end;
      put name sex age;
   run;
%mend;
%create_file(class1.txt)
%create_file(class2.txt)
%create_file(class3.txt)

%dirlist(dir=&root,filter=basename=:'class' and ext='txt');

data class;set sashelp.class;run;  * just so it shows in EG ;
data want;
   set dirlist (keep=fullname);
   end=0;
   do until (end);
      infile dummy delimiter = ';' dsd missover lrecl=32767 firstobs=6 end=end filevar=fullname;
      informat
         name $8.
         sex $1.
      ;
      input name sex age;
      output;
   end;
run;

This works, but is there a better approach (esp. one that doesn't require %dirlist)?  I tried filevar= + filename= but couldn't get it to work (and lost patience 😉 )

 

 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
Kurt_Bremser
Super User

I use this condition

if name not in ('','name');

to jump over my artificially included "header" lines. This condition needs to be adapted to the actual content (or non-content) of the 5 lines in every file that need to be skipped.

s_lassen
Meteorite | Level 14

Solutions could be:

  1. Use the option DATAROW=6 in PROC IMPORT to start reading from row 6
  2. Submit a datastep after PROC IMPORT to add the column.
  3. You can use a WHERE clause on the output dataset
  4. Use PROC APPEND
%macro drive(dir,ext); 
   %local cnt filrf rc did memcnt name; 
   %let cnt=0;          

   %let filrf=mydir;    
   %let rc=%sysfunc(filename(filrf,&dir)); 
   %let did=%sysfunc(dopen(&filrf));
    %if &did ne 0 %then %do;   
   %let memcnt=%sysfunc(dnum(&did));    

    %do i=1 %to &memcnt;              
                       
      %let name=%qscan(%qsysfunc(dread(&did,&i)),-1,.);                    
                    
      %if %qupcase(%qsysfunc(dread(&did,&i))) ne %qupcase(&name) %then %do;
       %if %superq(ext) = %superq(name) %then %do;                         
          %let cnt=%eval(&cnt+1);       
          %put %qsysfunc(dread(&did,&i));  
          proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=dsn&cnt(where=(X is not NULL)) 
           dbms=csv replace datarow=6;            
          run;     
  data  dsn&cnt;
    set dsn&cnt;
      length file_ID $10;
      retain File_ID "&cnt"; /* Or whatever you want to use to identify the file */
  run;
  proc append base=Facility force;
  run;
       %end; 
      %end;  

    %end;
      %end;
  %else %put &dir cannot be open.;
  %let rc=%sysfunc(dclose(&did));      
             
 %mend drive;
 
%drive(E:\HIMS\TEST\Strokes\2019 20\HIT_Tool,csv) 

PROC APPEND will not work if SAS interprets data from different files differently, and you may have problems with the data being truncated, if columns in the first file are shorter than in some of the following files. In that case, you may have to use the data step generated by PROC IMPORT, with modifications, instead of PROC IMPORT itself - press F4 (RECALL) after submitting a PROC IMPORT of one file, and then modify that code and use it instead of PROC IMPORT.

Tom
Super User Tom
Super User

Do you not know what is in the CSV files?  Why are you using PROC IMPORT instead of just writing your own data step to read the file?

Once you know how to read one file you should be able to read them all using a wildcard in the input filename.  You can use the FILENAME= option on the INFILE statement to get access to which file you are reading.  You can use the change in input filename to know when you need to skip header rows.

data want;
  length fname $256 facility $3 ;
  infile 'E:\HIMS\TEST\Strokes\2019 20\HIT_Tool\facility_trend_*.csv'
         dsd filename=fname truncover 
  ;
  input @;
  if fname ne lag(fname) then row=0;
  row+1;
  if row<6 then delete;
  length var1 8 var2 $20 .... var_last $4 ;
  input var1--var_last ;
  facility=substr(fname,length(fname)-6,3);
run;
     

So to adjust that program to your data just change these two lines to reflect your actual variables.

  length var1 8 var2 $20 .... var_last $4 ;
  input var1--var_last ;

 

shellp55
Quartz | Level 8

Thank you so much to all who responded, I really appreciate it.  Tom, I did not know that you could use a wildcard for the infile statement so that will be life changing for future code!   I also liked the simplicity of the code you provided and it worked like a charm. 

 

Have a great day everyone!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 6251 views
  • 2 likes
  • 5 in conversation