BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BlueNose
Quartz | Level 8

Hello all,

 

I have a folder containing several CSV files. I wish to write a small SAS program (based on macro) that reads all the files into SAS datasets. My input is a folder with CSV files, my output should be the same set of SAS datasets, stored in a library called, let's say c, or any other name (not in WORK as I want them permanent).

 

I have tried something but stuck in the middle. First I read all the names of the files in the folder into a datset, which is the list of all file names. Then I create macro variables giving each file an index and another macro variable counting the total number of files. Now I need to write a small macro that will use proc import to import each file into SAS and store it in a dataset. I have a bit of trouble defining the different names (as it is fairly easy to run in a loop and just import a file).

 

My code so far is:

 

 

%let path=C:\My_SAS_PROJECTS;
filename folder "&path\Data.";
 
/* Making a list of all files in the folder */
data FilesInFolder;
   length Line 8 File $300;
   List = dopen('FilesInFolder');
   do Line = 1 to dnum(List);
        File = trim(dread(List,Line));
        output;
   end;
   drop list line;
run;
 
/* Creating local macro variables */
data _NULL_;
     set FilesInFolder end=final;
     call symput(compress('File'||_N_),trim(File));
     if final then call symput(trim('Total'),_N_);
run;
  
/* This macro should import all files specified in the list and save them as datasets */
%macro loop;
 
%do i = 1 %to &Total;
 
proc import datafile="&path\Data\&&File&i"
     out=&name&i.
     dbms=csv
     replace;
     getnames=no;
run;
.
.
.
. 
 
%end;
 %mend loop;
 
%loop
 

 

 

Can you assist me completing the macro function ?

 

Thank you !

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

I've made a few modifications to your code. If the names of your CSV files are valid SAS dataset names, you can omit the VALIDMEMNAME option and the NLITERAL function. I have tested the code with the following "ugly" file names:

  • 3abc.csv
  • de$fg.txt.csv
  • This is a very long file name with more than 32 characters.csv
%let path=C:\My_SAS_PROJECTS;
filename folder "&path\Data.";
libname c 'C:\Temp';         /* just an example destination folder */
options validmemname=extend; /* to allow non-standard dataset names */
 
/* Making a list of all files in the folder */
data FilesInFolder;
   length Line 8 File $300;
   List = dopen('folder');  /* corrected the function argument */
   do Line = 1 to dnum(List);
        File = trim(dread(List,Line));
        output;
   end;
   drop list line;
run;
 
/* Creating global macro variables */  /* not "local" */
data _NULL_;
     set FilesInFolder end=final;
     call symput(cats('File', _N_), trim(File));     /* used CATS instead of COMPRESS (...||...) */
     call symput(cats('Name', _N_), trim(nliteral(substr(File,1,min(32, length(File)-4))))); /* inserted */
     if final then call symputx(trim('Total'), _N_); /* replaced symput by symputx */
run;


/* This macro should import all files specified in the list and save them as datasets */
%macro loop;
%do i = 1 %to &Total;
  proc import datafile="&path\Data\&&File&i"
       out=c.&&name&i  /* adapted */
       dbms=csv
       replace;
       getnames=no;
  run;
%end;
%mend loop;
 
%loop

If non-standard dataset names have been created, you can use them as in this example:

proc print data=c.'de$fg.txt'n;
run;

 

 

 

View solution in original post

11 REPLIES 11
BrunoMueller
SAS Super FREQ

hi

 

@ChrisHemedinger has written a great blog entry on this subject, how to loop over a list of values and run some SAS code, see

http://blogs.sas.com/content/sasdummy/2012/03/20/sas-program-by-processing/ and http://blogs.sas.com/content/sasdummy/2012/03/23/improving-on-a-sas-programming-pattern/

 

It has great explanation on how everything works.

 

Bruno

FreelanceReinh
Jade | Level 19

I've made a few modifications to your code. If the names of your CSV files are valid SAS dataset names, you can omit the VALIDMEMNAME option and the NLITERAL function. I have tested the code with the following "ugly" file names:

  • 3abc.csv
  • de$fg.txt.csv
  • This is a very long file name with more than 32 characters.csv
%let path=C:\My_SAS_PROJECTS;
filename folder "&path\Data.";
libname c 'C:\Temp';         /* just an example destination folder */
options validmemname=extend; /* to allow non-standard dataset names */
 
/* Making a list of all files in the folder */
data FilesInFolder;
   length Line 8 File $300;
   List = dopen('folder');  /* corrected the function argument */
   do Line = 1 to dnum(List);
        File = trim(dread(List,Line));
        output;
   end;
   drop list line;
run;
 
/* Creating global macro variables */  /* not "local" */
data _NULL_;
     set FilesInFolder end=final;
     call symput(cats('File', _N_), trim(File));     /* used CATS instead of COMPRESS (...||...) */
     call symput(cats('Name', _N_), trim(nliteral(substr(File,1,min(32, length(File)-4))))); /* inserted */
     if final then call symputx(trim('Total'), _N_); /* replaced symput by symputx */
run;


/* This macro should import all files specified in the list and save them as datasets */
%macro loop;
%do i = 1 %to &Total;
  proc import datafile="&path\Data\&&File&i"
       out=c.&&name&i  /* adapted */
       dbms=csv
       replace;
       getnames=no;
  run;
%end;
%mend loop;
 
%loop

If non-standard dataset names have been created, you can use them as in this example:

proc print data=c.'de$fg.txt'n;
run;

 

 

 

BlueNose
Quartz | Level 8

Thank you for your tips.

 

I have a few of questions:

 

1. Can you kindly explain the line you added with the nliteral command, what is it for ?

 

2. If I want that the first row in each file will be the variable names, should I change the getnames to yes ?

 

3. If I wanted to change this code, so instead of csv files, it will do the same with XLS files (Excel 97-2003), what needs to be modified ?

 

Thank you very much, big help.

FreelanceReinh
Jade | Level 19

You're welcome. Here are the answers to your questions:

 

  1. I inserted that line in order to prepare names for the output datasets. This could be simplified if you were happy with dataset names such as DS1, DS2, etc. In this case one could write the original CSV file names into dataset labels in order to provide the link between the dataset and the corresponding raw data file.

    My suggestion aimed at taking the CSV file names as dataset names (i.e. abc.csv --> abc.sas7bdat), as far as possible, so that it is obvious which raw data file belongs to a given dataset. But there is the potential issue that some of the CSV file names do not comply with the rules for valid SAS dataset names. Problematic cases would include names which start with a digit, contain special characters other than the underscore or are longer than 32 characters.

    The expression substr(File,1,min(32, length(File)-4)) results in the CSV file name without the ".CSV" suffix, truncated to 32 characters, if the original name was longer. The NLITERAL function converts this string to a so called SAS name literal. Examples: NLITERAL('abc')='abc' (i.e. valid SAS names are left unchanged), NLITERAL('ab$c')='"ab$c"N' (in general, the dollar sign is not allowed in a dataset name, but in the special syntax "..."N, it is acceptable). If datasets with non-standard names are created, the option setting VALIDMEMNAME=extend is necessary to let SAS accept them. Honestly, I have never used such non-standard dataset names in practice and I would rather recommend to avoid them. So, in your situation I would actually look through the list of CSV file names and see how non-compliant names (if any) could be transformed into valid SAS names. For example, if blanks were the only special characters in the names, I would simply replace them with underscores in the dataset names.

  2. Yes, if the first line in each CSV file contains the variable names, GETNAMES=yes would instruct SAS to name the variables correspondingly. SAS would automatically modify the names appropriately if they were not valid variable names (depending on the setting of system option VALIDVARNAME).

  3. If you have licensed SAS/ACCESS Interface to PC File Formats, you can specify dbms=excel (or dbms=xls) in the PROC IMPORT step to import Excel 97-2003 files. In addition, there are SHEET= and RANGE= options to specify the spreadsheet (if there are more than one) and a range of cells (if only part of the spreadsheet is to be imported). Alternatively, you could use a LIBNAME statement of the form libname myxls excel path="&path\Data\test.xls" to access an Excel workbook as if it was a SAS library. For more details please see
    the documentation: http://support.sas.com/documentation/cdl/en/acpcref/67382/PDF/default/acpcref.pdf
    or some of the papers on this subject, e.g. http://support.sas.com/resources/papers/proceedings10/144-2010.pdf.

     

    I don't have that SAS/ACCESS license (and I don't have Excel installed on my SAS workstation), so I can't test this. When I imported Excel sheets in the past (primarily with SAS 6 and SAS 8), I mostly used DDE (dynamic data exchange), which is an "old", partly outdated technology, but very flexible. For example, I was able to let SAS detect the text color used in an Excel sheet (which carried some information), but that was extremely complicated.

BlueNose
Quartz | Level 8
Thank you, everything makes sense now.
Your "old" code sounds complicated indeed ! 🙂
ballardw
Super User

One of many cautions with using Proc Import and reading variable names from column headers: The discussion that mentions for data set names also appies to variables with the added complication that if you have two or more column headings that start with the same text that proc import may truncate the first occurance and the next could get a very generic varaible name of VARxx where xx refers to the column number.

 

Example:

Column 10: This is a very long column header related to the annual values of Product X

Column 11: This is a very long column header related to the annual values of Product Y

 

With maximum variable name length of 32 characters these tow column headers "look the same" to the Proc Import processor.

Babloo
Rhodochrosite | Level 12

Could you please tell me why we need two ampersand before file and name macro in following proc import?

 

  proc import datafile="&path\Data\&&File&i"
       out=c.&&name&i  /* adapted */
       dbms=csv
       replace;
       getnames=no;
  run;
FreelanceReinh
Jade | Level 19

@Babloo: The macro processor resolves expressions like &&name&i in two passes:

  1. && resolves to & and &i is resolved to the content of macro variable i, e.g., to 1 in the first iteration of the "%do i=1 %to ..." loop. Result (in the first iteration of the %DO loop): &name1
  2. &name1 is resolved to the content of macro variable name1.

So, whenever you loop through a list of numbered macro variables with a %DO loop you will probably use this pattern.

 

If the expression was written as &name&i, the macro processor would try to resolve &name to the content of a macro variable name in the first pass, but this is not what we want in the above situation. There is no macro variable name, hence we would get a warning "WARNING: Apparent symbolic reference NAME not resolved."

abubakrayesh
Fluorite | Level 6

Hi, I have been trying to use this loop, however, I get the following:

 

WARNING: Apparent symbolic reference TOTAL not resolved.

ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was: &Total

ERROR: The %TO value of the %DO I loop is invalid.

ERROR: The macro LOOP will stop executing.

 

Could you kindly help with this?

FreelanceReinh
Jade | Level 19

Hi @abubakrayesh, welcome to SAS Support Communities.

 

Your problem must be slightly different, because otherwise the identical solution should work, right? So, please open a new thread (there is a button "Post a question" on the homepage) rather than posting into an old one. That way many more people will see your question. When you do so, please describe (ideally by posting your SAS code) how you set macro variable TOTAL. Your log messages indicate that TOTAL simply hasn't been defined before it is referred to.

 

You can provide a link to the old thread (e.g. https://communities.sas.com/t5/General-SAS-Programming/Importing-several-files-into-SAS/m-p/241842#M...) if necessary.

 

Good luck! I'm sure your problem will be solved soon.

Reeza
Super User
You may want to consider using Call Execute and some macro parameters in your macro instead. This way you can create all the parameters in a data set and then use that data set to drive your macro.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 22587 views
  • 6 likes
  • 7 in conversation