BookmarkSubscribeRSS Feed
BeckyBell2355
Calcite | Level 5

We combine a large number of files at once to make one SAS data file.  I'd like to create a reusable generic SAS program that doesn't need to be changed each time we combine a new set of files (or as little as possible) -- (Ie., so it doesn't list the files out individually in the code).  (I have some knowledge in general programming but am newish to SAS, so please reply like I don't understand much.)

 

The files for each year have the same variables and structure (.txt w/ comma delimited) including a header.  BUT-

1. the files are stored in separate folders (and folders include other .txt files we want to exclude), and

2. the folder and file names change each time we get new sets of data files to reflect the new date.

 

Here is an example of the folders & file names, with "we want" marking those wanted to combine into 1 data set from both folders.  (This example is simplified from real situation... more folders and files as well as exceptions aren't always the same from year to year, etc.)

Folder name = C:\EIS\EISFall15

ABC_EISFall2015.txt (---we want)

BCD_EISFall2015.txt (---we want)

CDE_WW_EISFall2015.txt (---we want)

CDE_EISFall2015.txt (---want to EXCLUDE)

 

Folder name = C:\EIS\EISSpring16

ABC_EISSpring16.txt (---we want)

BCD_EISSpring16.txt (---we want)

CDE_WW_EISSpring16.txt (---we want)

CDE_EISSpring16.txt (---want to EXCLUDE)

 

Is there a way to use a macro or something which calls a list of file names/locations (like a list for each year saved in different file, not in this generic code) so I would only have to change the name of that file list in the generic file to create files for different years?

 

So something like this:

generic code that loops and creates the combine data file... 

generic code that opens directory fiie C:\EIS\EIS1516.txt  [that name would be the only thing that would need to change each year]

generic code loops through and combines only the files listed in the directory file to create 1 large data file.

generic code saves SAS data file in same location as directory file.

 

Is this possible?  Or other thoughts/ideas?  Thanks.

Becky

14 REPLIES 14
Reeza
Super User

Yes. Your issue will be getting your file list, but once you do you can use the filevar option to provide the file name.

 

FILEVAR=variable

specifies a variable whose change in value causes the INFILE statement to close the current input file and open a new one. When the next INPUT statement executes, it reads from the new file that the FILEVAR= variable specifies. Like automatic variables, this variable is not written to the data set.

 

https://communities.sas.com/t5/SAS-Communities-Library/How-do-I-write-a-macro-to-import-multiple-tex...

 

See Example 5

http://support.sas.com/documentation/cdl/en/lestmtsref/68024/HTML/default/viewer.htm#n1rill4udj0tfun...

 

To generate your list, use OS commands - what is your OS?

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

Well, this is a more advanced topic, you would want someone to design - i.e. functional design spec - and build a nice macro or tool for this really.  Here is some code, step one it gets a list of files and directories in a root directory.  It uses this list to create a call per CSV file (note the where clause - change as you need) to run the import code, in this I just show a basic input statement:

filename d pipe 'dir "s:\temp\rob" /b /s';
data dirs;
  length buffer $1000;
  infile d dlm="¬";
  input buffer $;
run;

data _null_;
  set dirs (where=(index(buffer,".csv")>0);
  call execute(cat('data want',strip(put(_n_,best.),'; infile "',strip(buffer),'"; input a $ b $; run;'));
run;

Note this was designed for Windows - i.e. dir is a windows command.

data_null__
Jade | Level 19

@RW9 You mentioned FILEVAR but did not use it. I don't see how your example will extrapolate to what @BeckyBell2355 needs although @BeckyBell2355 has not given us the full story. 


@RW9 wrote:

Hi,

 

Well, this is a more advanced topic, you would want someone to design - i.e. functional design spec - and build a nice macro or tool for this really.  Here is some code, step one it gets a list of files and directories in a root directory.  It uses this list to create a call per CSV file (note the where clause - change as you need) to run the import code, in this I just show a basic input statement:

filename d pipe 'dir "s:\temp\rob" /b /s';
data dirs;
  length buffer $1000;
  infile d dlm="¬";
  input buffer $;
run;

data _null_;
  set dirs (where=(index(buffer,".csv")>0);
  call execute(cat('data want',strip(put(_n_,best.),'; infile "',strip(buffer),'"; input a $ b $; run;'));
run;

Note this was designed for Windows - i.e. dir is a windows command.


 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, I can't see where I mentioned FILEVAR.  

 

OP requested a method to import various CSV files from different directories, I provide code to do this?  First step gets list of all files in root directory and subfolders - just assuming here - then generates a datastep import for each CSV file.  True, it will need modification based on what folder she has, what the files are, what exclusions to apply etc., but then I am not contracted to develop the, just show some general code.

data_null__
Jade | Level 19

Sorry I can't keep the "Super Users" straight.

BeckyBell2355
Calcite | Level 5

I want to provide the list of the files to import -- not all files in the folders should be combined.  This is the problem I'm trying to get around.  The other files need to be in the same folder and can't be moved.  

I'm fine with creating a seperate text file by hand that lists each filename including directory --- I just don't want those listed in the import program because they will keep changing -- however, I want to still be able to rerun last year's list of files.

Reeza
Super User

Create your list by hand. 

Store last years list somewhere.

Make it a sas dataset, with a variable that has the files you want to import. 

 

Then the post I linked to above should work - all files are read into one data set, once you've specified your input statement. 

 

I suggest trying it on some test files and seeing how it works for you. Post back if you have issues. 

 

Your exact use case is also listed in the documentation - which was also provided above.

data_null__
Jade | Level 19

How each file is read now.  All as one data stream or as individual files.  Using DATA STEP or PROC IMPORT etc.

 

Do you need/want to change any of that?

BeckyBell2355
Calcite | Level 5

It was changed to a Data Step that looped and picked up all .txt files in specific folders because it was much faster that PROC IMPORT for each file  -- the issue was that some files shouldn't be included.  

I'm going to look into the other response and see if I can make that work.

Reeza
Super User

One quick way to change your current process - use the FILENAME option - this provides an variable that is the file name so you can trace the source for each record.

 

Then you can delete out the ones you don't want.  

 

If the exclusion list is small this is probably a quicker way to go...

 

data_null__
Jade | Level 19

You might get better help if you show the code you use now or at least a working example.


@BeckyBell2355 wrote:

It was changed to a Data Step that looped and picked up all .txt files in specific folders because it was much faster that PROC IMPORT for each file  -- the issue was that some files shouldn't be included.  

I'm going to look into the other response and see if I can make that work.


 

ballardw
Super User

If the files and folder structure are that consistent then something like this might give you a start on building a list of files and locations.

data buildlist;
   input org $;
   length filevar $ 100;
   do year=15 to 16;
      filevar=cats("C:\EIS\EISFall",year,"\",org,"_EISFall20",year,".txt");
      output;
   end;
datalines;
ABC
BCD
CDE_WW
;
run;
proc sort data=buildlist out=wantlist (keep=filevar); 
   by year org;
run; 

The want list could be used in a set statement and the filevar to control the reading.

 

Note: I heartily disapprove of 2 digit year folder names because the above code would be simpler and work across 1999 to 2000 if you had historical files that needed processing.

 

And from your folder names I wouldn't be suprised if you had a Spring as well. You could add a loop with those values to build the names also.

The keys are consistent naming and hopefully being able to build the list of base values (I called Org for no real reason).

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1590 views
  • 0 likes
  • 5 in conversation