BookmarkSubscribeRSS Feed
briannalozelle
Calcite | Level 5

I am very new to SAS and I am trying to iterate over a set of SAS files in a library. They all have names like filename_2022xxxx where 'xxxx' is the day and month the file was created. 

For each file I will create the same tables which will be appended to a main CSV export file. 

 

The following pseudo code is representative of what I would like to do.

 

For file in lib with name like filename_*

Let expstamp = *   

/*the specific date stamp (expstamp) of the file is needed for reference in the creation of a subsequent table */

Create Table A 

/*Create table with Proc SQL, subsequent tables will reference this table A rather than the original dataset */

Create Table B 

Create Table C

/* this table needs a column with exp_stamp variable */

If export file does not exist:

Create export csv file 

Else concatenate output to existing output csv

End.

 

I am having no trouble with the creation of my tables beyond the fact that I have to hardcode file names into my script, which is what I am looking to avoid since there are numerous files. 

 

I am not able to attach the data but would like an idea of how to approach this. 

 

I have successfully added a column with the date stamp from the original data set by the following method with aforementioned hardcoding: 

data lib.temp;
set lib.filename_20220722
indsname = name;
EXP_STAMP =  scan(scan(name, 2),3,'_');
run;
	

I am unsure if there is an optimal method that does not require nesting scan functions. 

2 REPLIES 2
ballardw
Super User

Instead of creating a bunch of sets you could combine all of then and do that calculation at the same time:

data lib.temp;
   set lib.filename_:
        indsname = name;
   EXP_STAMP = input(scan(name,2,'_'),yymmdd8.);
   format Exp_stamp yymmdd10.; 
run;
	

The  colon at the end of the dataset name tells SAS to use a list of all the sets whose name starts with Filename_. If you have other endings beside the data you could modify that to lib.filename_20: to the files that start with that string.

If the Exp_stamp is not sufficient to keep track of the file source you could add a statement like

Source=name;

to keep the name of the set as a variable.

 

You could quite likely be better of using BY Group processing Using the Exp_stamp or possible Source variable to do the rest.

 

Or you get the file names in a variable and call the code. Such as this:

data _null_;
   set sashelp.vtable;
   where libname='LIB' and memname=:'FILENAME_';
   length fulllibdsn $ 41.;
   fulllibdsn=catx('.',libname,memname);
   /*replace cats(temp,_n_) below with an
     expression that creates your desired output names
   */
   call execute('data '||cats(Temp,_n_)';');
   call execute('set '||fulllibdsn||' indsname=name ;');
   call execute('EXP_STAMP = input(scan(name,2,'_'),yymmdd8.);
      format Exp_stamp yymmdd10.;
      run;'); 
run;

This if you provide a data set name instead of the _null_ construct then you have a data set with the original name and could add your output name so you can keep track of which sets you created. Keeping track of the multiple sets and how to make the later names to reference use can be a problem.

 

Then there is the option encapsulating the whole process into macro or two and just calling the whole thing with the name of the data set to start with, which can again be done with the data step and call execute. Or use the data step to write a program.sas file with the statements desired and then %include the file.

mkeintz
PROC Star

 

Does filename_20220510 represent May 10th, or Oct 05?  I.e. is the file name structure  filename_yyyymmdd, or filename_yyyyddmm?

 

Let's hope it's yyyymmdd, because you could then use an expression like

 

 

   set filename_202205:   indsname=dsnam;

 

 

to get all the datasets for May 2022, in chronological order, concatenated into a single monthly dataset..  Note however that SAS will create a buffer in memory for each of the datasets implied in the SET statement.  This can cost a lot of memory, and can slow down performance.

 

But (presuming those datasets all have the same variables), consider using the OPEN=DEFER option, as in

 

  set filename_202205:  indsname=dsnam  open=defer;

This will re-use a single memory buffer for each dataset in succession.

 

 

BTW, I would not make EXP_STAMP a character variable.  Make it a SAS date variable, as in

 

 

  if dsnam^-lag(dsnam) then exp_stamp=input(scan(dsnam,2,'_'),yymmdd8.);
  retain exp_stamp;
  format exp_stamp yymmddn8. ;

 

Then, if needed, you can generate or use date ranges, and date intervals, in your subsequent programs.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 609 views
  • 3 likes
  • 3 in conversation