execute same procedure over multiple datasets

Accepted Solution Solved
Reply
Regular Contributor
Posts: 161
Accepted Solution

execute same procedure over multiple datasets

Dear All:

I have a library with hundreds of datasets like:

cq_19970108.sas7bdat

cq_19970109.sas7bdat

cq_19970110.sas7bdat

cq_19970115.sas7bdat

How to use macro to extract 1% of the sample stratified by, say, variable A, B, and C and then put them in separate files (keep the original naming convention but in a different library)?  I have the general idea that I need to use the proc contents procedure and then somehow plug the result of the proc contents into a proc surveyselect macro and repeat the process.  Not quite sure how to google this around to know to do this in the right way.  I think this got to be common practice.

Thank you!

  • HAVE:

library name: HAVE

datasets:

cq_19970108.sas7bdat

cq_19970109.sas7bdat

cq_19970110.sas7bdat

cq_19970115.sas7bdat

  • WANT:

library name: WANT

datasets:

cq_19970108.sas7bdat

cq_19970109.sas7bdat

cq_19970110.sas7bdat

cq_19970115.sas7bdat

Procedure to be repeated proc surveyslect at 1% of the original data observations.

/* 1. Get a list of datsets */

libname have  'E:\wrds\taq\taq.1996\taq96d\sasdata';

ods output "Library Members" = have_table;

proc datasets library=have memtype=data ;

run;

ods output close;



proc sql;

    create table names as

          select distinct Name

              from have_table

                  where MemType = 'DATA';

quit;

/* 2. Just one datset (Not how to plug the name dataset into the Step 3 do loop) */

%let dn = cq_19960105;

proc surveyselect data=have.&dn method = urs sampsize = 1000

     rep=1 seed=12345 out=want.&dn out=outhits;

     id symbol date time;

run;

/* 3. Looping over the entire list of datasets */

%macro sampleselect();

    %do i = 1 %to &N;

        %put &&deal_no_&i;

    %end;

%mend sampleselect();

Attachment

Accepted Solutions
Solution
‎08-20-2014 01:14 PM
Super User
Super User
Posts: 6,845

Re: execute same procedure over multiple datasets

Generating many macro variables is probably much more work and potentially too confusing for what you want.

Creating a macro and using a tool like CALL EXECUTE to generate multiple calls to it is probably easier to code and understand.

%macro sample1(dn);

proc surveyselect data=have.&dn method = urs sampsize = 1000

     rep=1 seed=12345 out=want.&dn out=outhits;

     id symbol date time;

run;

%mend sample1 ;

libname have ... ;

libname want ... ;

proc contents data=have._all_ noprint out=contents;

run;

data _null_;

  set contents;

  by memname ;

  if first.memname;

  call execute(cats('%sample1(',memname,')'));

run;

View solution in original post


All Replies
Super User
Super User
Posts: 6,845

Re: execute same procedure over multiple datasets

Here is one approach.

Step 1 is write the code to do one file.

Step 2 is probably to convert that into a macro with a single parameter, the name of the dataset (the member name).

Step 3 is generate calls to the macro for every member in the input library.

Start by posting your solution to Step 1.

Regular Contributor
Posts: 161

Re: execute same procedure over multiple datasets

Hi, Tom:

I specified the main steps I hope to go through and uploaded a sample data for just one dataset in the library.  But I haven't find a nice guide how to plug a list (in the form of a sas data) to a do until macro loop to get the job done.  Could you offer me some guidance on this?  Some of the examples I searched online is difficult to understand since I'm not sure what they want to achieve in the first place and not sure the data structure they are referencing to.  Thanks -

Super User
Super User
Posts: 7,720

Re: execute same procedure over multiple datasets

Hi,

There are many posts I have put out for this subject:

libname temp "path to where you have your datasets"; /* Note you can exclude this if you already have your libname */

data _null_;

     set sashelp.vtable (where=(libname="TEMP"));  /* Gets all datasets in that libname */

     call execute('data my_new_l.'||strip(memname)||';

                              set temp.'||strip(memename)||';

                          run;');

run;

The above will run the datastep in the execute block for every dataset in the given libname.

Valued Guide
Posts: 3,208

Re: execute same procedure over multiple datasets

I do not really understand what you are wanting.
It could be make selection in a collection in some datasets .... analyze those grouped.


A simple  approach with two datasets. First I just creating them. Note: only 1 datastep is needed.

data dataset1 dataset2 ;
  Do i= 1 to 20 ;         date=i ; output dataset1;   end;
  Do i= 100 to 150 ;    date=i ; output dataset2;   end;
run;

Combine-union off all datasets and select (stratified as you like) record can be done as:

data select ;                      /* a single dataset as result */

  length datain $41 ;         /* names in datain can be "libary.member" total 41 long */

  set dataset: indsname=_datain ;    /* select all datasets in the list abbreviated by the : symbol

                                                             - here just 2 but it is variable SAS(R) 9.4 Language Reference: Concepts, Third Edition */ 

    if ( ranuni(-1) < 0.5 ) ;                   /* the selection replace that with yours.  This is simple 50% random */

  datain=_datain;                              /* keeping the name of the original dataset as known from before the combine-union */

  put datain date _n_ ;     /* just showing what is done, remove with real processing */

run;

The only pre-req is that all members are having similar structure (variables).

Needing to split up can be done with a datastep. At that moment all name must be prepared  (data namelist  ) followed  by a select/output on that datain.

More often the combined dataset is the one needed for analyzing.

---->-- ja karman --<-----
Regular Contributor
Posts: 161

Re: execute same procedure over multiple datasets

Hi, Jaap:

Sorry that I confused you and thank you for the example you just presented!

All I want to do is to first get a list of the datasets within the HAVE library.  And then plug this list into some kind of do loop.

The loop is to select 1% of the sample out of the datasets in the HAVE library (or even proc sort to make things simple).  The loop work on the procedures repeatedly until it reaches the end of the list.  I now know how to use proc datasets to get the list of datasets in a library, but I'm not sure how to plug that into a sas do until loop/macro.

Could you point me in the right direction?  Thank you -

Super User
Posts: 19,157

Re: execute same procedure over multiple datasets

Write a macro and then use call execute to invoke it once for each data set.

Valued Guide
Posts: 3,208

Re: execute same procedure over multiple datasets

The sample RW9 has given is using the list of all datasets using sashelp.vtable I think in this case that is better than using proc datasets.
Having that list is generating new commands to be executed with that name.  Probably the most easy quick and clean approach.

I think you should try the examples. No the old same stoneage anymore. We are adding some bronze to it. 

The IT world is changing, some new pony tricks to add. 
  

---->-- ja karman --<-----
Regular Contributor
Posts: 161

Re: execute same procedure over multiple datasets

Thank you, Jaap and Tom.  I'll implement it right now.  Get tied up in recent days due to deadlines.  Yes, I need to improve my toolkits Smiley Happy 

Enjoy a great weekend ~

Solution
‎08-20-2014 01:14 PM
Super User
Super User
Posts: 6,845

Re: execute same procedure over multiple datasets

Generating many macro variables is probably much more work and potentially too confusing for what you want.

Creating a macro and using a tool like CALL EXECUTE to generate multiple calls to it is probably easier to code and understand.

%macro sample1(dn);

proc surveyselect data=have.&dn method = urs sampsize = 1000

     rep=1 seed=12345 out=want.&dn out=outhits;

     id symbol date time;

run;

%mend sample1 ;

libname have ... ;

libname want ... ;

proc contents data=have._all_ noprint out=contents;

run;

data _null_;

  set contents;

  by memname ;

  if first.memname;

  call execute(cats('%sample1(',memname,')'));

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 2930 views
  • 1 like
  • 5 in conversation