BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Lida
Obsidian | Level 7

Hi,

I am from an input file using a do loop in a data step.  (please see the  partial code below). In the test code I am asking for 30 000 subsets but SAS generates only about 22 000 than stops. What could be a reason for this? To complete my task I will need to generate 250 000 subsets from an input file. How I can do this? Should I try  proc SQL instead of a data step?  Thank you!

The file in set instruction is more than 1 Gb and the generated subsets are only 17Kb.

Data %do i=1 %to 30000; inputdata&i end;;

set inputdata;

%do i=1 %to 30000;

If BSTR=&i then output input&i;

%end;

run;

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Lida,

While all this is likely to work, you will still encounter similar problems in the bottom half of your program.  You will encounter resource issues when you try to merge 250,000 data sets.

A better approach would collect the output data sets vertically instead of horizontally.  For example:

data lifetable;

   set &output_table._&i (keep=sex agegroup ex exa);

   /* Extra, but possibly not needed:  BSTR = &i;  */

run;

proc append data=lifetable base=&output_data;

run;

/* PROC SQL as above, to drop &output_table._&i  */

%end;  /* All loop iterations are complete */

proc summary data=&output_data nway;

   var ex exa;

   class sex agegroup;

   output out=&output_data (drop=_freq_ _type_)

             mean = LEmean HALEmean

             var = LEvar HALEvar;

run;

It would be up to you if you wanted to replace &output_data with the summarized version, or if you wanted to keep all the detail from each bootstrap sample and use a separate data set to hold the summary.  But don't try to merge 250,000 data sets and be wary of creating a data set with 1M variables when you have an alternative.

It looks like you already have PROC SQL dropping any earlier version of &output_data, which is a good thing to do.

Good luck.

View solution in original post

20 REPLIES 20
art297
Opal | Level 21

You will have to post more of your code and the part of the log showing the error (if any) you are receiving would help.

It appears that your code is embedded in a macro but, as posted, wouldn't run.

I think more info is needed before anyone can help.

Haikuo
Onyx | Level 15

You would need to share more code to the boundary of your Macro definition. %do does not work this way (open code).

Regards,

Haikuo

Lida
Obsidian | Level 7

Yes, you are right: this is a partial code from a macro defenition.

%macro test (inputdata=,b=);

data %do i=1 %to &b; inputdata&i end;;

set &inputdata;

%do i=1 %to 30000;

If BSTR=&i then output input&i;

%end;

run;

/* some other calculations*/

%mend test;

%test (inputdata=mydata,b=30000);

I cannot show you the log now because I am running a program now and saving a log into a file. However, I can see through the windows explorer that the macro generated only about 22 000 subsets out of 30000. It continues to run using the generated subsets to perform other calculations.

Thanks.

Astounding
PROC Star

There could easily be a hardware limitation.  SAS may not have the resources to track 30,000 data sets in a single data step.  You may be forced to run the program several times, creating 20,000 data sets each time and using different ranges for &I each time.

As a secondary suggestion, I would add this line inside your second %DO loop:

%if &i > 1 %then else;

It will turn your set of 30,000 IF/THEN statements into IF/THEN/ELSE statements.  (Of course, if &I ranges from 20001 to 40000, then the comparison becomes %if &i > 20001 %then else;)

Good luck.

Lida
Obsidian | Level 7

Yes, my plan B is to run this by parts. But I am still hoping to find a way to run it only one time.

I will check your other suggestion. thanks!

Astounding
PROC Star

I would be interested in seeing how much difference that makes in the total run time.

The other posts are valid, though.  If you can post what is involved in the "some other calculations" section of your macro, there may be another way to approach the problem.

Are we same in assuming that the first "end;" should be "%end;"

Good luck.

Tom
Super User Tom
Super User

You should explain why you feel a need to make 250,000 individual data sets. 

Why not just use BSTR as a BY variable in whatever processing you are doing?

I doubt that SAS would be able to generate 30,000 datasets in a single pass this way.

If you move the loop to outside the data step then it could work.  Create an index on BSTR for the input dataset to make it efficient.

%do i=1 %to 30000;

data inputdate&i;

   set inputdata;

   where BSTR=&i ;

run;

%end;

Lida
Obsidian | Level 7

Well, in this case I will read the file (1.5 Gb) in the set instruction 30000 times. Correct? I am trying to minimize I/O.

Haikuo
Onyx | Level 15

Well, if you want to stick to your original design, I think you already have a pretty good idea on the up-limit to your hardware. So you can definitely use that info to break up your one task into many and finish it up.

I am also curious about the reason you need so many tables. if you don't mind sharing the whole picture, maybe someone on the forum can figure out an alternative way to deal with it without producing 30,000 tables.

Regards,

Haikuo

Lida
Obsidian | Level 7

I am calculating a variance for life expectancy using the bootstrap methodology. BSTR is a bootstrap number. Later in the program I include another macro which builds a life table. That macro does not process data by By variable. Therefore I need to subset the original dataset and input the data into the submacro. At the end of the program I collect all 250000 values for life expectancy and calculate variance. The alternative way is to modify a submacro by adding a by instruction. I am trying to avoid this.

When my test code will stop running, I will send you the whole program. May be someone in the forum can suggest the alternative way.

Thank you!

Lida
Obsidian | Level 7

OK. This is what I am doing:

I am going to try Index this time...

Any other ideas? Any input is greatly appriciated. Thank you for your help!!

**************************************************;

* macro for LT calculations;

***************************************************;

%macro LE_HALE_estimation (input_data=,

b=, output_data=, mortrates=, HUI=);

proc sort data=&input_data;

by BSRP;

run;

%include '…\LifeTable20Plus_rates.sas';

PROC SQL;

     DROP TABLE &output_data;

   QUIT;

data %do i=1 %to &b; inputdata&i %end;;

set &input_data;

%do i=1 %to &b;

if BSRP=&i then output inputdata&i;

%end;

run;

%do i=1 %to &b;

%LifeTable20Plus_rates(

            input_mortality=inputdata&i,

        input_population=inputdata&i,

            output_liftable=&output_data._&i,

            chiang_ax=Chiang20PlusConstantsBySex (WHERE=(Sex IN ('F','M'))),

            age_group=AgeGroup,

            sex=Sex,

            mortrates=&mortrates,

            HUI=&HUI,

            print=);

data lifetable&i;

set &output_data._&i (keep=sex agegroup ex exa);

rename ex=ex&i exa=exa&i;

run;

proc sort data=lifetable&i;

by sex agegroup;

run;

PROC SQL;

     DROP TABLE &output_data._&i;

   QUIT;

%end;

data &output_data;

merge lifetable1-lifetable&b;

by sex agegroup;

run;

PROC DATASETS LIBRARY=WORK NOLIST NOWARN;

   DELETE lifetable1-lifetable&b;

quit;

data &output_data;

set &output_data;

LEmean=mean(of ex1-ex&b);

LEvar=var(of ex1-ex&b);

HALEmean=mean(of exa1-exa&b);

HALEvar=var(of exa1-exa&b);

run;

%mend LE_HALE_estimation;

/*generate b number of LTs*/

%LE_HALE_estimation (input_data=new.resultsForVar,

b=25000, output_data=new.LT_NormalWgt,mortrates=ratenormal,HUI=HUInormal);

Wneh I run this code for b=25---noproblem.

when b=25000 or 30000. I have an error; (I can see my log now)

ERROR: The SAS System stopped processing this step because of insufficient memory.
WARNING: The data set WORK.INPUTDATA1 may be incomplete.  When this step was stopped there were 0 observations and 28 variables.

_________________________________________________

Reeza
Super User

A suggestion:

Why not change the order of operations in your macro so that you create your subset, run the macro code and then do your other step rather than create and store all 30,000 in one go.

Another option is you've used a where clause in setting your data with the sex=M/F why not use that type of logic for your dataset with a where clause with a loop in a diff location.  Indexing the file will allow the where clause to be fairly efficient.

  input_mortality=&inputdata (where=BSRP=&i)),

        input_population=inputdata&i,

            output_liftable=&output_data._&i,

            chiang_ax=Chiang20PlusConstantsBySex (WHERE=(Sex IN ('F','M'))),

            age_group=AgeGroup,

            sex=Sex,

            mortrates=&mortrates,

            HUI=&HUI,

            print=

Lida
Obsidian | Level 7

It could work. Thanks. will try.

Astounding
PROC Star

Lida,

While all this is likely to work, you will still encounter similar problems in the bottom half of your program.  You will encounter resource issues when you try to merge 250,000 data sets.

A better approach would collect the output data sets vertically instead of horizontally.  For example:

data lifetable;

   set &output_table._&i (keep=sex agegroup ex exa);

   /* Extra, but possibly not needed:  BSTR = &i;  */

run;

proc append data=lifetable base=&output_data;

run;

/* PROC SQL as above, to drop &output_table._&i  */

%end;  /* All loop iterations are complete */

proc summary data=&output_data nway;

   var ex exa;

   class sex agegroup;

   output out=&output_data (drop=_freq_ _type_)

             mean = LEmean HALEmean

             var = LEvar HALEvar;

run;

It would be up to you if you wanted to replace &output_data with the summarized version, or if you wanted to keep all the detail from each bootstrap sample and use a separate data set to hold the summary.  But don't try to merge 250,000 data sets and be wary of creating a data set with 1M variables when you have an alternative.

It looks like you already have PROC SQL dropping any earlier version of &output_data, which is a good thing to do.

Good luck.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 1420 views
  • 6 likes
  • 6 in conversation