Solved: Re: Significant CPU time required for Proc Surveyfreq

daszlosek · Posted 07-15-2015 10:42 AM

Hello Everyone,

I am using a macro I created for proc surveyfreq and it is taking a lot of CPU time (2:57:30 per variable) in order for the program to run. I was wondering if there was a more efficient way to run the program? Or perhaps there is a way I could check to see how the data is structured and fix the issue that way? When I remove the weight, the process time speeds up quite a bit.

%macro tab1 (var);

proc surveyfreq data = ACE.ACEDATASET;

tables rucc * &var / row col chisq;

strata _STSTR;

weight _FINALWT2;

cluster _NEWPSU;

run;

%mend tab1;

%tab1 (ACEDEPRS2);

%tab1 (ACEDIVRC);

%tab1 (ACEDRINK);

%tab1 (ACEDRUGS);

%tab1 (ACEHURT);

%tab1 (ACEHVSEX);

%tab1 (ACEPRISN);

%tab1 (ACEPUNCH);

%tab1 (ACESWEAR);

%tab1 (ACETOUCH);

%tab1 (ACETTHEM);

Thank you for your suggestions and solutions,

Donald S.

Rick_SAS · Posted 07-15-2015 10:48 AM

RUCC and the other variables are categorical, right? How many levels for each variable, and how many observations in the data set? How many strata and PSUs?

daszlosek · Posted 07-15-2015 10:59 AM

Yes, all variables are categorical. RUCC has 4 levels and all the variables beginning with ACE are dichotomous. There are about 110,000 observations, PSU, and STRATA in the dataset.

ballardw · Posted 07-15-2015 11:17 AM

Since you don't appear to be creating files or doing anything that needs the variable name, you should try using

tables rucc * (ACEDEPRS2 ACEDIVRC <list all the variables>) / row col chisq;

You can also have multiple TABLES statements in a single call to the procedure, similar to PROC FREQ.

You spend a lot of time reloading all the data for each call.
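As a rough sketch of the single-call approach described above, the eleven macro calls from the original post could be collapsed into one procedure step. The variable list is copied from those macro calls; the stratum variable is written as _STSTR here, as in the later post, which is an assumption to check against your own data set.

```sas
/* Sketch only: one PROC SURVEYFREQ step for all ACE variables,      */
/* so the data set is read once rather than once per macro call.     */
/* Assumes the design variables are _STSTR, _NEWPSU, and _FINALWT2.  */
proc surveyfreq data=ACE.ACEDATASET;
   strata  _STSTR;
   cluster _NEWPSU;
   weight  _FINALWT2;
   /* rucc is crossed with each variable inside the parentheses */
   tables rucc * (ACEDEPRS2 ACEDIVRC ACEDRINK ACEDRUGS ACEHURT ACEHVSEX
                  ACEPRISN ACEPUNCH ACESWEAR ACETOUCH ACETTHEM)
          / row col chisq;
run;
```

This requests the same eleven two-way tables as the macro version; whether it helps with the run time in this particular case is a separate question.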

I also wonder from some of your variables if this may be BRFSS related?

daszlosek · Posted 07-15-2015 11:25 AM

Wonderful, I will try that. Yes, I am working with BRFSS data, good eye ballardw!

I was also wondering how the loading is structured differently between running the data as a macro vs. listing the variables in parentheses. With the parentheses, are they all loaded at once and then a table is formed with RUCC? And for the macro, does it run through and load each of the '&var' variables separately, and is this what causes the longer load times?

Thank you,

Donald S.

ballardw · Posted 07-15-2015 11:45 AM

A separate call to a procedure means it has to "reread" the data. The dataset may be cached but new "bins" have to be created for the analysis. One call with multiple variables might take a bit longer than for a single variable but is much faster than rereading the data repeatedly.

If you have SAS 9.1 you may need to have multiple Table statements as there was a bug in some versions that complained about the var1 * (var2 var3) syntax.

I've been working with and around the BRFSS since late 1997 so I kind of recognize some of the variables.

I think that if your version supports it, the option NOMCAR is recommended as well.

daszlosek · Posted 07-15-2015 12:23 PM

Ballardw,

I am currently running your code for a single variable, just to see the process time and it has been about 20 minutes. Could there be anything else I could do?

proc surveyfreq data = ACE.ACEDATASET;

tables rucc* ACEDEPRS2 / row col chisq nomcar;

strata _STSTR;

WEIGHT _FINALWT2;

cluster _NEWPSU;

run;

Thank you,

ballardw · Posted 07-15-2015 01:34 PM

Sometimes, if you are building lots of tables, the time actually ends up in the building of the HTML output, but I think you have something else going on. NOMCAR belongs on the PROC statement, not the TABLES statement. I would interrupt this if it is still running.
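For reference, a minimal sketch of the same single-variable run with NOMCAR moved to the PROC statement might look like this. The data set and variable names are taken from the post above; nothing else is changed.

```sas
/* Sketch only: NOMCAR is given as a PROC SURVEYFREQ statement option, */
/* not as a TABLES statement option.                                   */
proc surveyfreq data=ACE.ACEDATASET nomcar;
   strata  _STSTR;
   cluster _NEWPSU;
   weight  _FINALWT2;
   tables rucc * ACEDEPRS2 / row col chisq;
run;
```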

I haven't run surveyfreq or surveymeans on anything with more than about 15,000 records, but I have run as many as 20-odd variable combinations in a single proc call and have not had any of those run more than a couple of minutes. Are you working in a server environment by any chance? Network connections and/or server options may be an issue.

daszlosek · Posted 07-15-2015 01:53 PM

I am working off a network at the moment. I will copy the dataset to my hard drive and see if that makes a difference. Just to be more exact, there are 101886 observations, 101720 clusters, 262 strata, and 166 observations with nonpositive weights, and the sum of the weights is 25440337.9.
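One possible way to check the design structure independently of PROC SURVEYFREQ is a quick PROC SQL count. This is only a sketch: it assumes that clusters are identified by the combination of _STSTR and _NEWPSU, and the output column names (n_obs, n_strata, n_clusters, n_nonpos_wt) are made up for illustration.

```sas
/* Sketch only: count observations, strata, and stratum-by-PSU       */
/* combinations, plus observations with nonpositive weights.         */
/* Output column names are hypothetical.                             */
proc sql;
   select count(*)                                   as n_obs,
          count(distinct _STSTR)                     as n_strata,
          count(distinct catx('|', _STSTR, _NEWPSU)) as n_clusters,
          /* missing weights also satisfy <= 0 in SAS, so they are counted here */
          sum(_FINALWT2 <= 0)                        as n_nonpos_wt
   from ACE.ACEDATASET;
quit;
```

If the cluster count comes out close to the observation count, as in the figures above, most clusters contain a single observation.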

Message was edited by: Donald Szlosek

daszlosek · Posted 07-20-2015 02:29 PM

I put the dataset on my hard drive and still had the same issue. I switched to using SAS-callable SUDAAN and everything is running smoothly.
