Hello Everyone,
I am using a macro I created for PROC SURVEYFREQ, and it takes a lot of CPU time to run (2:57:30 per variable). Is there a more efficient way to run the program? Or is there a way I could check how the data are structured and fix the issue that way? When I remove the WEIGHT statement, the processing time drops quite a bit.
%macro tab1(var);
   /* One PROC SURVEYFREQ call per variable: the data set is reread on every call */
   proc surveyfreq data=ACE.ACEDATASET;
      tables rucc*&var / row col chisq;
      strata _STSTR;
      weight _FINALWT2;
      cluster _NEWPSU;
   run;
%mend tab1;
%tab1 (ACEDEPRS2);
%tab1 (ACEDIVRC);
%tab1 (ACEDRINK);
%tab1 (ACEDRUGS);
%tab1 (ACEHURT);
%tab1 (ACEHVSEX);
%tab1 (ACEPRISN);
%tab1 (ACEPUNCH);
%tab1 (ACESWEAR);
%tab1 (ACETOUCH);
%tab1 (ACETTHEM);
Thank you for your suggestions and solutions,
Donald S.
RUCC and the other variables are categorical, right? How many levels does each variable have, and how many observations are in the data set? How many strata and PSUs?
Yes, all variables are categorical: RUCC has 4 levels, and all the variables beginning with ACE are dichotomous. The data set has about 110,000 observations, along with the PSU and STRATA variables.
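If it helps to pull these counts straight from the data, here is a rough sketch. It assumes the names used in this thread (ACE.ACEDATASET, RUCC, the ACE-prefixed variables, _STSTR, _NEWPSU, _FINALWT2) and counts a PSU as a distinct _STSTR/_NEWPSU pair, which is our assumption about how the design is coded. The output column names (n_obs, n_strata and so on) are made up for illustration.

```sas
/* Number of distinct levels for RUCC and every variable whose name starts with ACE;
   NOPRINT suppresses the frequency tables but the NLEVELS table is still shown */
proc freq data=ACE.ACEDATASET nlevels;
   tables rucc ACE: / noprint;
run;

/* Overall design summary: observations, strata, PSUs within strata, weights.
   Missing weights are also counted in n_nonpos_wt, because SAS treats
   a missing value as smaller than zero in comparisons. */
proc sql;
   select count(*)                                   as n_obs,
          count(distinct _STSTR)                     as n_strata,
          count(distinct catx('|', _STSTR, _NEWPSU)) as n_psu,
          sum(_FINALWT2 <= 0)                        as n_nonpos_wt,
          sum(_FINALWT2)                             as sum_wt format=comma16.1
   from ACE.ACEDATASET;
quit;
```

A PSU count close to the observation count would mean almost every PSU holds a single respondent.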
Since you don't appear to be creating files or anything else that needs the variable name, you should try using a single TABLES statement:
tables rucc*(ACEDEPRS2 <list all the other variables>) / row col chisq;
You can also have multiple TABLES statements in a single call to the procedure, just as in PROC FREQ.
Right now you spend a lot of time rereading all the data on each call.
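A minimal sketch of that single call, assuming the data set, the design variables (_STSTR, _FINALWT2, _NEWPSU) and the eleven ACE variables from the original post:

```sas
/* One pass through the data: every RUCC crosstab is built in the same call */
proc surveyfreq data=ACE.ACEDATASET;
   tables rucc*(ACEDEPRS2 ACEDIVRC ACEDRINK ACEDRUGS ACEHURT ACEHVSEX
                ACEPRISN ACEPUNCH ACESWEAR ACETOUCH ACETTHEM) / row col chisq;
   strata _STSTR;
   weight _FINALWT2;
   cluster _NEWPSU;
run;
```

All eleven crosstabs come out of one read of the data instead of eleven separate reads.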
Judging from some of your variable names, I also wonder whether this is BRFSS data?
Wonderful, I will try that. Yes, I am working with BRFSS data. Good eye, ballardw!
I was also wondering how the data loading differs between running the macro and listing the variables in parentheses. With the parentheses, are all the variables loaded at once and each crossed with RUCC? And with the macro, does SAS load the data separately for each &var, and is that what causes the longer run times?
Thank you,
Donald S.
Each separate call to a procedure has to "reread" the data. The data set may be cached, but new "bins" still have to be built for the analysis. One call with multiple variables may take a bit longer than a call with a single variable, but it is much faster than rereading the data repeatedly.
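If you would rather keep a macro, one option is to pass the whole variable list as a single parameter so the procedure still runs only once. This is only a sketch; the macro name tab_all is hypothetical, and the design variables are the ones from the original post.

```sas
/* Hypothetical macro: the whole list goes into one TABLES request,
   so PROC SURVEYFREQ reads the data once per macro call */
%macro tab_all(vars);
   proc surveyfreq data=ACE.ACEDATASET;
      tables rucc*(&vars) / row col chisq;
      strata _STSTR;
      weight _FINALWT2;
      cluster _NEWPSU;
   run;
%mend tab_all;

/* Space-separated list, no commas, so it arrives as a single parameter value */
%tab_all(ACEDEPRS2 ACEDIVRC ACEDRINK ACEDRUGS ACEHURT ACEHVSEX
         ACEPRISN ACEPUNCH ACESWEAR ACETOUCH ACETTHEM);
```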
If you have SAS 9.1, you may need multiple TABLES statements instead, because some versions had a bug that rejected the var1*(var2 var3) syntax.
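For releases that reject the parenthesized syntax, here is a sketch of the multiple-TABLES workaround. It relies on ballardw's note above that SURVEYFREQ, like PROC FREQ, accepts several TABLES statements in one call; the variable names are taken from the original post.

```sas
/* Still a single procedure call, but one TABLES statement per crosstab */
proc surveyfreq data=ACE.ACEDATASET;
   tables rucc*ACEDEPRS2 / row col chisq;
   tables rucc*ACEDIVRC  / row col chisq;
   tables rucc*ACEDRINK  / row col chisq;
   tables rucc*ACEDRUGS  / row col chisq;
   tables rucc*ACEHURT   / row col chisq;
   tables rucc*ACEHVSEX  / row col chisq;
   tables rucc*ACEPRISN  / row col chisq;
   tables rucc*ACEPUNCH  / row col chisq;
   tables rucc*ACESWEAR  / row col chisq;
   tables rucc*ACETOUCH  / row col chisq;
   tables rucc*ACETTHEM  / row col chisq;
   strata _STSTR;
   weight _FINALWT2;
   cluster _NEWPSU;
run;
```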
I've been working with and around the BRFSS since late 1997, so I kind of recognize some of the variables.
If your version supports it, I think the NOMCAR option is recommended as well.
Ballardw,
I am currently running your code for a single variable, just to check the processing time, and it has been running for about 20 minutes. Is there anything else I could try?
proc surveyfreq data = ACE.ACEDATASET;
tables rucc* ACEDEPRS2 / row col chisq nomcar;
strata _STSTR;
WEIGHT _FINALWT2;
cluster _NEWPSU;
run;
Thank you,
Sometimes, if you are building lots of tables, the time actually goes into building the HTML output, but I think something else is going on here. NOMCAR belongs on the PROC statement, not on TABLES. I would interrupt this if it is still running.
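A sketch of the corrected placement, assuming a SAS/STAT release whose PROC SURVEYFREQ supports NOMCAR. As we read the documentation, NOMCAR stops the procedure from treating missing values as missing completely at random and instead handles the nonmissing observations as a domain for variance estimation; check the documentation for your release before relying on this.

```sas
/* NOMCAR is a PROC SURVEYFREQ statement option: it goes on the PROC statement,
   before its closing semicolon, not after the slash on TABLES */
proc surveyfreq data=ACE.ACEDATASET nomcar;
   tables rucc*ACEDEPRS2 / row col chisq;
   strata _STSTR;
   weight _FINALWT2;
   cluster _NEWPSU;
run;
```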
I haven't run SURVEYFREQ or SURVEYMEANS on anything with more than about 15,000 records, but I have run as many as 20-odd variable combinations in a single PROC call, and none of those took more than a couple of minutes. Are you working in a server environment by any chance? Network connections and/or server options may be an issue.
I am working off a network at the moment. I will copy the data set to my hard drive and see if that makes a difference. To be more exact, there are 101,886 observations, 101,720 clusters, 262 strata, and 166 observations with nonpositive weights, and the sum of the weights is 25,440,337.9.
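A rough sketch of copying the data locally, assuming the WORK library sits on your local disk (true for a typical desktop SAS session, not when SAS itself runs on a server). The KEEP= list is our suggestion, not something tested in this thread: it keeps only the variables the tables need, which cuts the amount of data read on every pass. The output name WORK.ACEDATA is hypothetical.

```sas
/* Copy RUCC, every ACE-prefixed variable and the three design variables
   from the network library into WORK in a single step */
data work.acedata;
   set ACE.ACEDATASET(keep=rucc ACE: _STSTR _FINALWT2 _NEWPSU);
run;
```

Then change DATA= in the PROC SURVEYFREQ step to WORK.ACEDATA.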
I put the data set on my hard drive and still have the same issue. I switched to SAS-callable SUDAAN, and everything is running smoothly.