daszlosek
Quartz | Level 8

Hello Everyone,

I am using a macro I created for PROC SURVEYFREQ, and it takes a lot of CPU time to run (2:57:30 per variable). Is there a more efficient way to run the program? Or is there a way I could check how the data is structured and fix the issue that way? When I remove the weight, the processing time drops quite a bit.

%macro tab1 (var);

proc surveyfreq data = ACE.ACEDATASET;

tables rucc * &var / row col chisq;

strata _STSTR;

weight _FINALWT2;

cluster _NEWPSU;

run;

%mend tab1;

%tab1 (ACEDEPRS2);

%tab1 (ACEDIVRC);

%tab1 (ACEDRINK);

%tab1 (ACEDRUGS);

%tab1 (ACEHURT);

%tab1 (ACEHVSEX);

%tab1 (ACEPRISN);

%tab1 (ACEPUNCH);

%tab1 (ACESWEAR);

%tab1 (ACETOUCH);

%tab1 (ACETTHEM);


Thank you for your suggestions and solutions,


Donald S.

1 ACCEPTED SOLUTION

Accepted Solutions
daszlosek
Quartz | Level 8

I put the dataset on my hard drive and still had the same issue. I switched to SAS-callable SUDAAN, and everything is running smoothly.


9 REPLIES
Rick_SAS
SAS Super FREQ

RUCC and the other variables are categorical, right? How many levels does each variable have, and how many observations are in the data set? How many strata and PSUs?
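The counts asked about here could be gathered with something like the following. This is only a sketch: it assumes the stratum variable is named _STSTR (as in the later post in this thread) and that PSU identifiers may repeat across strata, which is why the PSU count is taken over stratum and PSU combined.

```sas
/* Count observations, strata, and PSUs in the analysis data set. */
proc sql;
  select count(*)                                   as n_obs,
         count(distinct _STSTR)                     as n_strata,
         /* PSU IDs may repeat across strata, so count the pairs */
         count(distinct catx('|', _STSTR, _NEWPSU)) as n_psu
  from ACE.ACEDATASET;
quit;

/* Unweighted one-way tables show the number of levels per variable. */
proc freq data=ACE.ACEDATASET;
  tables rucc ACEDEPRS2 / missing;
run;
```

Adding the other ACE variables to the TABLES statement gives their level counts in the same pass.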

daszlosek
Quartz | Level 8

Yes, all variables are categorical. RUCC has 4 levels and all the variables beginning with ACE are dichotomous. There are about 110,000 observations, PSU, and STRATA in the dataset.

ballardw
Super User

Since you don't appear to be creating files or doing anything else that needs the variable name, you should try using

tables rucc * (ACEDEPRS2 ACEDIVRC <list all the variables>) / row col chisq;

You can also have multiple TABLES statements in a single call to the procedure, similar to PROC FREQ.

You spend a lot of time reloading all the data for each call.
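Written out for the variables in the original macro calls, the single-call version might look like the sketch below. The stratum variable name _STSTR is an assumption taken from the later post in this thread; the other names come from the original question.

```sas
/* One procedure call, one pass over the data, eleven two-way tables. */
proc surveyfreq data=ACE.ACEDATASET;
  tables rucc * (ACEDEPRS2 ACEDIVRC ACEDRINK ACEDRUGS ACEHURT ACEHVSEX
                 ACEPRISN ACEPUNCH ACESWEAR ACETOUCH ACETTHEM)
         / row col chisq;
  strata  _STSTR;
  cluster _NEWPSU;
  weight  _FINALWT2;
run;
```

This replaces all eleven %tab1 calls, so the macro is no longer needed for this job.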

I also wonder from some of your variables if this may be BRFSS related?

daszlosek
Quartz | Level 8

Wonderful, I will try that. Yes, I am working with BRFSS data, good eye ballardw!

I was also wondering how the loading differs between running the procedure through a macro and listing the variables in parentheses. With the parentheses, are the variables all loaded at once and then each tabulated against RUCC? And with the macro, does it run through and load each of the '&var' variables separately, and is that what causes the longer load times?

Thank you,

Donald S.

ballardw
Super User

A separate call to a procedure means it has to "reread" the data. The dataset may be cached but new "bins" have to be created for the analysis. One call with multiple variables might take a bit longer than for a single variable but is much faster than rereading the data repeatedly.

If you have SAS 9.1 you may need to have multiple Table statements as there was a bug in some versions that complained about the var1 * (var2 var3) syntax.
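If the parenthesized list is rejected, the same single pass over the data can be kept by writing one TABLES statement per variable. A minimal sketch, assuming the stratum variable is _STSTR as in the later post and showing only the first three ACE variables:

```sas
/* Still one procedure call; each table gets its own TABLES statement. */
proc surveyfreq data=ACE.ACEDATASET;
  tables rucc * ACEDEPRS2 / row col chisq;
  tables rucc * ACEDIVRC  / row col chisq;
  tables rucc * ACEDRINK  / row col chisq;
  /* add one TABLES statement for each remaining ACE variable */
  strata  _STSTR;
  cluster _NEWPSU;
  weight  _FINALWT2;
run;
```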

I've been working with and around the BRFSS since late 1997 so I kind of recognize some of the variables.

I think that if your version supports it, the option NOMCAR is recommended as well.

daszlosek
Quartz | Level 8

Ballardw,

I am currently running your code for a single variable, just to see the processing time, and it has been about 20 minutes so far. Is there anything else I could do?

proc surveyfreq data = ACE.ACEDATASET;

  tables rucc* ACEDEPRS2 / row col chisq nomcar;

  strata _STSTR;

  WEIGHT _FINALWT2;

  cluster _NEWPSU;

  run;

Thank you,

ballardw
Super User

Sometimes, if you are building lots of tables, the time actually goes into building the HTML output, but I think you have something else going on. NOMCAR belongs on the PROC statement, not the TABLES statement. I would interrupt this if it is still running.
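With NOMCAR moved to the PROC statement, the single-variable test run would look roughly like this (the data set and variable names are those from the post above):

```sas
/* NOMCAR is an option of the PROC SURVEYFREQ statement, not of TABLES. */
proc surveyfreq data=ACE.ACEDATASET nomcar;
  tables rucc * ACEDEPRS2 / row col chisq;
  strata  _STSTR;
  cluster _NEWPSU;
  weight  _FINALWT2;
run;
```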

I haven't run surveyfreq or means on anything with more than about 15,000 records, but I have run as many as 20-odd variable combinations in a single proc call, and none of those took more than a couple of minutes. Are you working in a server environment, by any chance? Network connections and/or server options may be an issue.
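One way to check whether the network is the bottleneck is to turn on the FULLSTIMER system option and run the procedure against a local copy of the data. The sketch below assumes the WORK library is on a local disk and keeps only the variables used in the single-variable test; the names are taken from the posts above.

```sas
/* Report real time and CPU time separately for each step in the log. */
options fullstimer;

/* Copy only the needed variables into WORK (assumed to be local). */
data work.acedataset;
  set ACE.ACEDATASET (keep=rucc ACEDEPRS2 _STSTR _NEWPSU _FINALWT2);
run;

proc surveyfreq data=work.acedataset;
  tables rucc * ACEDEPRS2 / row col chisq;
  strata  _STSTR;
  cluster _NEWPSU;
  weight  _FINALWT2;
run;
```

If the log shows real time far above CPU time, I/O or the network is the likely culprit. If CPU time accounts for most of the elapsed time, the cost is in the computation itself and moving the data will not help much.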

daszlosek
Quartz | Level 8

I am working off a network at the moment. I will copy the dataset to my hard drive and see if that makes a difference. To be more exact, there are 101886 observations, 101720 clusters, 262 strata, and 166 observations with nonpositive weights, and the sum of the weights is 25440337.9.

Message was edited by: Donald Szlosek


