Solved: PROC SURVEYFREQ Issue

acros · Posted 08-11-2015 10:35 AM

Hi,

I'm trying to get the percentages of women that had a screening mammogram by county using Behavioral Risk Factor Surveillance System (BRFSS, a large survey performed by the Centers for Disease Control and Prevention) data. This is the code I'm trying to use:

proc surveyfreq data=tmp;
   strata _ststr;
   cluster _psu;
   weight _llcpwt;
   tables fips*mam/or;
run;

I'm confident the strata, cluster, and weight statements are correct. The issue seems to be in the tables statement, specifically with using the county in the cross tabulation. "fips" refers to the county code (e.g. 22001, 22003, etc.; there are 64 different counties in the state of interest), and "mam" refers to whether the woman had a mammogram or not (1=yes, 2=no, 9=unknown).

Can someone tell me why I can't seem to run this code (it just keeps running)? And how I could do this analysis?

Amanda

ballardw · Posted 08-11-2015 11:05 AM

How long is "just keeps running"?

How many records are in the data set?

With my data (Idaho, 44 counties, roughly 6,000 records in a single year), that takes about 3 minutes to run without the OR option, so expect yours to take a bit longer.

You might check but I believe the NOMCAR option is currently preferred for use with BRFSS analysis.
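As a rough sketch of where that option goes: NOMCAR is specified on the PROC SURVEYFREQ statement itself. The data set and variable names below are the ones from the original post; the ROW option is an addition of mine, on the assumption that within-county percentages are what is wanted.

```sas
/* NOMCAR sits on the PROC statement, not on TABLES */
proc surveyfreq data=tmp nomcar;
   strata _ststr;
   cluster _psu;
   weight _llcpwt;
   /* ROW requests row (here: within-county) percentages */
   tables fips*mam / row;
run;
```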

Note that CDC has not provided a standard method for doing small area estimates (i.e., county-level estimates) in general from BRFSS data. The results are generally best restricted to use at the level at which the sample was stratified.

acros · Posted 08-11-2015 02:22 PM

After I let it run for over an hour, I stopped it because I assumed something was wrong.

There are 9,068 records in the dataset.

Based on how quickly your data ran, it seems I must be doing something wrong.

I read I should use NOMCAR to get better SEs or CIs (can't remember which).

ballardw · Posted 08-11-2015 03:11 PM

Try the code without OR. Also, is the data local to your machine or on a network? Sometimes local data runs faster.
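A minimal sketch of that test, assuming the network data set sits in a library I am calling netlib here (the libref and path are hypothetical) and that your WORK library is on a local disk, which is not true in every installation:

```sas
options fullstimer;  /* log real and CPU time for each step */

libname netlib "path-to-network-folder";  /* hypothetical location */

/* Copy the network data set into the WORK library */
data work.tmp_local;
   set netlib.tmp;
run;

/* Same design statements as the original post, without the OR option */
proc surveyfreq data=work.tmp_local;
   strata _ststr;
   cluster _psu;
   weight _llcpwt;
   tables fips*mam;
run;
```

Comparing the real time and CPU time reported in the log for the copy step and for the procedure should show whether the delay is mostly I/O or mostly computation.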

acros · Posted 08-11-2015 03:17 PM

Will do. I'm going to run it overnight and see what happens. It's on a network. I can move it to my computer.

ballardw · Posted 08-11-2015 03:26 PM

It might be informative to see just how long it takes to copy to your computer. I've had days where copying a 2 MB file on our network took close to an hour. Running any analysis in that environment would have been practically impossible.

acros · Posted 08-11-2015 03:54 PM

Thanks for the warning!

acros · Posted 08-14-2015 09:29 AM

It just took way longer than I expected. I let it run all night and it worked.

ballardw · Posted 08-14-2015 11:18 AM

My office is waiting, not quite with bated breath due to duration expectations, for CDC to finish their standardized approach to dealing with county / small area estimates from BRFSS data. The bits we've seen so far do not make me think that the time or code will be as simple as a direct estimate such as you just performed. It may be time to make noises about improving hardware, possibly faster disks and more memory, and to spend some time optimizing SAS options for throughput.

Another option is SAS-callable SUDAAN, which may run a bit faster. At least it may be worth a test comparison.
