08-11-2015 10:35 AM
I'm trying to get the percentage of women who had a screening mammogram, by county, using Behavioral Risk Factor Surveillance System data (BRFSS, a large survey performed by the Centers for Disease Control and Prevention). This is the code I'm trying to use:
proc surveyfreq data=tmp;
I'm confident the strata, cluster, and weight statements are correct. The issue seems to be in the tables statement, specifically with using the county in the cross tabulation. "fips" is the county code (e.g. 22001, 22003, etc. - there are 64 counties in the state of interest), and "mam" indicates whether the woman had a mammogram (1=yes, 2=no, 9=unknown).
Can someone tell me why I can't seem to run this code (it just keeps running)? And how I could do this analysis?
08-11-2015 11:05 AM
How long is "just keeps running"?
And how many records are in the data set?
With my data (Idaho, 44 counties and roughly 6,000 records in a single year), a similar run takes about 3 minutes without the OR option. So expect yours to take a bit longer.
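If it helps, the size questions above can be answered with a short query. This is only a sketch: the data set name `tmp` and the variable `fips` come from the original post, and the column aliases are just illustrative.

```sas
/* Count records and distinct county codes in the analysis data set. */
proc sql;
  select count(*)             as n_records,
         count(distinct fips) as n_counties
  from tmp;
quit;
```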
You might check but I believe the NOMCAR option is currently preferred for use with BRFSS analysis.
Note that CDC has not provided a standard method for doing small area estimates, i.e. county, in general from BRFSS data. The results are generally best restricted for use at the level the sample was stratified.
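For reference, a direct county-level estimate with NOMCAR might look roughly like the sketch below. The design variable names `_STSTR`, `_PSU`, and `_LLCPWT` are assumptions based on the usual BRFSS public-use file naming (older files use `_FINALWT` for the weight); the original post did not show its strata, cluster, or weight statements. Recoding `mam = 9` to missing is also an assumption about how unknown responses should be handled, and `tmp2` is a hypothetical data set name.

```sas
/* Assumption: treat "unknown" (9) as missing so it drops out of the percentages. */
data tmp2;
  set tmp;
  if mam = 9 then mam = .;
run;

/* NOMCAR treats missing values as not missing completely at random
   in the Taylor series variance computation. */
proc surveyfreq data=tmp2 nomcar;
  strata  _ststr;              /* assumed BRFSS stratum variable */
  cluster _psu;                /* assumed BRFSS primary sampling unit */
  weight  _llcpwt;             /* assumed BRFSS final weight */
  tables  fips*mam / row cl;   /* row percentages: mammogram status within each county */
run;
```

Counties with only a handful of respondents will still produce estimates here, so the caveat about small area estimates applies to whatever this returns.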
08-11-2015 02:22 PM
After I let it run for over an hour, I stopped it because I assumed something was wrong.
There are 9,068 records in the dataset.
Based on how quickly your data ran, it seems I must not be doing something correctly.
I read I should use NOMCAR to get better SEs or CIs (can't remember which).
08-11-2015 03:26 PM
It might be informative to see just how long it takes to copy the data set to your computer. I've had days where copying a 2 MB file on our network took close to an hour. Running any analysis in that environment would have been practically impossible.
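One way to time that copy from inside SAS, as a rough sketch: `work.tmp_copy` is a hypothetical name, and if `tmp` already lives in WORK you would point the `set` statement at the permanent library where the source data is stored instead.

```sas
/* FULLSTIMER writes real time, CPU time, and memory use for each step to the log. */
options fullstimer;

/* Copy the data set into the WORK library, then compare real time to CPU time in the log. */
data work.tmp_copy;
  set tmp;
run;
```

A real time far larger than the CPU time for this step would suggest the delay is in reading the data rather than in the analysis itself.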
08-14-2015 11:18 AM
My office is waiting, not quite with bated breath due to duration expectations, for CDC to finish their standardized approach to dealing with county / small area estimates from BRFSS data. The bits we've seen so far do not make me think that the time or code will be as simple as a direct estimate such as you just performed. It may be time to make noises about improving hardware (possibly faster disks, more memory) and to spend some time optimizing SAS options for throughput.
Another option is SAS-callable SUDAAN, which may run a bit faster. At least it may be worth a test comparison.