Hi,
I'm trying to take data that's in this form:
County FIPS | Survey Question |
---|---|
22001 | 1 (yes) |
22001 | 1 (yes) |
22001 | 2 (no) |
22001 | 1 (yes) |
22003 | 9 (refused) |
22003 | 1 (yes) |
22003 | 1 (yes) |
22005 | 1 (yes) |
22005 | 2 (no) |
And get summary statistics (percent that said no for each county) in this form:
County FIPS | Percent No |
---|---|
22001 | 25% |
22003 | 0% |
22005 | 50% |
I've tried using PROC FREQ in various ways to create different outputs, but I'm not getting this simple and straightforward output - County and % No.
I would greatly appreciate any guidance or suggestions.
Amanda
Sorry, error in original second table
I don't come up with the same numbers that you do but here is a solution, there's probably a neater way to do this with proc summary but I'm not too experienced with it:
Data have;
infile cards dsd;
length County_FIPS $5. Survey_Question $11.;
input County_FIPS$ Survey_Question$;
cards;
22001,1 (yes)
22001,1 (yes)
22001,2 (no)
22001,1 (yes)
22003,9 (refused)
22003,1 (yes)
22003,1 (yes)
22005,1 (yes)
22005,2 (no)
;
proc sql;
create table want as
select county_fips,count(survey_question)/counts as percent_no
from (select county_fips,survey_question,count(survey_question) as counts
from have
group by county_fips)
where survey_question contains 'no'
group by county_fips;
If you want to count refused you can change 'contains 'no'' to 'not contains 'yes'' but it's 33%, not 67%.
Hope this helps.
Edit:
proc sql;
create table want as
select county_fips,count(survey_question)/counts as Percent_No format = percent.
from (select county_fips,survey_question,count(survey_question) as counts
from have
group by county_fips)
where survey_question not contains 'yes'
group by county_fips;
Message was edited by: Mark Johnson
I don't come up with the same numbers that you do but here is a solution, there's probably a neater way to do this with proc summary but I'm not too experienced with it:
Data have;
infile cards dsd;
length County_FIPS $5. Survey_Question $11.;
input County_FIPS$ Survey_Question$;
cards;
22001,1 (yes)
22001,1 (yes)
22001,2 (no)
22001,1 (yes)
22003,9 (refused)
22003,1 (yes)
22003,1 (yes)
22005,1 (yes)
22005,2 (no)
;
proc sql;
create table want as
select county_fips,count(survey_question)/counts as percent_no
from (select county_fips,survey_question,count(survey_question) as counts
from have
group by county_fips)
where survey_question contains 'no'
group by county_fips;
If you want to count refused you can change 'contains 'no'' to 'not contains 'yes'' but it's 33%, not 67%.
Hope this helps.
Edit:
proc sql;
create table want as
select county_fips,count(survey_question)/counts as Percent_No format = percent.
from (select county_fips,survey_question,count(survey_question) as counts
from have
group by county_fips)
where survey_question not contains 'yes'
group by county_fips;
Message was edited by: Mark Johnson
Thank you, Mark! Your code gave me exactly what I wanted! Perfect!
I don't understand your results for FIPS=22003. Shouldn't it be 0%? Or do you want to count 9=refuced as a "no"?
Anyway:
1) sort the data by FIPS
2) Create a format which has the value "No" for the answers that you consider to be negative.
3) Use PROC FREQ to tabulate the binary formatted variable BY FIPS
4) Keep only the observations that have the formatted value "No"
Here's an example where I take the sashelp.cars data and make "No" mean that the cars do not have 6 cylinders:
proc sort data=sashelp.cars out=cars;
by origin;
run;
proc format;
value YESNO 6="Yes"
other="No ";
run;
proc freq data=cars noprint;
format Cylinders YESNO.;
by origin;
tables Cylinders / missing out=FreqOut;
run;
proc print data=FreqOut;
where put(cylinders,YESNO.)="No";
run;
Yes, I corrected my table. Would this give me percentages, not just counts? Mark's solution worked perfectly so I can use that. Thanks.
This does raw count percentages:
proc freq data=have noprint;
tables fips*survey_question/ list out=want(where=(survey_question= "2 (no)"));
run;
proc print data=want noobs;
var fips percent;
label fips='County FIPS'
percent = "Percent No"
;
run;
Or another way: When interested in a specific level of a variable you can create a dichotomous 1/0 for yes/no values and summarize that.
One question: are refused to be considered in denominator or not?
If you want refused (or don't know or other possible not applicable responses) in denominator:
If not missing(survey_question) then SQ_d = (survey_question = "2 (no)");
The not missing is to avoid creating a record with a code when the question has no response.
To exclude the refused or others, something like:
If not missing(survey_question) and survey_question ne "9 (refused)" then SQ_d = (survey_question = "2 (no)");
The mean of the SQ_D variable would be a percent No for those answered in decimal.
There are questions then:
Are there weights to be applied? Then you need to apply the weights using an appropriate procedure with a weight statement. Proc means or summary would work with Fips as a class variable.
And if so are they complex? Now you need to move to Proc SurveyFreq or Surveymeans.
May I say that whoever is providing data as 1 (yes) isn't doing any favors? A value of 1,2,9 and knowledge of the coding works well. Values of "yes", "no", "refused" works well. Both adds way to much text and likelihood of coding issues with extra () and getting quotes in the right spots.
That seems like a straight proc freq unless I'm missing something:
Data have;
infile cards dsd;
length County_FIPS $5. Survey_Question $11.;
input County_FIPS$ Survey_Question$;
cards;
22001,1 (yes)
22001,1 (yes)
22001,2 (no)
22001,1 (yes)
22003,9 (refused)
22003,1 (yes)
22003,1 (yes)
22005,1 (yes)
22005,2 (no)
;
proc freq data=have;
where Survey_Question ne '9 (refused)';
table county_fips*survey_question/list outpct out=want(where=(survey_question='2 (no)')) sparse;
run;
proc print data= want;
var county_fips pct_row;
run;
Data have;
infile cards dsd;
length County_FIPS $5. Survey_Question $11.;
input County_FIPS$ Survey_Question$;
cards;
22001,1 (yes)
22001,1 (yes)
22001,2 (no)
22001,1 (yes)
22003,9 (refused)
22003,1 (yes)
22003,1 (yes)
22005,1 (yes)
22005,2 (no)
;
proc sql;
select County_FIPS,sum(Survey_Question contains 'no')/count(*) as per format=percent8.2
from have
group by County_FIPS;
quit;
Notes (3)
53
54 Data have;
55 infile cards dsd;
56 length County_FIPS $5. Survey_Question $11.;
57 input County_FIPS$ Survey_Question$;
58 cards;
NOTE: The data set WORK.HAVE has 9 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
68 ;
69 proc sql;
70 select County_FIPS,sum(Survey_Question contains 'no')/count(*) as per format=percent8.2
71 from have
72 group by County_FIPS;
73 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
74
75 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
85
22001 | 25.00% |
22003 | 0.00% |
22005 | 50.00% |
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.