Solved
Contributor
Posts: 24

# Summarizing and Manipulating Data

Hi,

I'm trying to take data that's in this form:

County FIPSSurvey Question
220011 (yes)
220011 (yes)
220012 (no)
220011 (yes)
220039 (refused)
220031 (yes)
220031 (yes)
220051 (yes)
220052 (no)

And get summary statistics (percent that said no for each county) in this form:

County FIPSPercent No
2200125%
220030%
2200550%

I've tried using PROC FREQ in various ways to create different outputs, but I'm not getting this simple and straightforward output - County and % No.

I would greatly appreciate any guidance or suggestions.

Amanda

Sorry, error in original second table

Accepted Solutions
Solution
‎07-29-2015 03:31 PM
Valued Guide
Posts: 862

## Re: Summarizing and Manipulating Data

I don't come up with the same numbers that you do but here is a solution, there's probably a neater way to do this with proc summary but I'm not too experienced with it:

Data have;

infile cards dsd;

length County_FIPS \$5. Survey_Question \$11.;

input County_FIPS\$ Survey_Question\$;

cards;

22001,1 (yes)

22001,1 (yes)

22001,2 (no)

22001,1 (yes)

22003,9 (refused)

22003,1 (yes)

22003,1 (yes)

22005,1 (yes)

22005,2 (no)

;

proc sql;

create table want as

select county_fips,count(survey_question)/counts as percent_no

from (select county_fips,survey_question,count(survey_question) as counts

from have

group by county_fips)

where survey_question contains 'no'

group by county_fips;

If you want to count refused you can change 'contains 'no'' to 'not contains 'yes'' but it's 33%, not 67%.

Hope this helps.

Edit:

proc sql;

create table want as

select county_fips,count(survey_question)/counts as Percent_No format = percent.

from (select county_fips,survey_question,count(survey_question) as counts

from have

group by county_fips)

where survey_question not contains 'yes'

group by county_fips;

Message was edited by: Mark Johnson

All Replies
Solution
‎07-29-2015 03:31 PM
Valued Guide
Posts: 862

## Re: Summarizing and Manipulating Data

I don't come up with the same numbers that you do but here is a solution, there's probably a neater way to do this with proc summary but I'm not too experienced with it:

Data have;

infile cards dsd;

length County_FIPS \$5. Survey_Question \$11.;

input County_FIPS\$ Survey_Question\$;

cards;

22001,1 (yes)

22001,1 (yes)

22001,2 (no)

22001,1 (yes)

22003,9 (refused)

22003,1 (yes)

22003,1 (yes)

22005,1 (yes)

22005,2 (no)

;

proc sql;

create table want as

select county_fips,count(survey_question)/counts as percent_no

from (select county_fips,survey_question,count(survey_question) as counts

from have

group by county_fips)

where survey_question contains 'no'

group by county_fips;

If you want to count refused you can change 'contains 'no'' to 'not contains 'yes'' but it's 33%, not 67%.

Hope this helps.

Edit:

proc sql;

create table want as

select county_fips,count(survey_question)/counts as Percent_No format = percent.

from (select county_fips,survey_question,count(survey_question) as counts

from have

group by county_fips)

where survey_question not contains 'yes'

group by county_fips;

Message was edited by: Mark Johnson

Contributor
Posts: 24

## Re: Summarizing and Manipulating Data

Posted in reply to Steelers_In_DC

Thank you, Mark! Your code gave me exactly what I wanted! Perfect!

SAS Super FREQ
Posts: 3,837

## Re: Summarizing and Manipulating Data

I don't understand your results for FIPS=22003. Shouldn't it be 0%? Or do you want to count 9=refuced as a "no"?

Anyway:

1) sort the data by FIPS

2) Create a format which has the value "No" for the answers that you consider to be negative.

3) Use PROC FREQ to tabulate the binary formatted variable BY FIPS

4) Keep only the observations that have the formatted value "No"

Here's an example where I take the sashelp.cars data and make "No" mean that the cars do not have 6 cylinders:

proc sort data=sashelp.cars out=cars;
by origin;
run;

proc format;
value YESNO 6="Yes"
other="No ";
run;

proc freq data=cars noprint;
format Cylinders YESNO.;
by origin;
tables Cylinders / missing out=FreqOut;
run;

proc print data=FreqOut;
where put(cylinders,YESNO.)="No";
run;

Contributor
Posts: 24

## Re: Summarizing and Manipulating Data

Yes, I corrected my table. Would this give me percentages, not just counts? Mark's solution worked perfectly so I can use that. Thanks.

Super User
Posts: 11,793

## Re: Summarizing and Manipulating Data

This does raw count percentages:

proc freq data=have noprint;

tables fips*survey_question/ list out=want(where=(survey_question= "2 (no)"));

run;

proc print data=want noobs;

var  fips percent;

label fips='County FIPS'

percent = "Percent No"

;

run;

Or another way: When interested in a specific level of a variable you can create a dichotomous 1/0 for yes/no values and summarize that.

One question: are refused to be considered in denominator or not?

If you want refused (or don't know or other possible not applicable responses) in denominator:

If not missing(survey_question) then SQ_d = (survey_question = "2 (no)");

The not missing is to avoid creating a record with a code when the question has no response.

To exclude the refused or others, something like:

If not missing(survey_question) and survey_question ne "9 (refused)"  then SQ_d = (survey_question = "2 (no)");

The mean of the SQ_D variable would be a percent No for those answered in decimal.

There are questions then:

Are there weights to be applied? Then you need to apply the weights using an appropriate procedure with a weight statement. Proc means or summary would work with Fips as a class variable.

And if so are they complex? Now you need to move to Proc SurveyFreq or Surveymeans.

May I say that whoever is providing data as 1 (yes) isn't doing any favors? A value of 1,2,9 and knowledge of the coding works well. Values of "yes", "no", "refused" works well. Both adds way to much text and likelihood of coding issues with extra () and getting quotes in the right spots.

Super User
Posts: 20,730

## Re: Summarizing and Manipulating Data

That seems like a straight proc freq unless I'm missing something:

Data have;

infile cards dsd;

length County_FIPS \$5. Survey_Question \$11.;

input County_FIPS\$ Survey_Question\$;

cards;

22001,1 (yes)

22001,1 (yes)

22001,2 (no)

22001,1 (yes)

22003,9 (refused)

22003,1 (yes)

22003,1 (yes)

22005,1 (yes)

22005,2 (no)

;

proc freq data=have;

where Survey_Question ne '9 (refused)';

table county_fips*survey_question/list outpct out=want(where=(survey_question='2 (no)')) sparse;

run;

proc print data= want;

var county_fips pct_row;

run;

Super User
Posts: 10,205

## Re: Summarizing and Manipulating Data

### Code: Program

`Data have;infile cards dsd;length County_FIPS \$5. Survey_Question \$11.;input County_FIPS\$ Survey_Question\$;cards;22001,1 (yes)22001,1 (yes)22001,2 (no)22001,1 (yes)22003,9 (refused)22003,1 (yes)22003,1 (yes)22005,1 (yes)22005,2 (no);proc sql;select County_FIPS,sum(Survey_Question contains 'no')/count(*) as per format=percent8.2 from have  group by County_FIPS;quit;`

### Log: Program

Notes (3)

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

53

54 Data have;

55 infile cards dsd;

56 length County_FIPS \$5. Survey_Question \$11.;

57 input County_FIPS\$ Survey_Question\$;

58 cards;

NOTE: The data set WORK.HAVE has 9 observations and 2 variables.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

68 ;

69 proc sql;

70 select County_FIPS,sum(Survey_Question contains 'no')/count(*) as per format=percent8.2

71 from have

72 group by County_FIPS;

73 quit;

NOTE: PROCEDURE SQL used (Total process time):

real time 0.02 seconds

cpu time 0.03 seconds

74

75 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

85

### Results: Program

 22001 25.00% 22003 0.00% 22005 50.00%

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
• 7 replies
• 450 views
• 2 likes
• 6 in conversation