Best way to create a Freq table from open ended Survey Data?

Reply
Occasional Contributor
Posts: 9

Best way to create a Freq table from open ended Survey Data?

Hello,

I am trying to make a simple frequency table based on a number of surveys I have collected. The problem I am facing is the fact that some of the questions are yes or no answers, and some questions have multiple values. I am importing these surveys from a Microsoft Access database. I am a beginner at SAS so I any input would be greatly appreciated. Currently I have many separate frequency tables based on different variables; however, I would like to have one simple table which reads the subject number on the y axis, then goes through the frequency of each answer. Thank you very much for your help!

Super Contributor
Posts: 644

Re: Best way to create a Freq table from open ended Survey Data?

Posted in reply to tkallday33

You need to tell us more about how your data is structured.  If it can simply be reduced to 3 columns eg

     Interviewee_Id     Question     Response

Then you can use Proc SQL, or Proc Freq with the /List option to restrict the output table to list format.

Proc Freq      Data = have ;

     Table      Question * Response / List  Out = want;

Run ;

Richard

Super Contributor
Posts: 644

Re: Best way to create a Freq table from open ended Survey Data?

Posted in reply to RichardinOz

On the other hand if your data has a separate column for each question then you need to transpose the data first  (prior Sort by id might be required) .  The following assumes all the responses are character.

Proc Sort data = FromAccess ;

     By     Interviewee_Id ;

Run ;

Proc Transpose Data = FromAccess

                         Out = have (rename = (Col1 = Response

                                                            _Name_ = Question

                                                            ))

                         ;

     By     Interviewee_Id ;

     Var     _Character_ ;

Run ;

In the following Proc Freq you may need to add as line 2

     Where Upcase (Question) NE "INTERVIEWEE_ID" ;

Richard

Occasional Contributor
Posts: 9

Re: Best way to create a Freq table from open ended Survey Data?

Posted in reply to RichardinOz

My data is structured in a table where as you move down, each row or observation is a different subject, and as you move across, each column or variable represents a different question. There are many questions and many columns. The answers to these questions are sometimes either a 1 or a 2 which corresponds to either a Yes or a No. I am still confused as to how to best set up my data. The final product I would like is a table that has all the different questions, the different answers, and how frequently each answer was submitted. Thank you so much for your response already! I am surprised by how supportive the community is. 

Super User
Posts: 11,343

Re: Best way to create a Freq table from open ended Survey Data?

Posted in reply to tkallday33

General steps with survey data:

Read your data into SAS.

If your data is collected with create formats so you can create more meaningful tables. Your example of 1 and 2 as Yes/No is typical, if the values are numeric create a numeric format such as this for categorical variables (the ones with select answers from a list of values).

proc format;

     value yn

     1= 'Yes'

     2= 'No'

     /* if you had a code value for not answered, refused, or missing due to a skip add them here. Note: Cut and paste from a questionnaire can be very helpful at this stage*/

     ;

run;

Then either associate the format permanently with the variables by a format or attrib statement when reading or recoding the data or at the time a proc is used. Assign the same format for identically coded variables. Generally I find that it may take a time or two to get the formats right and I re-read the data to get them assigned.

Then start cleaning the data looking for unexpected values. Proc freq will do this very quickly at a question by question basis. If you get unexpected values for a formatted variable, such as Yes, No and 99 that tells you either your format was incomplete or there is a data entry issue of some type though some can be generated by survey software to mark certain conditions. You may want to recode new variables based on various criteria or delete some records if the data is missing critical information or impute values for missing data but that's a bit beyond the scope of this question.

A simple

Proc Freq data=yourdatasetnamehere;

run;

will give you tables of each individual question for examination. Note that if you have things that are supposed to be one unique value per respondent like ID,phonenumber or the like you will get long tables.

AFTER cleaning data and creating any additional analysis variables and getting the response weights, then it is time to worry about presentation.

You actually probably don't want a single table unless the number of variables involved is very small as the table output can get very large quickly. People really don't want to see a table with 2000 rows besides presentation gets ugly. Grouping similar topics may work to combine in a table. Either Proc Report or Tabulate will allow summaries of many variables in a single procedure call and a number of appearance options to create usable tables.

Big caution: if your survey used a complex sample, basically anything other than simple random sample, design you will not correct variance/ standard deviations on percentages out of the basic procs such as Freq, Report or Tabulate using the weight options but will need to use Surveyfreq or  Surveymeans to summarize the data and then use a report procedure to display results.

And at this point, if you have more questions about the actual "table" output you need to show us what you actually want it to look like.

Super User
Posts: 11,343

Re: Best way to create a Freq table from open ended Survey Data?

Posted in reply to tkallday33

If you want one table with mutliple questions you might look into proc tabulate;

Stub of a program;

proc tabulate data=yourdatasetname here

     class <list the variables you want to summarize here separated by spaces> / missing;

     table <list the variables again>,

               n

               / misstext=' ';

run;

I recommend having formats for categorical variables to show a meaning full response if you get your data in code values where 1 means one thing for some questions and something else for others.

Note: pseudo continuous variables like age, height, weight may generate LOTS of lines. Grouping formats can create useful categories.

Ask a Question
Discussion stats
  • 5 replies
  • 540 views
  • 4 likes
  • 3 in conversation