BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sav912
Calcite | Level 5
I have a survey that has 204 items/variables each rated from 0-4 or 99 (unlcear). I need to know what percentage of people rated each item each number (0-4, 99). Obviously I can run proc freq and get individual tables for each of the 204 variables, but I want to compile the frequency counts into a new dataset. I have tried using output, but it only uses the last variable.
 
Is there a way to create a dataset from the frequency tables where all of the items are observations and the Likert ratings are variables? And then each cell contains the frequency/percentage of each rating? I would also like to keep the missing values and have 0's inserted where a possible rating was not endorsed.
 
For example I currently have 97 participants and 204 items structured as such:
 
Obs     Item1      Item2     Item3    Item4
1            0              .            2           99
2            4              1            2            1
3            0              2            1            2
 
I want a dataset structured like below (even better if it were percentages instead of counts):
 
Obs      Freq0     Freq1     Freq2     Freq3    Freq4      Freq99      Missing
Item1      2              0           0             0             1              0                0
Item2      0              1           1             0             0              0                1
Item3      0              1           2             0             0              0                0
Item4      0              1           1             0             0              1                0
 
Any ideas are appreciated!
 
1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

There are a few things to consider and a few hoops you have to jump to get your exact desired result. For instance, it seems you want the Freq3 column even though you do not have any values of three in your sample data.

 

This should give you what you want, both in a report and in a SAS Data set.

 

data have;
input Item1 - Item4;
datalines;
0 . 2 99 
4 1 2 1  
0 2 1 2  
;

proc format;
   value fmt (notsorted)
   0  = 'Freq0  '
   1  = 'Freq1  '
   2  = 'Freq2  '
   3  = 'Freq3  '
   4  = 'Freq4  '
   99 = 'Freq99 '
   .  = 'Missing'
   ;
run;

data temp;
   set have;
   array i Item:;
   do over i;
      item = vname(i);
      val  = i;
      output;
   end;
run;

ods output table=table;
proc tabulate data = temp missing;
   class item val / preloadfmt order = data;
   tables item='', val=''*n='' / printmiss;
   format val fmt.;
run;

data table;
   set table;
   v = put(val, fmt.);
run;

proc transpose data = table out = want(drop = _:);
   by item;
   id v;
   var N;
run;

 

Result

 

item  Freq0 Freq1 Freq2 Freq3 Freq4 Freq99 Missing
Item1 2     .     .     .     1     .      .
Item2 .     1     1     .     .     .      1
Item3 .     1     2     .     .     .      .
Item4 .     1     1     .     .     1      .

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

First, this is a terrible structure for data. A better structure for this type of data (and almost any type of data) is:

 

data have;
    input item freq;
    cards;
1 0
2 .
3 2
4 99
1 4
2 1
;

This is a partial data set, obviously it would be much longer. 

 

Then, the table you want is trivial

 

proc freq data=have;
    table item*freq;
run;

 

Message here is to avoid "Excel thinking" and Excel data structure and try to work in data structures that fit into SAS PROCs.

 

To turn your data into the better structure, here is untested code

 

data re_arrange;
    set original_data;
    array itemnum item:;
    do i=1 to dim(itemnum);
        item=i;
        freq=itemnum(i);
        output;
     end;
     keep item freq;
run;

 

--
Paige Miller
sav912
Calcite | Level 5

Thank you for the response. I have never seen survey data structured that way- Can you explain why I should restructure it in that way? I will eventually have 300 participants answering 204 items, so will I end up with 61,200 rows?  

PeterClemmensen
Tourmaline | Level 20

There are a few things to consider and a few hoops you have to jump to get your exact desired result. For instance, it seems you want the Freq3 column even though you do not have any values of three in your sample data.

 

This should give you what you want, both in a report and in a SAS Data set.

 

data have;
input Item1 - Item4;
datalines;
0 . 2 99 
4 1 2 1  
0 2 1 2  
;

proc format;
   value fmt (notsorted)
   0  = 'Freq0  '
   1  = 'Freq1  '
   2  = 'Freq2  '
   3  = 'Freq3  '
   4  = 'Freq4  '
   99 = 'Freq99 '
   .  = 'Missing'
   ;
run;

data temp;
   set have;
   array i Item:;
   do over i;
      item = vname(i);
      val  = i;
      output;
   end;
run;

ods output table=table;
proc tabulate data = temp missing;
   class item val / preloadfmt order = data;
   tables item='', val=''*n='' / printmiss;
   format val fmt.;
run;

data table;
   set table;
   v = put(val, fmt.);
run;

proc transpose data = table out = want(drop = _:);
   by item;
   id v;
   var N;
run;

 

Result

 

item  Freq0 Freq1 Freq2 Freq3 Freq4 Freq99 Missing
Item1 2     .     .     .     1     .      .
Item2 .     1     1     .     .     .      1
Item3 .     1     2     .     .     .      .
Item4 .     1     1     .     .     1      .
PaigeMiller
Diamond | Level 26

@sav912 wrote:

Thank you for the response. I have never seen survey data structured that way- Can you explain why I should restructure it in that way? I will eventually have 300 participants answering 204 items, so will I end up with 61,200 rows?  


Because (almost?) all SAS PROCs are designed to work on long data sets rather than wide data sets. If your data is structured this way, the desired analysis is usually simple. If you work with the wide data set structure, it requires a lot more work to get the desired analysis; and for some PROCs the only way way to get the desired result is with long rather than wide data sets.

 

As I said, its trivial to get the report you want with a long data set. With a wide data set, you have to do a lot more work, as I have shown and as @PeterClemmensen has shown. You need to avoid "Excel thinking" and Excel data structures in SAS.

--
Paige Miller
sav912
Calcite | Level 5
Thank you for clarifying.
Tom
Super User Tom
Super User

What do you mean by SURVEY data?

Are ITEM1 to ITEM4 the QUESTIONS on the survey?  
Are OBS 1 to 3 the RESPONDENTS to the survey?

Are the cells in the actual response to the questions?  Do all of the questions have the same set of responses?  Like "Rated on a scale from 1 to 10 ...."  Or are they different (education level, income, ...)

If later then what the heck does the output table represent?

If former then as others have suggested TRANSPOSE the data and generate the counts from that.

proc format;
invalue miss99f 
  '99' = .N 
  other = [32.]
;
run;

data have;
  input ID (Item1-Item4) (:miss99f.);
cards;
1 0 . 2 99
2 4 1 2 1
3 0 2 1 2
; 

proc transpose data=have name=question out=tall(rename=(col1=answer));
  by id;
  var item1-item4;
run;

proc freq data=tall;
  tables question*answer / missing ;
run;

Tom_1-1668739648332.pngTom_2-1668739667477.png

 

Tom_0-1668739623449.png

 

sav912
Calcite | Level 5
There is no need to be rude. Each observation is a participant. Each item
is a different word. They are asked the same question about each word (how
often did you experience this in the past month) and are rated on the same
scale. As to “what the heck” my output will be used for: I️ am trying to
summarize frequencies so I️ can trim unclear and infrequent items from the
survey.
Tom
Super User Tom
Super User

So that helps explain what you are doing much better.

 

The normalized structure is doubly important for your situation because otherwise you are storing data (the WORD) in the metadata (the variable names ITEM1, ITEM2) etc. 

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1902 views
  • 2 likes
  • 4 in conversation