Solved: Creating new dataset from proc freq tables

sav912 · Posted 11-17-2022 03:07 PM

I have a survey that has 204 items/variables each rated from 0-4 or 99 (unlcear). I need to know what percentage of people rated each item each number (0-4, 99). Obviously I can run proc freq and get individual tables for each of the 204 variables, but I want to compile the frequency counts into a new dataset. I have tried using output, but it only uses the last variable.

Is there a way to create a dataset from the frequency tables where all of the items are observations and the Likert ratings are variables? And then each cell contains the frequency/percentage of each rating? I would also like to keep the missing values and have 0's inserted where a possible rating was not endorsed.

For example I currently have 97 participants and 204 items structured as such:

Obs Item1 Item2 Item3 Item4

1 0 . 2 99

2 4 1 2 1

3 0 2 1 2

I want a dataset structured like below (even better if it were percentages instead of counts):

Obs Freq0 Freq1 Freq2 Freq3 Freq4 Freq99 Missing

Item1 2 0 0 0 1 0 0

Item2 0 1 1 0 0 0 1

Item3 0 1 2 0 0 0 0

Item4 0 1 1 0 0 1 0

Any ideas are appreciated!

PeterClemmensen · Posted 11-17-2022 04:23 PM

There are a few things to consider and a few hoops you have to jump to get your exact desired result. For instance, it seems you want the Freq3 column even though you do not have any values of three in your sample data.

This should give you what you want, both in a report and in a SAS Data set.

data have;
input Item1 - Item4;
datalines;
0 . 2 99 
4 1 2 1  
0 2 1 2  
;

proc format;
   value fmt (notsorted)
   0  = 'Freq0  '
   1  = 'Freq1  '
   2  = 'Freq2  '
   3  = 'Freq3  '
   4  = 'Freq4  '
   99 = 'Freq99 '
   .  = 'Missing'
   ;
run;

data temp;
   set have;
   array i Item:;
   do over i;
      item = vname(i);
      val  = i;
      output;
   end;
run;

ods output table=table;
proc tabulate data = temp missing;
   class item val / preloadfmt order = data;
   tables item='', val=''*n='' / printmiss;
   format val fmt.;
run;

data table;
   set table;
   v = put(val, fmt.);
run;

proc transpose data = table out = want(drop = _:);
   by item;
   id v;
   var N;
run;

Result

item  Freq0 Freq1 Freq2 Freq3 Freq4 Freq99 Missing
Item1 2     .     .     .     1     .      .
Item2 .     1     1     .     .     .      1
Item3 .     1     2     .     .     .      .
Item4 .     1     1     .     .     1      .

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PaigeMiller · Posted 11-17-2022 03:16 PM

First, this is a terrible structure for data. A better structure for this type of data (and almost any type of data) is:

data have;
    input item freq;
    cards;
1 0
2 .
3 2
4 99
1 4
2 1
;

This is a partial data set, obviously it would be much longer.

Then, the table you want is trivial

proc freq data=have;
    table item*freq;
run;

Message here is to avoid "Excel thinking" and Excel data structure and try to work in data structures that fit into SAS PROCs.

To turn your data into the better structure, here is untested code

data re_arrange;
    set original_data;
    array itemnum item:;
    do i=1 to dim(itemnum);
        item=i;
        freq=itemnum(i);
        output;
     end;
     keep item freq;
run;

--
Paige Miller

sav912 · Posted 11-17-2022 03:27 PM

Thank you for the response. I have never seen survey data structured that way- Can you explain why I should restructure it in that way? I will eventually have 300 participants answering 204 items, so will I end up with 61,200 rows?

PeterClemmensen · Posted 11-17-2022 04:23 PM

There are a few things to consider and a few hoops you have to jump to get your exact desired result. For instance, it seems you want the Freq3 column even though you do not have any values of three in your sample data.

This should give you what you want, both in a report and in a SAS Data set.

data have;
input Item1 - Item4;
datalines;
0 . 2 99 
4 1 2 1  
0 2 1 2  
;

proc format;
   value fmt (notsorted)
   0  = 'Freq0  '
   1  = 'Freq1  '
   2  = 'Freq2  '
   3  = 'Freq3  '
   4  = 'Freq4  '
   99 = 'Freq99 '
   .  = 'Missing'
   ;
run;

data temp;
   set have;
   array i Item:;
   do over i;
      item = vname(i);
      val  = i;
      output;
   end;
run;

ods output table=table;
proc tabulate data = temp missing;
   class item val / preloadfmt order = data;
   tables item='', val=''*n='' / printmiss;
   format val fmt.;
run;

data table;
   set table;
   v = put(val, fmt.);
run;

proc transpose data = table out = want(drop = _:);
   by item;
   id v;
   var N;
run;

Result

item  Freq0 Freq1 Freq2 Freq3 Freq4 Freq99 Missing
Item1 2     .     .     .     1     .      .
Item2 .     1     1     .     .     .      1
Item3 .     1     2     .     .     .      .
Item4 .     1     1     .     .     1      .

The DATA to DATA Step Macro
Blog: SASnrd

PaigeMiller · Posted 11-17-2022 05:00 PM

@sav912 wrote:

Thank you for the response. I have never seen survey data structured that way- Can you explain why I should restructure it in that way? I will eventually have 300 participants answering 204 items, so will I end up with 61,200 rows?

Because (almost?) all SAS PROCs are designed to work on long data sets rather than wide data sets. If your data is structured this way, the desired analysis is usually simple. If you work with the wide data set structure, it requires a lot more work to get the desired analysis; and for some PROCs the only way way to get the desired result is with long rather than wide data sets.

As I said, its trivial to get the report you want with a long data set. With a wide data set, you have to do a lot more work, as I have shown and as @PeterClemmensen has shown. You need to avoid "Excel thinking" and Excel data structures in SAS.

--
Paige Miller

sav912 · Posted 11-18-2022 10:30 AM

Thank you for clarifying.

Tom · Posted 11-17-2022 09:40 PM

What do you mean by SURVEY data?

Are ITEM1 to ITEM4 the QUESTIONS on the survey?
Are OBS 1 to 3 the RESPONDENTS to the survey?

Are the cells in the actual response to the questions? Do all of the questions have the same set of responses? Like "Rated on a scale from 1 to 10 ...." Or are they different (education level, income, ...)

If later then what the heck does the output table represent?

If former then as others have suggested TRANSPOSE the data and generate the counts from that.

proc format;
invalue miss99f 
  '99' = .N 
  other = [32.]
;
run;

data have;
  input ID (Item1-Item4) (:miss99f.);
cards;
1 0 . 2 99
2 4 1 2 1
3 0 2 1 2
; 

proc transpose data=have name=question out=tall(rename=(col1=answer));
  by id;
  var item1-item4;
run;

proc freq data=tall;
  tables question*answer / missing ;
run;

sav912 · Posted 11-18-2022 10:29 AM

There is no need to be rude. Each observation is a participant. Each item
is a different word. They are asked the same question about each word (how
often did you experience this in the past month) and are rated on the same
scale. As to “what the heck” my output will be used for: I️ am trying to
summarize frequencies so I️ can trim unclear and infrequent items from the
survey.

Tom · Posted 11-18-2022 11:30 AM

So that helps explain what you are doing much better.

The normalized structure is doubly important for your situation because otherwise you are storing data (the WORD) in the metadata (the variable names ITEM1, ITEM2) etc.

Creating new dataset from proc freq tables

Re: Creating new dataset from proc freq tables

Re: Creating new dataset from proc freq tables

Re: Creating new dataset from proc freq tables

Re: Creating new dataset from proc freq tables

Re: Creating new dataset from proc freq tables

Re: Creating new dataset from proc freq tables

Re: Creating new dataset from proc freq tables

Re: Creating new dataset from proc freq tables

Re: Creating new dataset from proc freq tables

Registration is open

SAS Training: Just a Click Away