## How do I analyze multiple variables in an array to create a boxplot?

Here is part of my code (data set: work.solutions):
```sas
array letters{26} numA numB numC numD numE numF numG numH numI numJ numK numL numM numN numO numP numQ numR numS numT numU numV numW numX numY numZ;
letterSet='abcdefghijklmnopqrstuvwxyz';
do i=1 to 26;
letters{i}=countc(word, substr(letterSet, i, 1));
end;
drop letterSet i;
```
(In this code, I count the occurrences of each letter in `word` and store each count in its own variable.)

I then ran PROC MEANS to get the sum of each variable. *[Screenshot: MEANS Procedure output (Sum)]*
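For anyone who wants to reproduce this, here is a minimal self-contained sketch of the counting step followed by PROC MEANS. It assumes a character variable WORD. The data set names `words_demo` and `words_demo_counts` and the three sample words are hypothetical stand-ins for work.solutions.

```sas
/* Hypothetical stand-in for work.solutions: a few lowercase five-letter words */
data words_demo;
  length word $5;
  input word $;
datalines;
crane
slate
apple
;
run;

data words_demo_counts;
  set words_demo;
  array letters{26} numA numB numC numD numE numF numG numH numI numJ numK numL numM
                    numN numO numP numQ numR numS numT numU numV numW numX numY numZ;
  letterSet = 'abcdefghijklmnopqrstuvwxyz';
  /* count how often each letter of the alphabet occurs in the current word */
  do i = 1 to 26;
    letters{i} = countc(word, substr(letterSet, i, 1));
  end;
  drop letterSet i;
run;

/* total occurrences of each letter across all words */
proc means data=words_demo_counts sum;
  var num: ;
run;
```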

I want to create a bar graph (not a boxplot) from this data: the x-axis should show each variable in the array, and the y-axis the frequency of each one. I could not find any resource on doing this with multiple variables, and since the array name is not a variable, I could not use it either.

Any ideas or workarounds I could implement? Thanks!

Edit: some more context

The data set is a list of possible WORDLE words. Each word has five letters and has been cleaned to lowercase.

I counted each letter in the word using a DO loop and stored the counts in an array of variables named `letters`. I also used PROC MEANS to get the total frequency of each variable.

My end goal is a bar graph of the frequency of each variable. I accidentally wrote box plot, sorry.

## Re: How do I analyze multiple variables in an array to create a boxplot?

Since you don't provide any actual data, it is not quite clear what you are attempting to plot.

Boxplots display the variability of a numeric variable within each group. You have a single count per letter??? So there isn't any variability to draw a box from, as far as I can see.

How sure are you that all of your values of "word" contain only lowercase letters? Your method of counting won't include uppercase letters in the count.
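A small sketch of the case issue, using a hypothetical mixed-case value in a hypothetical data set `have_mixed`. COUNTC accepts an optional modifier argument, and the `i` modifier makes it ignore case; applying LOWCASE to the word first has the same effect.

```sas
data have_mixed;
  word        = 'Crane';                    /* hypothetical mixed-case value */
  lower_only  = countc(word, 'c');          /* uppercase C is not counted */
  any_case    = countc(word, 'c', 'i');     /* 'i' modifier: ignore case */
  via_lowcase = countc(lowcase(word), 'c'); /* same effect by lowercasing first */
run;

proc print data=have_mixed;
run;
```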

## Re: How do I analyze multiple variables in an array to create a boxplot?

Sorry, here is some more context: the data set is a list of possible WORDLE words. Each word has five letters and has been cleaned to lowercase.

I counted each letter in the word using a DO loop and stored the counts in an array of variables named `letters`. I also used PROC MEANS to get the total frequency of each variable.

My end goal is a bar graph of the frequency of each variable. I accidentally wrote box plot, sorry.

## Re: How do I analyze multiple variables in an array to create a boxplot?

@SASSchoolUser wrote:
> Sorry, here is some more context: the data set is a list of possible WORDLE words. Each word has five letters and has been cleaned to lowercase.
>
> I counted each letter in the word using a DO loop and stored the counts in an array of variables named `letters`. I also used PROC MEANS to get the total frequency of each variable.
>
> My end goal is a bar graph of the frequency of each variable. I accidentally wrote box plot, sorry.

The issue is probably how you used PROC MEANS. For a VBAR or HBAR plot, it is best to have one variable whose values are the labels you want on the axis and another variable that holds the response count. PROC MEANS can actually produce that directly using ODS OUTPUT together with the STACKODSOUTPUT option. Here is an example you can run with a small data set that SAS supplies for demonstrating code (SASHELP.CLASS). The Name values have more than five characters, but it should demonstrate well enough.

```sas
data have;
  set sashelp.class;
  array letters{26} numA numB numC numD numE numF numG numH numI numJ numK numL numM numN numO numP numQ numR numS numT numU numV numW numX numY numZ;
  letterSet='abcdefghijklmnopqrstuvwxyz';
  name=lowcase(name);               /* lowercase so every letter is counted */
  do i=1 to 26;
    letters{i}=countc(name, substr(letterSet, i, 1));
  end;
  drop letterSet i;
run;

/* STACKODSOUTPUT: one row per analysis variable in the ODS output table */
proc means data=have stackodsoutput sum;
  var num: ;
  ods output summary=plotdata;
run;

/* one bar per variable; bar height is the summed count */
proc sgplot data=plotdata;
  vbar variable / response=sum;
run;
```

The STACKODSOUTPUT option makes the ODS output data set look like the table displayed in the Results window for the summary step. So there is one variable, excitingly named Variable, holding the name of each variable from the VAR statement in PROC MEANS, and another variable for each statistic requested.
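If ODS OUTPUT is not an option, a sketch of another way to get the same two-column shape is the OUTPUT statement plus PROC TRANSPOSE. It assumes the HAVE data set created by the first DATA step in the example above; the names `sums` and `long` are hypothetical.

```sas
/* Write the sums to a data set instead of capturing the ODS table */
proc means data=have noprint;
  var num: ;
  output out=sums sum= ;   /* SUM= with no names keeps the original variable names */
run;

/* SUMS has a single row; turn the numA ... numZ columns into rows */
proc transpose data=sums(drop=_type_ _freq_)
               out=long(rename=(_name_=variable col1=sum));
run;

proc sgplot data=long;
  vbar variable / response=sum;
run;
```

The result has the same Variable/Sum layout as the STACKODSOUTPUT table, so the SGPLOT step is unchanged.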

Personally I would have used a slightly different approach, but at a minimum the graph would be prettier if, instead of naming the variables numA numB numC, you just used A B C (or a b c if you prefer).
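One possible alternative (a guess, not necessarily the approach meant above) is to skip the 26 count variables entirely: output one row per letter occurrence and let VBAR count the rows, so the axis shows the letters themselves. This sketch uses SASHELP.CLASS and NAME; for the WORDLE data you would substitute work.solutions and WORD. The data set name `letters_long` is hypothetical.

```sas
data letters_long;
  set sashelp.class;
  length letter $1;
  do i = 1 to length(name);
    letter = lowcase(substr(name, i, 1));
    if anyalpha(letter) then output;   /* keep letters only, one row per occurrence */
  end;
  keep letter;
run;

/* default VBAR statistic is FREQ: bar height = number of rows for each letter */
proc sgplot data=letters_long;
  vbar letter;
run;
```

Only letters that actually occur get a bar, and no PROC MEANS step is needed.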
