Data visualization with SAS programming

Sum two values in Proc Freq then display with SGPlot Series

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 11
Accepted Solution

Sum two values in Proc Freq then display with SGPlot Series

Hello,

 

I have a data set with 78k observations that include two variables: _state and SCNTMEL1. SCNTMEL1 has values that include 1-5.  I am trying to use SGPlot Series to display _States on the x axis and the sum of the Freq Row Pct for values 1 and 2 ("Always" and "Usually") for each (_state) on the y axis.  I can calculate the individual percentages for each value for SCNTMEL1 using Proc Freq, but I am struggling to figure out how to sum the Row Pct for values 1 and 2, then identify those for Proc SGPlot, Series.

 

Here is the code for the Proc Freq step:

 

Proc sort data=brfss15.scontext
Out=SCM_SORT;
by _State;

Run;

Proc format;
Value Always
1='Always'
2='Usually'
3='Sometimes'
4='Rarely'
5='Never'
7='Not Sure'
8='N/A'
9='Refused';
Run;

Proc Format;
Value State
1='Alabama'
5='Arkansas'
10='Delaware'
11='D.C.'
13='Georgia'
22='Louisiana'
27='Minnesota'
28='Mississippi'
29='Missouri'
44='Rhode Island'
47='Tennessee'
49='Utah';
Run;


Proc freq data=SCM_SORT;
Tables _STATE*SCNTMEL1/nopercent nocol;
Format _State State.;
Format SCNTMEL1 Always.;

Run;

I tried an out= statement to capture the Freq data and I've tried various Proc SGPlot alternatives, including Stat=freq, but don't know how to isolate and combine the Row Pct for values 1 and 2.

Hoping this description is sufficiently clear. Thanks in advance.

  


Accepted Solutions
Solution
‎12-21-2017 08:55 AM
Super User
Posts: 13,084

Re: Sum two values in Proc Freq then display with SGPlot Series


Ddormer wrote:

Thanks for your responses.

 

Let me try to clarify my issue:  

 

Goal:  Use SGPlot to create a VBar graph (or other graph) by combining more than one Row Pct value from the Proc Freq report.  In the instance below, the survey question is about general health.  For each state, I want to show the percent (Row Pct) of those who responded with values equivalent to "Poor" or "Fair".

 

Here’s the root of my problem:  I don’t know how to save the values from the Proc Freq report or otherwise generate them in the SGPlot step so I can call them as the source data for SGPlot. Even when I use an output statement for Proc Freq, I don't see the variables or values.  (A smaller problem is displaying more than one value but not all values for Row Pct in one bar, but I can see the general approach.)

 

In the example below, for Alabama, the value I want to see on the graph on a 100 (percent) y axis for the first bar (AL) is:  25.39 (8.75 + 16.64).

 

I can provide a subset of the data file, but hopefully this clarification will allow you to guide me to the solution.

 

Thanks in advance.

 

Proc Freq Report.JPG


SHOW the code you used for proc freq that you are using where "I don't see the variables or values."

One thing to know about the OUT= option of the tables statement in proc freq is that it only has the contents of the LAST data requested. So if your tables statement were to look like:

 

tables vara* varb   varc*varb / out=results;

then the Results data set will only contain the output related to varc*varb.

If you have multiple combinations of variables that you want summarized in the same way then you need a separate tables statement for each with a separate differently named output data set name.

 

Also since you are showing values in your freq output with GHStatus (I notice this is NOT one of the variables you were using previously and using a different format) of -1 and -3 you might have further work with your format or recoding unless they really should be included in the denominator of your percentages. Consider if your "total" column should be "all records" or "all records with a response of  1 through 5".

 

And to throw a largish monkey wrench into the whole thing: Your base dataset BRFSS15.scontext indicates the data started with Behavior Risk Factor Surveillance System data. That data usually requires methods for dealing with complex samples and weights. I recognize this having worked with BRFSS data at different levels for 1998 to present.

View solution in original post


All Replies
Respected Advisor
Posts: 2,661

Re: Sum two values in Proc Freq then display with SGPlot Series

You might want to try PROC SUMMARY, which can add across the two "Always" and "Usually" groups.

 

Something like this (UNTESTED CODE)

 

proc summary data=brfss15.scontext(where=(scntmel1 in (1,2)) nway;
    class state;
    var scntmel1;
    output out=sums n=n;
run;
--
Paige Miller
Super User
Posts: 13,084

Re: Sum two values in Proc Freq then display with SGPlot Series

Can you provide a small example data set with maybe just a couple of states and few values each of your SCNTMEL1 and a worked by hand set of the values you want. Since Proc Freq really doesn't do sums but counts, I can't quite wrap my head around what you mean by summing row percentages at all.

 

If I understand what your are trying you might want to look at the result of this code:

Proc format;
	Value combined
	1,2='Always/Usually'
	3='Sometimes'
	4='Rarely'
	5='Never'
	7='Not Sure'
	8='N/A'
	9='Refused';
Run;
Proc freq data=SCM_SORT;
	Tables _STATE*SCNTMEL1/out=values nopercent nocol;
	Format _State State.;
	Format SCNTMEL1 Always.;
	
Run;

Proc freq data=SCM_SORT;
	Tables _STATE*SCNTMEL1/ out=combined nopercent nocol;
	Format _State State.;
	Format SCNTMEL1 combined.;
	
Run;

data plot;
   set values (where=(SCNTMEL1 in (1,2) ) in=1)
       combined ( where=(SCNTMEL1 =1) in=2)
   ;
   length  grouptype $ 14;
   if in1 then grouptype = put (SCNTMEL1,Always.);
   if inw then grouptype = put (SCNTMEL1, combined.);
run;
Occasional Contributor
Posts: 11

Re: Sum two values in Proc Freq then display with SGPlot Series

Thanks for your responses.

 

Let me try to clarify my issue:  

 

Goal:  Use SGPlot to create a VBar graph (or other graph) by combining more than one Row Pct value from the Proc Freq report.  In the instance below, the survey question is about general health.  For each state, I want to show the percent (Row Pct) of those who responded with values equivalent to "Poor" or "Fair".

 

Here’s the root of my problem:  I don’t know how to save the values from the Proc Freq report or otherwise generate them in the SGPlot step so I can call them as the source data for SGPlot. Even when I use an output statement for Proc Freq, I don't see the variables or values.  (A smaller problem is displaying more than one value but not all values for Row Pct in one bar, but I can see the general approach.)

 

In the example below, for Alabama, the value I want to see on the graph on a 100 (percent) y axis for the first bar (AL) is:  25.39 (8.75 + 16.64).

 

I can provide a subset of the data file, but hopefully this clarification will allow you to guide me to the solution.

 

Thanks in advance.

 

Proc Freq Report.JPG

Occasional Contributor
Posts: 11

Re: Sum two values in Proc Freq then display with SGPlot Series

Friends,

 

Hold up on the last request!  :-) 

 

I think I found exactly what I need from Rick Wicklin at https://blogs.sas.com/content/iml/2016/10/03/proc-freq-two-way-graphs.html.

 

Not only does this show me how to create the simple graph using ODS within the Proc Freq step, it also shows how to output the data from that graph so I can use it in the more complex graph I need through SGPlot.

 

Stand by.  I'll let you know.

 

Thanks.

Respected Advisor
Posts: 2,661

Re: Sum two values in Proc Freq then display with SGPlot Series

As stated above, I think PROC FREQ is the wrong procedure here. It seems to me that this would be a lot simpler if you used PROC SUMMARY, as I explained.

--
Paige Miller
Solution
‎12-21-2017 08:55 AM
Super User
Posts: 13,084

Re: Sum two values in Proc Freq then display with SGPlot Series


Ddormer wrote:

Thanks for your responses.

 

Let me try to clarify my issue:  

 

Goal:  Use SGPlot to create a VBar graph (or other graph) by combining more than one Row Pct value from the Proc Freq report.  In the instance below, the survey question is about general health.  For each state, I want to show the percent (Row Pct) of those who responded with values equivalent to "Poor" or "Fair".

 

Here’s the root of my problem:  I don’t know how to save the values from the Proc Freq report or otherwise generate them in the SGPlot step so I can call them as the source data for SGPlot. Even when I use an output statement for Proc Freq, I don't see the variables or values.  (A smaller problem is displaying more than one value but not all values for Row Pct in one bar, but I can see the general approach.)

 

In the example below, for Alabama, the value I want to see on the graph on a 100 (percent) y axis for the first bar (AL) is:  25.39 (8.75 + 16.64).

 

I can provide a subset of the data file, but hopefully this clarification will allow you to guide me to the solution.

 

Thanks in advance.

 

Proc Freq Report.JPG


SHOW the code you used for proc freq that you are using where "I don't see the variables or values."

One thing to know about the OUT= option of the tables statement in proc freq is that it only has the contents of the LAST data requested. So if your tables statement were to look like:

 

tables vara* varb   varc*varb / out=results;

then the Results data set will only contain the output related to varc*varb.

If you have multiple combinations of variables that you want summarized in the same way then you need a separate tables statement for each with a separate differently named output data set name.

 

Also since you are showing values in your freq output with GHStatus (I notice this is NOT one of the variables you were using previously and using a different format) of -1 and -3 you might have further work with your format or recoding unless they really should be included in the denominator of your percentages. Consider if your "total" column should be "all records" or "all records with a response of  1 through 5".

 

And to throw a largish monkey wrench into the whole thing: Your base dataset BRFSS15.scontext indicates the data started with Behavior Risk Factor Surveillance System data. That data usually requires methods for dealing with complex samples and weights. I recognize this having worked with BRFSS data at different levels for 1998 to present.

Occasional Contributor
Posts: 11

Re: Sum two values in Proc Freq then display with SGPlot Series

Thanks to @ballardw and @PaigeMiller for your valuable suggestions..  A few quick comments:

 

This is my first SAS exercise, so I'm certain I'm not finding the best or most efficient solutions yet.  If it generates the right results, that will be a victory.  I'm sure @PaigeMiller is right about using Proc Summary instead of Proc Freq, but since I think I figured out how to accomplish the task with Proc Freq, I'll use that for now and will learn more about Proc Summary.

 

It appears that I am able to do everything I need by including OutPct  and Out=FreqOut1 in the Tables statement in Proc Freq then using that out data set as the source for SGPlot.  In fact, I'm combining three out= data sets to generate the graphs I want.  So far, all is good.  

 

@ballardw, thanks for the comment about the BRFSS weighting factors.  I understand how they are calculated-I think.  My first goal is to generate the results I want with the raw data (which reconciles to the BRFSS code bock) then come back through and add the weighted factors as a new variable.  Now I know who I'll pester if I get stuck.  :-)

 

Thank you both so very much.

 

Doug

Respected Advisor
Posts: 2,661

Re: Sum two values in Proc Freq then display with SGPlot Series

[ Edited ]

Yes, the lesson is that there are a lot of ways to get this result in SAS, certainly PROC FREQ followed by doing some calculations in a data step also works.

 

Efficiency is important to me, learning is important to you, and that's also fine.

--
Paige Miller
Super User
Posts: 13,084

Re: Sum two values in Proc Freq then display with SGPlot Series


Ddormer wrote:

 

 

@ballardw, thanks for the comment about the BRFSS weighting factors.  I understand how they are calculated-I think.  My first goal is to generate the results I want with the raw data (which reconciles to the BRFSS code bock) then come back through and add the weighted factors as a new variable.  Now I know who I'll pester if I get stuck.  :-)

 


So it appears you are looking to confirm the raw counts in the survey data so Proc Freq will work. Once you start working with the weighted data you should be using the survey procedures Surveymeans and SurveyFreq which have entirely different requirements to get the output into data sets but do allow getting more variables into a single data set, though determining which you want, especially from SurveyFreq may be a tad daunting. Or if using SAS-callable SUDAAN even more entertaining.

Occasional Contributor
Posts: 11

Re: Sum two values in Proc Freq then display with SGPlot Series

@ballardw, Thank you for the direction to SurveyMeans and SurveyFreq.  So far, this has been pretty reasonable in its level of difficulty for a first project.  However, from your comment, and from the BRFSS weighting itself, this is about to get interesting.

 

Stay tuned.

 

Kindest regards,

 

Doug

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 287 views
  • 0 likes
  • 3 in conversation