MMB31
Calcite | Level 5

Hi,

I am new to SAS and am having a lot of trouble making the right plot.

I want to make something like this (picture taken from John J. Lee):
Skærmbillede 2023-04-22 kl. 10.52.25.png

I have the variable new_polviews, which is coded 1=Conservative, 2=Liberal, or 3=Other. I also have new_partyid, which is coded 1=Republican, 2=Democrat, or 3=Other (just as in the picture).

I want to estimate and plot the share of Democrats and Republicans who are Liberal or Conservative, and see how those shares develop over time so that I can compare the two parties.

As I understand it, I can calculate the values like this:
proc freq data=gss1;
table new_polviews*new_partyid;
by year;
run;

Example of one year:

Skærmbillede 2023-04-22 kl. 11.02.16.png

I can see the percentages for each year, but I can't seem to plot them or get the estimated percentages with confidence intervals.

I hope you can help me.

M

Ksharp
Super User

Check @Rick_SAS's blog post on confidence intervals for multinomial proportions: https://blogs.sas.com/content/iml/2017/02/15/confidence-intervals-multinomial-proportions.html

 

/*
Check Rick's blog for confidence intervals of multinomial proportions
https://blogs.sas.com/content/iml/2017/02/15/confidence-intervals-multinomial-proportions.html
*/

/* Program to accompany 
   "Simultaneous confidence intervals for multinomial proportions"
   by Rick Wicklin. Published on The DO Loop blog 
    http://blogs.sas.com/content/iml/2017/02/15/confidence-intervals-multinomial-proportions.html
*/


/*******************************************************/
/* The following functions will construct simultaneous */
/* confidence intervals for multinomial proportions    */
/* based on several methods of construction.           */
/* Values for the variable 'Method' are:               */
/*   1 = Quesenberry and Hurst (1964)                  */
/*   2 = Goodman (1965)                                */
/*   3 = naive binomial                                */
/*   4 = Fitzpatrick and Scott                         */
/*       (alpha <= 0.15 only)                          */
/*   5 = Q & H type and estimated variance             */
/*   6 = Goodman type  and estimated variance          */
/*                                                     */
/* Original code: May and Johnson (1997). "A SAS Macro */
/*   for Constructing Simultaneous Confidence Intervals*/
/*   for Multinomial Proportions," Computer Methods and*/
/*   Programs in Biomedicine, p. 153-162.              */
/*                                                     */
/* Rewritten by Rick Wicklin, Feb 2016                 */
/*                                                     */
/*******************************************************/

/* Define library of SAS/IML function */
proc iml;
/* helper function that computes interval based on ChiSq value */
start MultCIFromChiSquare(Count, chiSq);
   n = Count[+];
   p = Count / n;
   a = chiSq + n;
   b = chiSq + 2*n*p;
   c = n*p##2;
   root = sqrt(b##2 - 4*a*c);
   Lower = (b-root) / (2*a);
   Upper = (b+root) / (2*a);
   return (p || Lower || Upper);
finish;

/* helper function that computes interval based on ChiSq 
   value and estimated variance factor */
start MultCIFromChiSquareVar(Count, chiSq);
   n = Count[+];
   p = Count / n;
   a = sqrt( chiSq * p#(1-p) / n );
   Lower = p - a;
   Upper = p + a;
   return (p || Lower || Upper);
finish;

/* Quesenberry and Hurst (1964) */
start MultCI_QH(Count, alpha);
   k = nrow(Count);
   chiSq = quantile("ChiSquare", 1-alpha, k-1);
   return( MultCIFromChiSquare(Count, chiSq) );
finish;

/* Quesenberry and Hurst with estimated variance factor */
start MultCI_QHVar(Count, alpha);
   k = nrow(Count);
   chiSq = quantile("ChiSquare", 1-alpha, k-1);
   return( MultCIFromChiSquareVar(Count, chiSq) );
finish;

/* Goodman (1965) */
start MultCI_Goodman(Count, alpha);
   k = nrow(Count);
   chiSq = quantile("ChiSquare", 1-alpha/k, 1); /*Bonferroni adjustment */
   return( MultCIFromChiSquare(Count, chiSq) );
finish;

/* Goodman with estimated variance factor */
start MultCI_GoodmanVar(Count, alpha);
   k = nrow(Count);
   chiSq = quantile("ChiSquare", 1-alpha/k, 1); /*Bonferroni adjustment */
   return( MultCIFromChiSquareVar(Count, chiSq) );
finish;

/* naive binomial method */
start MultCI_Binomial(Count, alpha);
   chiSq = quantile("chisquare", 1-alpha, 1);
   n = Count[+];
   p = Count / n;
   a = sqrt(chiSq / (4*n));
   Lower = p - a;
   Upper = p + a;
   return (p || Lower || Upper);
finish;

/* Fitzpatrick and Scott (1987)                */
start MultCI_FS(Count, alpha);
   if alpha <= 0.016 then
      c2 = quantile("chisquare", 1-alpha/2, 1);
   else if alpha > 0.016 && alpha <= 0.15 then
      c2 = (8/9)*quantile("chisquare", 1-alpha/3, 1);
   else do;
      print 'alpha must be less than 0.15 to use the Fitzpatrick and Scott method';
      return( j(nrow(Count), 3, .) );
   end;
   n = Count[+];
   p = Count / n;
   a = sqrt(c2 / (4*n));
   Lower = p - a;
   Upper = p + a;
   return (p || Lower || Upper);
finish;

/*******************************************************/
/* Driver function for May and Johnson (1997) functions*/
/* Print the CI results for various tests.             */
/* Values for the variable 'type' are:                 */
/*                   1 = Quesenberry and Hurst (1964)  */
/*                   2 = Goodman (1965) [default]      */
/*                   3 = naive binomial                */
/*                   4 = Fitzpatrick and Scott         */
/*                       (alpha<=0.15 only)            */
/*                   5 = Q & H type using observed     */
/*                       proportions as variance       */
/*                   6 = Goodman type using observed   */
/*                       proportions as variance       */
/*******************************************************/
start MultCI(Count, alpha=0.05, Method=2);
   if Method=1 then
      return MultCI_QH(Count, alpha);
   else if Method=2 then
      return MultCI_Goodman(Count, alpha);
   else if Method=3 then 
      return MultCI_Binomial(Count, alpha);
   else if Method=4 then
      return MultCI_FS(Count, alpha);
   else if Method=5 then 
      return MultCI_QHVar(Count, alpha);
   else if Method=6 then
      return MultCI_GoodmanVar(Count, alpha);
finish;


start MultCIPrint(Category, Count, alpha=0.05, Method=2);
   title = {
   "Method of Quesenberry and Hurst (1964)"
   "Method of Goodman (1965)"
   "Naive binomial method"
   "Method of Fitzpatrick and Scott (1987)"
   "Method of Quesenberry and Hurst (1964), using Var = p(1-p)"
   "Method of Goodman (1965), using Var = p(1-p) "
   };

   print (title[Method]);
   if type(Category)='C' then Cat = Category;
   else Cat = putn(Category, "Best6.");
   alphaText = putn(alpha, "Best6.");
   labl = "Simultaneous Confidence Intervals";
   CI = MultCI(Count, alpha, Method);
   result = Count || CI;
   onemalpha = strip(putn(1-alpha, "Percent7."));
   varNames = {"Count" "Proportion"} || (onemalpha + " Lower") ||
              (onemalpha + " Upper");
   print result[Label=labl rowname=Cat
                colname=varNames format=Best6.];
finish;

store module=_all_;
quit;








/*************Start******************/
proc format;
value new_polviews 
1='Conservative' 
2='Liberal'  
3='other'
;
value new_partyid
1='Republican' 
2='Democrat' 
3='other'
;
run;
data have;
call streaminit(123);
do year=1970 to 2010 by 10;
 do new_partyid=1,2,3;
   do i=1 to 1000;
   new_polviews=rand('table',0.5,0.3,0.2);
   output;
   end;
 end;
end;
drop i;
format new_polviews new_polviews. new_partyid new_partyid.;
run; 
proc freq data=have noprint;
by year new_partyid;
table new_polviews/out=freq list;
run;

proc iml;
load module=(MultCI MultCIPrint);
use freq nobs nobs; 
read all var {"year" "new_partyid" "new_polviews" "Count"}; 
close;
idx_start=uniqueby(new_partyid);
idx_end=t(remove(idx_start-1,1))//nobs;
idx=idx_start||idx_end;

alpha = 0.05;

create CIs var {"Category_year" "Category_new_partyid" "Category" "Estimate" "Lower" "Upper"};
do i=1 to nrow(idx);
_idx=idx[i,1]:idx[i,2];
Category_year=year[_idx];
Category_new_partyid=new_partyid[_idx];
Category=new_polviews[_idx];
Category_count=count[_idx];
/* call MultCIPrint(Category, Category_count, alpha, 2);  Goodman = 2*/
CI = MultCI(Category_count, alpha, 2);  /*or simply CI = MultCI(Count) */

/* write estimates and CIs to data set */
Estimate = CI[,1];  Lower = CI[,2];  Upper = CI[,3];
append;
end;
close;
quit;


ods graphics/attrpriority=none;
proc sgpanel data=cis;
format Category new_polviews. Category_new_partyid new_partyid. Estimate percent8.2;
panelby Category_new_partyid/novarname onepanel rows=1;
styleattrs datasymbols=(circle plus triangle);
series x=Category_year y=Estimate/markers group=Category 
 lineattrs=(pattern=solid) markerattrs=(size=12);
band x=Category_year lower=Lower upper=Upper/group=Category transparency=0.9;
band x=Category_year lower=Lower upper=Upper/group=Category nofill outline
 lineattrs=(pattern=solid);
run;

Ksharp_0-1682163348788.png
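
For reference, the limits returned by `MultCIFromChiSquare` for Method=2 (Goodman) can be read off the code above. This is only a transcription of that code into a formula, not a derivation from the original papers. Here n is the total count in one year-by-party group, p_i = n_i/n is the observed proportion of category i, and k is the number of categories:

```latex
\[
\chi^2 = \chi^2_{1,\,1-\alpha/k}, \qquad
\{L_i,\ U_i\} =
\frac{\chi^2 + 2 n p_i \mp \sqrt{(\chi^2 + 2 n p_i)^2 - 4\,(\chi^2 + n)\, n p_i^2}}
     {2\,(\chi^2 + n)}
\]
```

These are the two roots in \(\pi\) of \(n\,(p_i - \pi)^2 = \chi^2\,\pi(1-\pi)\), which matches the coefficients `a`, `b`, and `c` in the helper function. The Quesenberry and Hurst variant (Method=1) uses the same expression with \(\chi^2 = \chi^2_{k-1,\,1-\alpha}\).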

 

ballardw
Super User

PROC FREQ doesn't produce confidence limits for these proportions directly. You might try a different procedure that does, such as PROC SURVEYFREQ.
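
PROC FREQ can still write the point estimates to a data set, which is enough for a plot without bands. The following is a minimal sketch, assuming the data set gss1 and the variable coding from the question; the data set names gss1_sorted and pct are made up for this example.

```sas
/* Sketch only: gss1 and its variables come from the question;
   gss1_sorted and pct are hypothetical names. */
proc sort data=gss1 out=gss1_sorted;
   by year;
run;

proc freq data=gss1_sorted noprint;
   by year;
   /* Party is the row variable, so PCT_ROW is the share of each
      ideology within a party. OUTPCT adds PCT_ROW and PCT_COL. */
   tables new_partyid*new_polviews / out=pct outpct;
run;

proc sgpanel data=pct;
   /* keep Republicans/Democrats and Conservative/Liberal only */
   where new_partyid in (1 2) and new_polviews in (1 2);
   panelby new_partyid / novarname rows=1;
   series x=year y=pct_row / group=new_polviews markers;
run;
```

This gives the lines only; the confidence limits still have to come from another source.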

 

Here is an example of using SURVEYFREQ to create an output data set with confidence limits for use with the BAND statement:

proc surveyfreq data=sashelp.cars;
  where type in ('Sedan' 'SUV' ) and cylinders in (4 6 8); 
  tables origin*type*cylinders /cl row column;
  ods output crosstabs= carplot;

run;

You will want to examine the output data set (carplot in this case) carefully to determine which of the percent variables you want to use, along with the associated upper and lower confidence limit variables. For plotting, you will also want to remove the rows valued "Total", which hold the row or column marginal totals. Depending on your choice of plot options, you may need to sort the data as well.

Your equivalent code might look like:

proc surveyfreq data=gss1;
tables new_polviews*new_partyid*year /cl row column;
run;

The order of the variables on the TABLES statement determines which of the row percent, column percent, or overall percent is the one you want.
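
As one possible ordering, the sketch below puts year first (the layer), party second (the row), and ideology last (the column), so that the row percent is the share of each ideology within a party and year. The output data set names polplot and polplot2 are made up, and the variable names RowPercent, RowLowerCL, and RowUpperCL are assumptions that should be checked against the PROC CONTENTS listing before plotting.

```sas
/* Sketch only: gss1 and its variables come from the question;
   polplot and polplot2 are hypothetical names. */
proc surveyfreq data=gss1;
   tables year*new_partyid*new_polviews / row cl;
   ods output CrossTabs=polplot;
run;

/* Check the actual variable names in the ODS output data set */
proc contents data=polplot;
run;

proc sort data=polplot out=polplot2;
   /* Assumes the "Total" rows carry missing raw values, so this
      filter drops them along with the "other" categories */
   where new_partyid in (1 2) and new_polviews in (1 2);
   by new_partyid new_polviews year;
run;

proc sgpanel data=polplot2;
   panelby new_partyid / novarname rows=1;
   /* RowPercent, RowLowerCL, RowUpperCL are assumed names */
   band x=year lower=RowLowerCL upper=RowUpperCL /
        group=new_polviews transparency=0.7;
   series x=year y=RowPercent / group=new_polviews markers;
run;
```

The sort by year within each group is there because the BAND and SERIES statements connect observations in data order.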

Ksharp
Super User

@ballardw,
Interesting. I compared the results from PROC SURVEYFREQ and PROC IML.
They are all different.
@Rick_SAS, do you have any comment?

 

data Psych;
input Category $21. Count;
datalines;
Neurotic              91
Depressed             49
Schizophrenic         37
Personality disorder  43
;
proc iml;
load module=(MultCI MultCIPrint);
use Psych; read all var {"Category" "Count"}; close;
 
alpha = 0.05;
call MultCIPrint(Category, Count, alpha, 1); 
call MultCIPrint(Category, Count, alpha, 2); 
call MultCIPrint(Category, Count, alpha, 3); 
call MultCIPrint(Category, Count, alpha, 4); 
call MultCIPrint(Category, Count, alpha, 5); 
call MultCIPrint(Category, Count, alpha, 6); 

quit;




data x;
 set Psych;
do i=1 to count;
output;
end;
run;
proc surveyfreq data=x order=data;
tables Category /cl row column;
run;

Rick_SAS
SAS Super FREQ

I suppose my comment is that you shouldn't expect the SURVEYFREQ CIs to match any of the other methods, just as you shouldn't expect the other methods to match each other. The SURVEYFREQ procedure uses a different formula for the standard error of the proportion, and that standard error is used to form the CIs. See
SAS Help Center: Proportions

Ksharp
Super User

Rick,
Agreed. I notice that PROC SURVEYFREQ uses the t distribution and the sample design to calculate the CIs,
whereas yours use the chi-square distribution.

Ksharp_0-1682336596581.png
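
To see how much the choice of quantile and standard error matters, the two kinds of limits can be computed by hand for the first category of the Psych data (91 of 220). This is only a sketch: the t-based formula below is an assumed simplification for an unweighted sample without strata or clusters, not a statement of what SURVEYFREQ computes, and the data set name compare is made up.

```sas
/* Sketch only: hand-computed limits for one category (91 of 220). */
data compare;
   count = 91;  n = 220;  k = 4;  alpha = 0.05;
   p = count / n;

   /* Wald-type limits with a t quantile on n-1 degrees of freedom
      (assumed form, see the note above) */
   se      = sqrt(p*(1-p) / (n-1));
   t       = quantile('T', 1 - alpha/2, n-1);
   t_lower = p - t*se;
   t_upper = p + t*se;

   /* Goodman limits, same algebra as MultCIFromChiSquare */
   chisq   = quantile('CHISQ', 1 - alpha/k, 1);
   a = chisq + n;  b = chisq + 2*n*p;  c = n*p**2;
   root    = sqrt(b**2 - 4*a*c);
   g_lower = (b - root) / (2*a);
   g_upper = (b + root) / (2*a);
run;

proc print data=compare noobs;
   var p t_lower t_upper g_lower g_upper;
   format p t_lower t_upper g_lower g_upper 6.4;
run;
```

The Goodman limits are simultaneous (Bonferroni-adjusted over the k categories) and are not symmetric about p, so they would differ from a per-category t interval even if the standard errors agreed.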

 

ballardw
Super User

@Ksharp wrote:

@ballardw,
Interesting. I compared the results from PROC SURVEYFREQ and PROC IML.
They are all different.
@Rick_SAS, do you have any comment?

I don't have access to IML at the moment. I suspect that the order of the variables on the TABLES statement in SURVEYFREQ may be partly involved. I know that I have to look very carefully at the output when I haven't used SURVEYFREQ for a while, to make sure I am using the right order.
