Gini Calculation by region

I would like to adapt this great code so that, rather than calculating one Gini coefficient for my entire dataset, it calculates a Gini coefficient for each region.


The code was taken from here. For argument's sake, I would like to add a BY REGION statement.


/*                        GINI CODE

   This SAS code was written by Philip N. Cohen. It is meant to be adaptable to
   various units of analysis and measures of interest. The Gini coefficient can
   be calculated for lots of different distributions, although it is most often
   used for income.

   The formula used here is from _The methods and materials of demography_,
   by Henry S. Shryock, Jacob S. Siegel, and associates. Orlando, FL:
   Academic Press, 1976 (p. 98).

   (The author of the code can take no responsibility for its reliability
    or accuracy, or for the results obtained with its use; but he would be
    glad to take partial credit for its successful use or adaptation.) */

/* The variable I use is CAPINC and the weight is CAPWGT.
   Substitute these for your own measure and population weight.
   Those are the only variable names you have to change to suit
   your data. */

/* This creates a table with one line for each level of income,
   the number of (weighted) people with that income, and the
   percent with that income. */

title 'Income distribution';
proc freq data=temp;
tables capinc / noprint out = table;
format capinc 7.0;
weight capwgt;

/* this data step creates cumulative income and population
   columns */

data table;
set table;

retain suminc perpop;

suminc + (capinc * count);
perpop + percent;

/* suminc is the cumulative income at each point in the distribution.
   perpop is the cumulative population at each point in the distribution.
   Note that PERCENT and COUNT are variables created by PROC FREQ. */


/* This sort and data step takes the last value of suminc,
   which is the total income, and adds it onto every
   record in the table as totalinc. Then it divides suminc
   by totalinc for each line to create the percent of
   income below that point in the distribution */

proc sort data=table;
by descending suminc ;

data table;
set table;
by descending suminc;

if _n_=1 then do;
totalinc = suminc;   /* first row after the descending sort holds total income */
end;

retain totalinc;

perinc = (suminc/totalinc) * 100;


/* this sort just puts it back in order
   from low to high */

proc sort data=table;
by perpop;

/* To calculate Gini:
   sum[Xsub(i) * Ysub(i+1)] - sum[Xsub(i+1) * Ysub(i)]
   where X is the proportion of population column and
   Y is the proportion of income column. */

data ginidat;
set table;

xlag = lag(perpop);
xlag = xlag / 100;

ylag = lag(perinc);
ylag = ylag / 100;

columna = (perinc/100) * xlag;
columnb = (perpop/100) * ylag;

retain suma sumb;

suma + columna;
sumb + columnb;

gini = suma - sumb;


title2 'Gini coefficient';
proc print data=ginidat;
var gini;
where perinc = 100;
run;

Any help would be most welcome. 
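
One possible by-region adaptation is sketched below. It has not been tested. It assumes the input dataset TEMP contains a grouping variable named REGION (a hypothetical name, not in the original code), and the dataset names TEMP_SORTED, TABLE_R, TOTALS and GINI_BY_REGION are likewise made up for illustration. The idea is to add BY REGION to every step and to reset the running sums at FIRST.REGION so that nothing carries over from one region to the next.

```sas
/* Sketch only: Gini coefficient per region.
   REGION and the dataset names are assumptions for illustration. */

proc sort data=temp out=temp_sorted;
   by region;
run;

/* With a BY statement, COUNT and PERCENT are computed within each region */
proc freq data=temp_sorted noprint;
   by region;
   tables capinc / out=table_r;
   format capinc 7.0;
   weight capwgt;
run;

/* Cumulative income and population, restarted for each region */
data table_r;
   set table_r;
   by region;
   if first.region then do;
      suminc = 0;
      perpop = 0;
   end;
   suminc + (capinc * count);
   perpop + percent;
run;

/* Total income per region is the last cumulative value in that region */
data totals (keep=region totalinc);
   set table_r;
   by region;
   if last.region;
   totalinc = suminc;
run;

/* Attach the regional total to every row and compute percent of income */
data table_r;
   merge table_r totals;
   by region;
   perinc = (suminc / totalinc) * 100;
run;

/* Gini per region: sum[X(i) * Y(i+1)] - sum[X(i+1) * Y(i)] */
data gini_by_region;
   set table_r;
   by region;

   /* LAG is called on every row so its queue stays in step with the data */
   xlag = lag(perpop) / 100;
   ylag = lag(perinc) / 100;

   /* Do not let the previous region's last row leak into this region */
   if first.region then do;
      xlag = 0;
      ylag = 0;
      suma = 0;
      sumb = 0;
   end;

   suma + (perinc / 100) * xlag;
   sumb + (perpop / 100) * ylag;
   gini = suma - sumb;

   if last.region;   /* keep one row per region */
run;

title2 'Gini coefficient by region';
proc print data=gini_by_region;
   var region gini;
run;
```

Selecting the final row with LAST.REGION, rather than WHERE PERINC = 100, avoids relying on an exact floating-point comparison. A region whose total income is zero would cause a division by zero in the PERINC step and would need separate handling.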
