# Gini Calculation by region

Folks,

I would like to adapt this great code so that, rather than computing one Gini coefficient for the entire sample in my dataset, it computes a Gini coefficient for each region.

The code below was taken from https://www.terpconnect.umd.edu/~pnc/gini.sas and, for argument's sake, I would like to add a ``by region`` statement to it.

``````
/*                        GINI CODE
=========

This SAS code was written by Philip N. Cohen. It is meant to be adaptable to
various units of analysis and measures of interest. The Gini coefficient can
be calculated for lots of different distributions, although it is most often
used for income.

The formula used here is from _The methods and materials of demography_,
by Henry S. Shryock, Jacob S. Siegel, and associates. Orlando, FL:

(The author of the code can take no responsibility for its reliability
or accuracy, or for the results obtained with its use; but he would be
*/

/* The variable I use is CAPINC and the weight is CAPWGT.
Substitute these for your own measure and population weight.
Those are the only variable names you have to change to suit
*/

/* This creates a table with one line for each level of income,
the number of (weighted) people with that income, and the
percent with that income. */

title 'Income distribution';
proc freq data=temp;
tables capinc / noprint out = table;
format capinc 7.0;
weight capwgt;
run;

/* this data step creates cumulative income and population
columns */

data table;
set table;

retain suminc perpop;

suminc + (capinc * count);
perpop + percent;

/* suminc is the cumulative income at each point in the distribution.
perpop is the cumulative population at each point in the distribution.
Note that PERCENT and COUNT are variables created by PROC FREQ.
*/

run;

/* This sort and data step takes the last value of suminc,
which is the total income, and adds it onto every
record in the table as totalinc. Then it divides suminc
by totalinc for each line to create the percent of
income below that point in the distribution */

proc sort data=table;
by descending suminc ;
run;

data table;
set table;
by descending suminc;

if _n_=1 then do;
totalinc=suminc;
end;

retain totalinc;

perinc = (suminc/totalinc) * 100;

run;

/* this sort just puts it back in order
from low to high */

proc sort data=table;
by perpop;
run;

/* To calculate Gini:
sum[Xsub(i) * Ysub(i+1)] - sum[Xsub(i+1) * Ysub(i)]
where X is the proportion of population column and
Y is the proportion of income column.
*/

data ginidat;
set table;

xlag = lag(perpop);
xlag = xlag / 100;

ylag = lag(perinc);
ylag = ylag / 100;

columna = (perinc/100) * xlag;
columnb = (perpop/100) * ylag;

retain suma sumb;

suma + columna;
sumb + columnb;

gini = suma - sumb;

run;

title2 'Gini coefficient';
proc print data=ginidat;
var gini;
where perinc = 100;
run;
title2;
``````
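
The adaptation being asked about could be sketched roughly as follows. This is an untested sketch, not the original author's method: it assumes `temp` contains a grouping variable named `region`, and the dataset names `temp_sorted` and `gini_by_region` are made up for illustration. The idea is to add `by region` to each step and to reset the running totals at the start of every region.

```sas
/* Sketch only: assumes TEMP has a grouping variable named REGION.
   TEMP_SORTED and GINI_BY_REGION are illustrative dataset names. */

/* BY-group processing needs the input sorted by the BY variable */
proc sort data=temp out=temp_sorted;
  by region;
run;

/* One weighted income distribution per region; with a BY statement,
   COUNT and PERCENT are computed within each region */
proc freq data=temp_sorted;
  by region;
  tables capinc / noprint out=table;
  format capinc 7.0;
  weight capwgt;
run;

/* Cumulative income and population, restarted for each region */
data table;
  set table;
  by region;
  if first.region then do;
    suminc = 0;
    perpop = 0;
  end;
  suminc + (capinc * count);
  perpop + percent;
run;

/* Put each region's largest cumulative income (its total) first */
proc sort data=table;
  by region descending suminc;
run;

data table;
  set table;
  by region descending suminc;
  retain totalinc;
  if first.region then totalinc = suminc;
  perinc = (suminc / totalinc) * 100;
run;

/* Back to low-to-high order within each region */
proc sort data=table;
  by region perpop;
run;

data gini_by_region;
  set table;
  by region;

  /* LAG must run on every row, so call it unconditionally */
  xlag = lag(perpop) / 100;
  ylag = lag(perinc) / 100;

  if first.region then do;
    /* do not carry the previous region's last row into this one */
    xlag = 0;
    ylag = 0;
    suma = 0;
    sumb = 0;
  end;

  suma + (perinc / 100) * xlag;
  sumb + (perpop / 100) * ylag;
  gini = suma - sumb;

  if last.region;      /* keep one row per region */
  keep region gini;
run;

title2 'Gini coefficient by region';
proc print data=gini_by_region;
run;
title2;
```

Keeping the last row of each region with `if last.region` avoids relying on an exact `perinc = 100` comparison to find the final row, which could be fragile with floating-point arithmetic.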

Any help would be most welcome.
