Help using Base SAS procedures

Categorising income groups

Reply
Super Contributor
Posts: 338

Categorising income groups

Hi Colleagues,

I need to categorise personal incomes (PPI1) into 13 categories and all others as “.”.

I have attempted with the following codes but it produced strange categorisations.

Could anyone help me?

data data_families3;
set data_families2;
IF PPI1 < 10000 THEN PPI2 = 01;
ELSE IF PPI1 = 10000 and PPI1 < 15000 THEN PPI2 = 02;
ELSE IF PPI1 = 15000 and PPI1 < 20000 THEN PPI2 = 03;
ELSE IF PPI1 = 20000 and PPI1 < 25000 THEN PPI2 = 04;
ELSE IF PPI1 = 25000 and PPI1 < 30000 THEN PPI2 = 05;
ELSE IF PPI1 = 30000 and PPI1 < 40000 THEN PPI2 = 06;
ELSE IF PPI1 = 40000 and PPI1 < 50000 THEN PPI2 = 07;
ELSE IF PPI1 = 50000 and PPI1 < 60000 THEN PPI2 = 08;
ELSE IF PPI1 = 60000 and PPI1 < 70000 THEN PPI2 = 09;
ELSE IF PPI1 = 70000 and PPI1 < 80000 THEN PPI2 = 10;
ELSE IF PPI1 = 80000 and PPI1 < 90000 THEN PPI2 = 11;
ELSE IF PPI1 = 9000 and PPI1 < 100000 THEN PPI2 = 12;
ELSE IF PPI1 >= 100000 then PPI2=13;
ELSE PPI2 = .;
run;


Thanks

Mirisage
Trusted Advisor
Posts: 2,113

Re: Categorising income groups

A couple of things.

The first "=" in the first 11 ELSE statements needs to be ">=". SAS also allows syntax like
ELSE IF 10000 LE pp1 LT 15000 THEN ppi1 = 02;

The
IF PPI1 LT 10000 THEN PPI2 = 01;
should be
IF . LT PPI1 LT 10000 THEN PPI2 = 01;
Your statement assigns 01 to missing values for PP1.

Doc Muhlbaier
Duke
Super Contributor
Posts: 338

Re: Categorising income groups

Hi Doc Muhlbaier Duke ,

This is great!

It worked very well.

Many thanks,
Mirisage
SAS Super FREQ
Posts: 8,744

Re: Categorising income groups

Hi:
I wonder whether this 9000 instead of 90000 is one possible cause??
[pre]
ELSE IF PPI1 = 9000 and PPI1 < 100000 THEN PPI2 = 12;
[/pre]

My tendency would be to do the assignment of a value with a user-defined format and a INPUT/PUT statement. What you really have is a mini-table of values which should be used to create PPI2. A user-defined format can do this quite well. The key is understanding that the INPUT function takes a character variable and creates a numeric variable. That means the PUT function is used with PPI1 to create a character variable from the MKPPI user-defined format...essentially bypassing the need for a set of nested IF/ELSE statements by using the format to do the lookup. The syntax
10000-<15000 = '2'
in the user-defined format defines a range of values from 10000 to slighly less than 15000 as being value '2'.

cynthia

[pre]
** See results below;
proc format;
value mkppi
low-<10000 = '1'
10000-<15000 = '2'
15000-<20000 = '3'
20000-<25000 = '4'
25000-<30000 = '5'
30000-<40000 = '6'
40000-<50000 = '7'
50000-<60000 = '8'
60000-<70000 = '9'
70000-<80000 = '10'
80000-<90000 = '11'
90000-<100000 = '12'
100000-high ='13'
other = .;
run;

data testfmt;
infile datalines;
input name $ ppi1;
ppi2 = input(put(ppi1,mkppi.),2.);
put _all_;
format ppi2 z2.;
return;
datalines;
alan 99
bob 150000
carl 15000
dana 14999
eliza 59999
fred 60001
george .
;
run;

[/pre]

The LOG results show:
[pre]
2186
2187 data testfmt;
2188 infile datalines;
2189 input name $ ppi1;
2190 ppi2 = input(put(ppi1,mkppi.),2.);
2191 put _all_;
2192 format ppi2 z2.;
2193 return;
2194 datalines;

name=alan ppi1=99 ppi2=01 _ERROR_=0 _N_=1
name=bob ppi1=150000 ppi2=13 _ERROR_=0 _N_=2
name=carl ppi1=15000 ppi2=03 _ERROR_=0 _N_=3
name=dana ppi1=14999 ppi2=02 _ERROR_=0 _N_=4
name=eliza ppi1=59999 ppi2=08 _ERROR_=0 _N_=5
name=fred ppi1=60001 ppi2=09 _ERROR_=0 _N_=6
name=george ppi1=. ppi2=. _ERROR_=0 _N_=7
NOTE: The data set WORK.TESTFMT has 7 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds

[/pre]
Ask a Question
Discussion stats
  • 3 replies
  • 161 views
  • 0 likes
  • 3 in conversation