Categorizing an Income Series into Deciles

Super Contributor
Posts: 338

Hello Colleagues,

I am trying to split the values of the income variable “PPI” below into 10 parts (deciles). PPI has over 10,000 records.

I tried the program below, but every value of the newly created ‘PPI_decile’ variable turns out to be “10”, whereas the values should range over 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10.

Could any of you help me revise this program or suggest a more efficient approach?



Data data_families1;
Input ID PPI;
Cards;
1 750
2 800
3 850
4 950
5 1250
6 1500
7 .
8 1600
9 1700
10 1850
11 2500
12 750
13 2100
14 2500
15 3750
16 .
17 750
;
Run;


/*DECILE CALCULATIONS*/
proc univariate data=data_families1;
var PPI;
output out=decile pctlpts=10 20 30 40 50 60 70 80 90 pctlpre=pct;
run;

/*Write the cut points to macro variable*/
data _null_;
set data_families1;
call symput ('q1', pct10);
call symput ('q2', pct20);
call symput ('q3', pct30);
call symput ('q4', pct40);
call symput ('q5', pct50);
call symput ('q6', pct60);
call symput ('q7', pct70);
call symput ('q8', pct80);
call symput ('q9', pct90);
run;

/*Creating a new variable containing the deciles*/
data data_families2;
set data_families1;
if PPI=. then PPI_decile=.;
else if PPI <=&q1 then PPI_decile=1;
else if PPI<=&q2 then PPI_decile=2;
else if PPI<=&q3 then PPI_decile=3;
else if PPI<=&q4 then PPI_decile=4;

else if PPI<=&q5 then PPI_decile=5;
else if PPI<=&q6 then PPI_decile=6;
else if PPI<=&q7 then PPI_decile=7;
else if PPI<=&q8 then PPI_decile=8;
else if PPI<=&q9 then PPI_decile=9;
else PPI_decile=10;
run;



Thank you

Mirisage
SAS Super FREQ
Posts: 8,868

Re: Categorizing an Income Series into Deciles

Hi:
Did your code example get cut off? Remember that if your code contains < or > symbols, you need to "protect" them, as described in this previous forum posting:
http://support.sas.com/forums/thread.jspa?messageID=27609

cynthia
Super Contributor
Posts: 338

Re: Categorizing an Income Series into Deciles

Hi Cynthia,

No, it did not get cut off.

So, the question is how to split the PPI variable below into 10 parts (deciles) using SAS.

Data data_families1;
Input ID PPI;
Cards;
1 750
2 800
3 850
4 950
5 1250
6 1500
7 .
8 1600
9 1700
10 1850
11 2500
12 750
13 2100
14 2500
15 3750
16 .
17 750
;
Run;
Respected Advisor
Posts: 3,799

Re: Categorizing an Income Series into Deciles

I think you want PROC RANK with the GROUPS option.

[pre]
Data data_families1;
Input ID PPI @@;
Cards;
1 750 2 800 3 850 4 950 5 1250 6 1500 7 .
8 1600 9 1700 10 1850 11 2500 12 750 13 2100
14 2500 15 3750 16 . 17 750
;
Run;
proc rank data=data_families1 groups=10 out=deciles;
var ppi;
ranks decile;
run;
proc print;
run;
[/pre]
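A minimal follow-up sketch, assuming the goal is deciles numbered 1 to 10 as in the original question: PROC RANK's GROUPS= option numbers the groups 0 to 9, so one extra DATA step shifts them up by one. The output variable name PPI_decile is an illustrative choice. With only 15 non-missing values and several ties (750 appears three times), some deciles in this sample will be empty or uneven; on the full 10,000+ record file the groups should come out roughly equal unless there are many tied incomes.

[pre]
/* Sample data from the original post */
data data_families1;
   input ID PPI @@;
   cards;
1 750 2 800 3 850 4 950 5 1250 6 1500 7 .
8 1600 9 1700 10 1850 11 2500 12 750 13 2100
14 2500 15 3750 16 . 17 750
;
run;

/* GROUPS=10 assigns each non-missing PPI to a group numbered 0-9; */
/* missing PPI gets a missing rank                                  */
proc rank data=data_families1 groups=10 out=deciles;
   var PPI;
   ranks PPI_decile;
run;

/* Shift groups 0-9 to deciles 1-10; the guard leaves missing ranks missing */
data deciles;
   set deciles;
   if not missing(PPI_decile) then PPI_decile = PPI_decile + 1;
run;

/* Check the PPI range covered by each decile */
proc means data=deciles n min max missing;
   class PPI_decile;
   var PPI;
run;
[/pre]

The PROC MEANS step plays the same role as the "Test to make sure it worked" step in the complete program posted later in this thread.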
SAS Super FREQ
Posts: 8,868

Re: Categorizing an Income Series into Deciles

Oh, I just wondered because
[pre]
else if PPI
[/pre]

which is where your post ends, is not a complete, valid SAS statement.

It just seemed odd.

cynthia
Super Contributor
Posts: 338

Re: Categorizing an Income Series into Deciles

Posted in reply to Cynthia_sas
Hi Cynthia,

This is the complete program I used to try to split the above data set into deciles.

/*DECILE CALCULATIONS*/
proc univariate data=data_families1;
var PPI;
output out=percentile pctlpts=10 20 30 40 50 60 70 80 90 pctlpre=pct;
run;

/*Write the cutpoints to macro variables*/
data _null_;
set data_families1;
call symput ('q1', pct10);
call symput ('q2', pct20);
call symput ('q3', pct30);
call symput ('q4', pct40);
call symput ('q5', pct50);
call symput ('q6', pct60);
call symput ('q7', pct70);
call symput ('q8', pct80);
call symput ('q9', pct90);
run;

/*Create a new variable containing the DECILES*/
data data_families2;
set data_families1;
if PPI=. then PPI_quint=.;
else if PPI <=&q1 then PPI_quint=1;
else if PPI<=&q2 then PPI_quint=2;
else if PPI<=&q3 then PPI_quint=3;
else if PPI<=&q4 then PPI_quint=4;

else if PPI<=&q5 then PPI_quint=5;
else if PPI<=&q6 then PPI_quint=6;
else if PPI<=&q7 then PPI_quint=7;
else if PPI<=&q8 then PPI_quint=8;
else if PPI<=&q9 then PPI_quint=9;

else PPI_quint=10;
run;


/*Test to make sure it worked*/
proc means data=data_families2 missing;
class PPI_quint;
var PPI;
run;
Super Contributor
Posts: 338

Re: Categorizing an Income Series into Deciles

Hi Cynthia,

Sorry, although I pasted the complete program again above, the forum truncates it automatically, so only part of it is shown.
Super Contributor
Posts: 338

Re: Categorizing an Income Series into Deciles

Hi data_null_,

Thank you very much for this code.

It worked correctly.



Hi Cynthia,

Thank you as well for your support.

Mirisage
Valued Guide
Posts: 2,177

Re: Categorizing an Income Series into Deciles

> Hello Colleagues,
>
> I am trying to split the values of income variable “PPI” below into 10 parts (deciles). PPI has over 10,000 records.
>
> I attempted the program indicated below but all values for the newly created ‘PPI_decile’ variable turn out to be “10” whereas it should have been 1, 2,3,4, 5, 6, 7,8, 9 and 10.
>
> I wonder if anyone of you could help me to revise this program or suggest a more efficient approach.
>
>
>
> Data data_families1;
> Input ID PPI;
> Cards;
> 1 750
> 2 800
> 3 850
> 4 950
> 5 1250
> 6 1500
> 7 .
> 8 1600
> 9 1700
> 10 1850
> 11 2500
> 12 750
> 13 2100
> 14 2500
> 15 3750
> 16 .
> 17 750
> ;
> Run;
>
>
> /*DECILE CALCULATIONS*/
> proc univariate data=data_families1;
> var PPI;
> output out=decile pctlpts=10 20 30 40 50 60 70 80 90
> pctlpre=pct;
> run;
>
> /*Write the cut points to macro variable*/
> data _null_;
> set data_families1;
> call symput ('q1', pct10);
> call symput ('q2', pct20);
> call symput ('q3', pct30);
> call symput ('q4', pct40);
> call symput ('q5', pct50);
> call symput ('q6', pct60);
> call symput ('q7', pct70);
> call symput ('q8', pct80);
> call symput ('q9', pct90);
> run;
>
> /*Creating a new variable containing the deciles*/
> data data_families2;
> set data_families1;
> if PPI=. then PPI_decile=.;
> else if PPI LE&q1 then PPI_decile=1;
> else if PPI LE &q2 then PPI_decile=2;
> else if PPI LE &q3 then PPI_decile=3;
> else if PPI LE &q4 then PPI_decile=4;
>
> else if PPI LE &q5 then PPI_decile=5;
> else if PPI LE &q6 then PPI_decile=6;
> else if PPI LE &q7 then PPI_decile=7;
> else if PPI LE &q8 then PPI_decile=8;
> else if PPI LE &q9 then PPI_decile=9;
> else PPI_decile=10;
> run;
>
>
>
> Thank you
>
> Mirisage

Mirisage
1
Your program was entirely there, as quoting your message reveals. It simply was not displayed from the first < onward. So in this response I've replaced each <= with the mnemonic LE.

2
Your problem was not caused by using SYMPUT() instead of SYMPUTX() (although that choice does not help you diagnose the problem).

3
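Your solution put every record into PPI_decile=10 because you loaded the macro variables from the input to PROC UNIVARIATE (data=data_families1) instead of from its output data set (out=decile). The variables pct10 to pct90 do not exist in data_families1, so that DATA _NULL_ step created them as missing, and &q1 through &q9 all resolved to a missing value (.). Because a missing value sorts below every number, PPI <= &q1 through PPI <= &q9 were all false for every non-missing PPI, and each record fell through to the final ELSE.

With that change, your solution works just fine.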
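A sketch of the fix in item 3, assuming the data set names from the original post: the DATA _NULL_ step reads SET decile (the single row of cut points written by PROC UNIVARIATE) instead of SET data_families1. It also uses CALL SYMPUTX, which strips the leading blanks that CALL SYMPUT leaves when it converts a number to text, and an array so the nine calls become one loop. The %PUT line writes the cut points to the log, which would have exposed the original problem immediately, since every value would have shown up as a missing value.

[pre]
/* Sample data from the original post */
data data_families1;
   input ID PPI @@;
   cards;
1 750 2 800 3 850 4 950 5 1250 6 1500 7 .
8 1600 9 1700 10 1850 11 2500 12 750 13 2100
14 2500 15 3750 16 . 17 750
;
run;

/* Writes ONE row holding pct10, pct20, ..., pct90 */
proc univariate data=data_families1 noprint;
   var PPI;
   output out=decile pctlpts=10 20 30 40 50 60 70 80 90 pctlpre=pct;
run;

/* Read the cut points from the OUTPUT data set, not the input data */
data _null_;
   set decile;
   array p{9} pct10 pct20 pct30 pct40 pct50 pct60 pct70 pct80 pct90;
   do i = 1 to 9;
      /* Creates q1-q9; SYMPUTX trims leading and trailing blanks */
      call symputx(cats('q', i), p{i});
   end;
run;

%put NOTE: decile cut points are &q1 &q2 &q3 &q4 &q5 &q6 &q7 &q8 &q9;
[/pre]

With q1-q9 populated this way, the data_families2 step from the original post can run unchanged.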
Now [inappropriate question]: is it better than the PROC RANK approach?

answer ="special missing value" [question not applicable]
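For comparison with both approaches, here is a sketch of an alternative pattern that skips the macro variables altogether, assuming the decile data set produced by a PROC UNIVARIATE step like the one above. IF _N_=1 THEN SET reads the single row of cut points once, and because variables read with SET are retained, pct10 to pct90 remain available on every iteration. The loop applies the same rule as the original IF/ELSE chain: the first cut point that PPI is less than or equal to determines the decile.

[pre]
/* Sample data from the original post */
data data_families1;
   input ID PPI @@;
   cards;
1 750 2 800 3 850 4 950 5 1250 6 1500 7 .
8 1600 9 1700 10 1850 11 2500 12 750 13 2100
14 2500 15 3750 16 . 17 750
;
run;

proc univariate data=data_families1 noprint;
   var PPI;
   output out=decile pctlpts=10 20 30 40 50 60 70 80 90 pctlpre=pct;
run;

data data_families2;
   /* Read the single row of cut points once; SET retains its values */
   if _n_ = 1 then set decile;
   set data_families1;
   array cut{9} pct10 pct20 pct30 pct40 pct50 pct60 pct70 pct80 pct90;
   if missing(PPI) then PPI_decile = .;
   else do;
      PPI_decile = 10;            /* default: above the 90th percentile */
      do i = 1 to 9;
         if PPI <= cut{i} then do;
            PPI_decile = i;
            leave;                /* stop at the first cut point that fits */
         end;
      end;
   end;
   drop i pct:;
run;
[/pre]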
Super Contributor
Posts: 338

Re: Categorizing an Income Series into Deciles

Hi Peter,

Thank you very much for this.

Mirisage