Mirisage
Obsidian | Level 7
Hello Colleagues,

I am trying to split the values of the income variable “PPI” below into 10 parts (deciles). PPI has over 10,000 records.

I attempted the program below, but all values of the newly created ‘PPI_decile’ variable turn out to be “10”, whereas they should range from 1 to 10.

I wonder if any of you could help me revise this program or suggest a more efficient approach.



Data data_families1;
Input ID PPI;
Cards;
1 750
2 800
3 850
4 950
5 1250
6 1500
7 .
8 1600
9 1700
10 1850
11 2500
12 750
13 2100
14 2500
15 3750
16 .
17 750
;
Run;


/*DECILE CALCULATIONS*/
proc univariate data=data_families1;
var PPI;
output out=decile pctlpts=10 20 30 40 50 60 70 80 90 pctlpre=pct;
run;

/*Write the cut points to macro variable*/
data _null_;
set data_families1;
call symput ('q1', pct10);
call symput ('q2', pct20);
call symput ('q3', pct30);
call symput ('q4', pct40);
call symput ('q5', pct50);
call symput ('q6', pct60);
call symput ('q7', pct70);
call symput ('q8', pct80);
call symput ('q9', pct90);
run;

/*Creating a new variable containing the deciles*/
data data_families2;
set data_families1;
if PPI=. then PPI_decile=.;
else if PPI <=&q1 then PPI_decile=1;
else if PPI<=&q2 then PPI_decile=2;
else if PPI<=&q3 then PPI_decile=3;
else if PPI<=&q4 then PPI_decile=4;

else if PPI<=&q5 then PPI_decile=5;
else if PPI<=&q6 then PPI_decile=6;
else if PPI<=&q7 then PPI_decile=7;
else if PPI<=&q8 then PPI_decile=8;
else if PPI<=&q9 then PPI_decile=9;
else PPI_decile=10;
run;



Thank you

Mirisage
Cynthia_sas
SAS Super FREQ
Hi:
Did your code example get cut off? Remember that if your code contains < or > symbols, you need to "protect" them, as described in this previous forum posting:
http://support.sas.com/forums/thread.jspa?messageID=27609#27609

cynthia
Mirisage
Obsidian | Level 7
Hi Cynthia,

No, it did not get cut off.

So, the question is how to split the PPI variable below into 10 parts (deciles) using SAS.

Data data_families1;
Input ID PPI;
Cards;
1 750
2 800
3 850
4 950
5 1250
6 1500
7 .
8 1600
9 1700
10 1850
11 2500
12 750
13 2100
14 2500
15 3750
16 .
17 750
;
Run;
data_null__
Jade | Level 19
I think you want PROC RANK with the GROUPS option.

[pre]
Data data_families1;
Input ID PPI @@;
Cards;
1 750 2 800 3 850 4 950 5 1250 6 1500 7 .
8 1600 9 1700 10 1850 11 2500 12 750 13 2100
14 2500 15 3750 16 . 17 750
;
Run;
/* Rank PPI into 10 groups; the groups are numbered 0-9 */
proc rank data=data_families1 groups=10 out=deciles;
var ppi;
ranks decile;
run;
proc print data=deciles;
run;
[/pre]
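One detail worth knowing about the PROC RANK approach: with GROUPS=10 the RANKS variable is numbered 0 through 9, not 1 through 10 as in the question, and records with a missing PPI keep a missing rank. A minimal sketch, assuming you want the 1-10 numbering from the original post, adds 1 in a follow-up DATA step; the data set names `deciles` and `deciles_1to10` are illustrative only.

```sas
/* Sample data from the question (17 records, 2 with missing PPI) */
data data_families1;
   input ID PPI @@;
   cards;
1 750 2 800 3 850 4 950 5 1250 6 1500 7 .
8 1600 9 1700 10 1850 11 2500 12 750 13 2100
14 2500 15 3750 16 . 17 750
;
run;

/* GROUPS=10 assigns decile groups numbered 0-9 */
proc rank data=data_families1 groups=10 out=deciles;
   var PPI;
   ranks PPI_decile;
run;

/* Shift to 1-10; records with missing PPI keep a missing decile */
data deciles_1to10;
   set deciles;
   if not missing(PPI_decile) then PPI_decile = PPI_decile + 1;
run;

/* Check how many records fall in each decile */
proc freq data=deciles_1to10;
   tables PPI_decile / missing;
run;
```

With tied PPI values (750 appears three times here), PROC RANK puts all tied records in the same group, so some deciles can end up with more records than others.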
Cynthia_sas
SAS Super FREQ
Oh, I just wondered because
[pre]
else if PPI
[/pre]

(which is where your post ends) is not a complete, valid SAS statement.

It just seemed odd.

cynthia
Mirisage
Obsidian | Level 7
Hi Cynthia,

This is the complete program I used to try to split the above data set into deciles.

/*DECILE CALCULATIONS*/
proc univariate data=data_families1;
var PPI;
output out=percentile pctlpts=10 20 30 40 50 60 70 80 90 pctlpre=pct;
run;

/*Write the cutpoints to macro variables*/
data _null_;
set data_families1;
call symput ('q1', pct10);
call symput ('q2', pct20);
call symput ('q3', pct30);
call symput ('q4', pct40);
call symput ('q5', pct50);
call symput ('q6', pct60);
call symput ('q7', pct70);
call symput ('q8', pct80);
call symput ('q9', pct90);
run;

/*Create a new variable containing the DECILES*/
data data_families2;
set data_families1;
if PPI=. then PPI_quint=.;
else if PPI <=&q1 then PPI_quint=1;
else if PPI<=&q2 then PPI_quint=2;
else if PPI<=&q3 then PPI_quint=3;
else if PPI<=&q4 then PPI_quint=4;

else if PPI<=&q5 then PPI_quint=5;
else if PPI<=&q6 then PPI_quint=6;
else if PPI<=&q7 then PPI_quint=7;
else if PPI<=&q8 then PPI_quint=8;
else if PPI<=&q9 then PPI_quint=9;

else PPI_quint=10;
run;


/*Test to make sure it worked*/
proc means data=data_families2 missing;
class PPI_quint;
var PPI;
run;
Mirisage
Obsidian | Level 7
Hi Cynthia,

Sorry, although I pasted the complete program again in the window above, it was truncated automatically, so only part of it is shown.
Mirisage
Obsidian | Level 7
Hi data_null_,

Thank you very much for this code.

It worked correctly.



Hi Cynthia,

Thank you as well for your support.

Mirisage
Peter_C
Rhodochrosite | Level 12
> Hello Colleagues,
>
> I am trying to split the values of income variable “PPI” below into 10 parts (deciles). PPI has over 10,000 records.
>
> I attempted the program indicated below but all values for the newly created ‘PPI_decile’ variable turn out to be “10” whereas it should have been 1, 2,3,4, 5, 6, 7,8, 9 and 10.
>
> I wonder if anyone of you could help me to revise this program or suggest a more efficient approach.
>
>
>
> Data data_families1;
> Input ID PPI;
> Cards;
> 1 750
> 2 800
> 3 850
> 4 950
> 5 1250
> 6 1500
> 7 .
> 8 1600
> 9 1700
> 10 1850
> 11 2500
> 12 750
> 13 2100
> 14 2500
> 15 3750
> 16 .
> 17 750
> ;
> Run;
>
>
> /*DECILE CALCULATIONS*/
> proc univariate data=data_families1;
> var PPI;
> output out=decile pctlpts=10 20 30 40 50 60 70 80 90
> pctlpre=pct;
> run;
>
> /*Write the cut points to macro variable*/
> data _null_;
> set data_families1;
> call symput ('q1', pct10);
> call symput ('q2', pct20);
> call symput ('q3', pct30);
> call symput ('q4', pct40);
> call symput ('q5', pct50);
> call symput ('q6', pct60);
> call symput ('q7', pct70);
> call symput ('q8', pct80);
> call symput ('q9', pct90);
> run;
>
> /*Creating a new variable containing the deciles*/
> data data_families2;
> set data_families1;
> if PPI=. then PPI_decile=.;
> else if PPI LE&q1 then PPI_decile=1;
> else if PPI LE &q2 then PPI_decile=2;
> else if PPI LE &q3 then PPI_decile=3;
> else if PPI LE &q4 then PPI_decile=4;
>
> else if PPI LE &q5 then PPI_decile=5;
> else if PPI LE &q6 then PPI_decile=6;
> else if PPI LE &q7 then PPI_decile=7;
> else if PPI LE &q8 then PPI_decile=8;
> else if PPI LE &q9 then PPI_decile=9;
> else PPI_decile=10;
> run;
>
>
>
> Thank you
>
> Mirisage

Mirisage
1
Your program was entirely there, as quoting your message reveals. It just was not displayed from the first <= onward, so in this response I've replaced those with LE.
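For readers who have not met the mnemonic form: in SAS, `LE` and `<=` are the same comparison operator, so swapping one for the other does not change what the program does. A tiny sketch to confirm this; the data set `check` and its variables are illustrative only.

```sas
/* Compare the symbol and mnemonic spellings of "less than or equal" */
data check;
   do PPI = 700, 850, 900;
      sym_form = (PPI <= 850);   /* 1 if true, 0 if false */
      mne_form = (PPI LE 850);   /* same test, mnemonic spelling */
      output;
   end;
run;

proc print data=check;
run;
```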

2
Your problem was not caused by using symput() instead of symputX(), although that choice does not help you diagnose the problem.
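To see the difference between the two routines: CALL SYMPUT converts a numeric value to text with the BEST12. format, so the macro variable keeps leading blanks, while CALL SYMPUTX strips leading and trailing blanks. A short sketch, using hypothetical macro variable names `a` and `b`:

```sas
data _null_;
   x = 750;
   call symput('a', x);    /* numeric converted with BEST12.: leading blanks kept */
   call symputx('b', x);   /* leading and trailing blanks removed */
run;

/* The brackets make any leading blanks visible in the log */
%put a=[&a];
%put b=[&b];
```

In a comparison such as `PPI <= &q1`, the leading blanks do not change how the statement is parsed, which is why this was not the cause of the problem.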

3
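Your solution placed every record in PPI_decile=10 because you loaded the macro variables from the input to PROC UNIVARIATE (data=data_families1) instead of from its output data set (out=decile). data_families1 has no PCT10-PCT90 variables, so in that step they are uninitialized (missing), and &q1 through &q9 all resolve to a missing value (.). No non-missing PPI is less than or equal to a missing value, so every non-missing record falls through to the final ELSE.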

With that change, your solution works just fine.
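A minimal sketch of that fix, assuming the `data_families1` data from the question. The essential change is `set decile;` in the DATA _NULL_ step. The arrays and loops that replace the nine CALL SYMPUT statements and the IF/ELSE chain are an optional tidy-up, not part of the original answer, and the names `p`, `cut` and `i` are illustrative.

```sas
/* Sample data from the question */
data data_families1;
   input ID PPI @@;
   cards;
1 750 2 800 3 850 4 950 5 1250 6 1500 7 .
8 1600 9 1700 10 1850 11 2500 12 750 13 2100
14 2500 15 3750 16 . 17 750
;
run;

/* DECILE CALCULATIONS: DECILE gets one observation with PCT10-PCT90 */
proc univariate data=data_families1 noprint;
   var PPI;
   output out=decile pctlpts=10 20 30 40 50 60 70 80 90 pctlpre=pct;
run;

/* Read the cut points from the OUTPUT data set, not the input data */
data _null_;
   set decile;
   array p{9} pct10 pct20 pct30 pct40 pct50 pct60 pct70 pct80 pct90;
   do i = 1 to 9;
      call symputx(cats('q', i), p{i});
   end;
run;
%put q1=&q1 q5=&q5 q9=&q9;

/* Same logic as the IF/ELSE chain: lowest decile whose cut point PPI does not exceed */
data data_families2;
   set data_families1;
   array cut{9} _temporary_ (&q1 &q2 &q3 &q4 &q5 &q6 &q7 &q8 &q9);
   if missing(PPI) then PPI_decile = .;
   else do;
      PPI_decile = 10;
      /* Scan from the top cut point down; the last match is the lowest decile */
      do i = 9 to 1 by -1;
         if PPI <= cut{i} then PPI_decile = i;
      end;
   end;
   drop i;
run;

/* Test to make sure it worked */
proc means data=data_families2 missing;
   class PPI_decile;
   var PPI;
run;
```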
Now [inappropriate question]: is it better than the PROC RANK approach?






answer ="special missing value" [question not applicable]
Mirisage
Obsidian | Level 7
Hi Peter,

Thank you very much for this.

Mirisage
