> Hello Colleagues,
>
> I am trying to split the values of income variable “PPI” below into 10 parts (deciles). PPI has over 10,000 records.
>
> I attempted the program indicated below, but all values of the newly created ‘PPI_decile’ variable turn out to be “10”, whereas they should have been 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10.
>
> I wonder if any of you could help me revise this program or suggest a more efficient approach.
>
>
>
> Data data_families1;
> Input ID PPI;
> Cards;
> 1 750
> 2 800
> 3 850
> 4 950
> 5 1250
> 6 1500
> 7 .
> 8 1600
> 9 1700
> 10 1850
> 11 2500
> 12 750
> 13 2100
> 14 2500
> 15 3750
> 16 .
> 17 750
> ;
> Run;
>
>
> /*DECILE CALCULATIONS*/
> proc univariate data=data_families1;
> var PPI;
> output out=decile pctlpts=10 20 30 40 50 60 70 80 90
> pctlpre=pct;
> run;
>
> /*Write the cut points to macro variable*/
> data _null_;
> set data_families1;
> call symput ('q1', pct10);
> call symput ('q2', pct20);
> call symput ('q3', pct30);
> call symput ('q4', pct40);
> call symput ('q5', pct50);
> call symput ('q6', pct60);
> call symput ('q7', pct70);
> call symput ('q8', pct80);
> call symput ('q9', pct90);
> run;
>
> /*Creating a new variable containing the deciles*/
> data data_families2;
> set data_families1;
> if PPI=. then PPI_decile=.;
> else if PPI LE&q1 then PPI_decile=1;
> else if PPI LE &q2 then PPI_decile=2;
> else if PPI LE &q3 then PPI_decile=3;
> else if PPI LE &q4 then PPI_decile=4;
>
> else if PPI LE &q5 then PPI_decile=5;
> else if PPI LE &q6 then PPI_decile=6;
> else if PPI LE &q7 then PPI_decile=7;
> else if PPI LE &q8 then PPI_decile=8;
> else if PPI LE &q9 then PPI_decile=9;
> else PPI_decile=10;
> run;
>
>
>
> Thank you
>
> Mirisage
Mirisage
1. Your program was all there, as quoting your message reveals. It was just not displayed from the first ≤ onward, so in this response I've replaced those with ≤ and a ;
2. Your problem was not caused by using symput() instead of symputX() (although that choice does not help diagnose the problem).
3. Your solution placed every record into PPI_decile=10 because you loaded the macro variables from data_families1, the input to proc univariate, instead of from its output dataset, out=decile. With that change, your solution works just fine.
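
A minimal sketch of that change, assuming the decile dataset created by the proc univariate step in the quoted program (one row holding pct10 through pct90). The array name p and the index i are only illustrative. In the original step, pct10 to pct90 do not exist in data_families1, so they are uninitialized (missing) and each macro variable receives "."; since a missing value sorts below every number, none of the "PPI LE &q..." tests is true and every non-missing PPI falls through to the final else.

```sas
/* Write the cut points to macro variables, reading from the    */
/* PROC UNIVARIATE output dataset rather than the input dataset */
data _null_;
  set decile;                              /* one row: pct10 ... pct90 */
  array p{9} pct10 pct20 pct30 pct40 pct50 pct60 pct70 pct80 pct90;
  do i = 1 to 9;
    call symputx(cats('q', i), p{i});      /* creates &q1 ... &q9, blanks trimmed */
  end;
run;

/* Quick check in the log before running the decile assignment step */
%put q1=&q1 q5=&q5 q9=&q9;
```

Keeping the nine separate call symput statements and changing only the set statement to "set decile;" should work just as well; symputx simply strips the leading blanks, which makes the %put output easier to read.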
Now [inappropriate question]: is it better than the proc rank approach?
answer = "special missing value" [question not applicable]
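
For comparison, the proc rank approach mentioned above might look like the sketch below. It is not taken from the original thread: the dataset and variable names follow the quoted program, and the intermediate dataset name ranked is hypothetical. With groups=10, proc rank numbers the groups 0 to 9 and leaves the rank missing where PPI is missing, so a short data step shifts the result to 1 to 10. Because ties are handled differently, the group boundaries may not match the proc univariate cut points exactly.

```sas
/* Assign decile groups directly, without macro variables */
proc rank data=data_families1 groups=10 out=ranked;
  var PPI;
  ranks PPI_decile;          /* new variable with values 0-9, missing if PPI is missing */
run;

/* Shift the groups from 0-9 to 1-10, leaving missing values alone */
data data_families2;
  set ranked;
  if not missing(PPI_decile) then PPI_decile = PPI_decile + 1;
run;
```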