Hi,
I would appreciate if someone could help me with the SAS code to create tertiles (low income, medium income, high income) for dataset subjinc below and count the number of ca case pop cont and can cont. The SAS code and log are shown below: Table 1 output is attached.
My ultimate aim is to count the number of ca case, pop cont and can cont for low income (showing income range), medium income (showing income range) and and high income (showing income range).
Any help, please?
Thanks.
ak.
/* Suject income*/
data subjinc;
input id$ 1-5 job 7 income lung$ 15-22;
datalines;
OSa06 3 32356 Pop cont
OSa10 1 25689 Ca case
OSa11 1 18546 Ca case
OSa12 3 15756 Ca case
OSa13 3 14527 Ca case
OSa14 3 20458 Ca case
OSa29 2 56974 Ca cont
OSa30 4 34568 Ca cont
OSa31 4 69713 Ca cont
OSa32 1 12896 Ca case
OSa33 3 25684 Ca case
OSa34 3 78362 Ca case
OSa35 3 18998 Ca case
OSa54 3 26934 Pop cont
OSa73 3 65823 Pop cont
OSa86 3 23542 Ca cont
OSa93 3 14589 Pop cont
OSa94 3 14325 Pop cont
OSa95 4 45802 Pop cont
OSb02 1 27419 Ca cont
OSb16 1 15843 Ca case
OSb20 4 16560 Ca case
OSb38 3 21123 Ca cont
OSb39 3 19743 Ca cont
OSb58 1 36589 Pop cont
OSb70 1 32145 Ca cont
OSb71 2 19400 Ca cont
OSb72 3 21356 Ca cont
OSb73 4 70195 Ca cont
;
Something like this?
proc rank data=subjinc out=subjrnk groups=3;
var income;
ranks subjgrp;
run;
proc format;
value incgrp
0 = "low income"
1 = "medium income"
2 = "high income";
run;
proc sql;
alter table subjrnk
modify subjgrp format=incgrp.;
quit;
proc tabulate data=subjrnk;
var income;
class subjgrp;
table (min*format=dollar6.0 max*format=dollar6.0 n*format=4.0),subjgrp=""*income="";
run;
low income medium income high income Min $12896 $19400 $32145 Max $18998 $27419 $78362 N 9 10 10
You don't seem to actually need cutpoints, since PROC RANK has put your data into 3 roughly equal-size groups. Can't you just take the resulting ranked data set, and run a proc freq, cross-tabulating the variables of interest against rank2?
Now, if you actually want cutpoints, then PROC UNIVARIATE is the way to go, as in:
proc univariate data=subjinc noprint ;
var income;
output out=need pctlpts=33.3 66.7 pctlpre=pctl_ ;
run;
proc print;run;
which produces dataset NEED with one observations and two variables: PCTL_33_3=19400 and PCTL_66_7=32145.
You could then
data want;
set subjinc;
if _n_=1 then set need;
if .<income < pctl_33_3 then incgroup='Low'; else
if income < pctl)66_7 then incgroup='Mid'; else
incgroup='Hi';
run;
Something like this?
proc rank data=subjinc out=subjrnk groups=3;
var income;
ranks subjgrp;
run;
proc format;
value incgrp
0 = "low income"
1 = "medium income"
2 = "high income";
run;
proc sql;
alter table subjrnk
modify subjgrp format=incgrp.;
quit;
proc tabulate data=subjrnk;
var income;
class subjgrp;
table (min*format=dollar6.0 max*format=dollar6.0 n*format=4.0),subjgrp=""*income="";
run;
low income medium income high income Min $12896 $19400 $32145 Max $18998 $27419 $78362 N 9 10 10
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.