Solved: Re: SAS code to create cutpoints

ak2011 · Posted 02-08-2020 01:55 PM

Hi,

I would appreciate if someone could help me with the SAS code to create tertiles (low income, medium income, high income) for dataset subjinc below and count the number of ca case pop cont and can cont. The SAS code and log are shown below: Table 1 output is attached.

My ultimate aim is to count the number of ca case, pop cont and can cont for low income (showing income range), medium income (showing income range) and and high income (showing income range).

Any help, please?

Thanks.

ak.


/*  Suject income*/
data subjinc;
input id$ 1-5 job 7 income lung$ 15-22;
datalines;
OSa06 3 32356 Pop cont
OSa10 1	25689 Ca case
OSa11 1	18546 Ca case
OSa12 3	15756 Ca case
OSa13 3	14527 Ca case
OSa14 3	20458 Ca case
OSa29 2	56974 Ca cont
OSa30 4	34568 Ca cont
OSa31 4	69713 Ca cont
OSa32 1	12896 Ca case
OSa33 3	25684 Ca case
OSa34 3	78362 Ca case
OSa35 3	18998 Ca case
OSa54 3	26934 Pop cont
OSa73 3	65823 Pop cont
OSa86 3	23542 Ca cont
OSa93 3	14589 Pop cont
OSa94 3	14325 Pop cont
OSa95 4	45802 Pop cont
OSb02 1	27419 Ca cont
OSb16 1	15843 Ca case
OSb20 4	16560 Ca case
OSb38 3	21123 Ca cont
OSb39 3	19743 Ca cont
OSb58 1	36589 Pop cont
OSb70 1	32145 Ca cont
OSb71 2	19400 Ca cont
OSb72 3	21356 Ca cont
OSb73 4	70195 Ca cont
;

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

72

73 /* Suject income*/

74 data subjinc;

75 input id$ 1-5 job 7 income lung$ 15-22;

76 datalines;

NOTE: The data set WORK.SUBJINC has 29 observations and 4 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.00 seconds

106 ;

107

108 /* TASK 1: Creating cutpoints in entire dataset*/

109

110 data cutp; set subjinc;

111

NOTE: There were 29 observations read from the data set WORK.SUBJINC.

NOTE: The data set WORK.CUTP has 29 observations and 4 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

112 proc rank data=cutp out=cutp2 groups=3;

113 var income;

114 ranks rank2;

115 run;

NOTE: The data set WORK.CUTP2 has 29 observations and 5 variables.

NOTE: PROCEDURE RANK used (Total process time):

real time 0.01 seconds

cpu time 0.02 seconds

116

117 proc print data= cutp2;

118 title ' Table 1: Subject income ranks';

119 run;

NOTE: There were 29 observations read from the data set WORK.CUTP2.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.34 seconds

cpu time 0.34 seconds

120

121

122 data cutp3; set cutp2;

123 if rank2 = 0 then incomerank ='low';

124 if rank2 = 1 then incomerank ='medium';

125 if rank2 = 2 then incomerank ='high';

126 run;

NOTE: There were 29 observations read from the data set WORK.CUTP2.

NOTE: The data set WORK.CUTP3 has 29 observations and 6 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

127

128 /*TASK 2: COUNTING ca case, pop cont and can cont with dollar amount range:

129

130 low income range(.......), medium income range (.....), high income range (....)*/

131

132

133

134 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

146

PGStats · Posted 02-08-2020 03:00 PM

Something like this?

proc rank data=subjinc out=subjrnk groups=3;
var income;
ranks subjgrp;
run;

proc format;
value incgrp
0 = "low income"
1 = "medium income"
2 = "high income";
run;

proc sql;
alter table subjrnk
modify subjgrp format=incgrp.;
quit;

proc tabulate data=subjrnk;
var income;
class subjgrp;
table (min*format=dollar6.0 max*format=dollar6.0 n*format=4.0),subjgrp=""*income="";
run;

  	low income 	medium income 	high income
Min 	$12896 	$19400 	$32145
Max 	$18998 	$27419 	$78362
N 	9 	10 	10

PG

View solution in original post

mkeintz · Posted 02-08-2020 02:52 PM

You don't seem to actually need cutpoints, since PROC RANK has put your data into 3 roughly equal-size groups. Can't you just take the resulting ranked data set, and run a proc freq, cross-tabulating the variables of interest against rank2?

Now, if you actually want cutpoints, then PROC UNIVARIATE is the way to go, as in:

proc univariate data=subjinc noprint ;
  var income;
  output out=need pctlpts=33.3 66.7  pctlpre=pctl_ ;
run;
proc print;run;

which produces dataset NEED with one observations and two variables: PCTL_33_3=19400 and PCTL_66_7=32145.

You could then

data want;
 set subjinc;
 if _n_=1 then set need;
 if .<income < pctl_33_3 then incgroup='Low'; else 
 if   income < pctl)66_7 then incgroup='Mid'; else
 incgroup='Hi';
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ak2011 · Posted 03-05-2020 09:47 PM

Hi, you helped me sometime ago with the Sas code for cut points and thank you for that. However, the "data want' portion of the SAS code to create percentiile ranking(33.3 and 66.7) does not work. I would appreciate if you could help me with corrections. I tried several things to correct the error but it still did not work. Thank you in advance.

PGStats · Posted 02-08-2020 03:00 PM

Something like this?

proc rank data=subjinc out=subjrnk groups=3;
var income;
ranks subjgrp;
run;

proc format;
value incgrp
0 = "low income"
1 = "medium income"
2 = "high income";
run;

proc sql;
alter table subjrnk
modify subjgrp format=incgrp.;
quit;

proc tabulate data=subjrnk;
var income;
class subjgrp;
table (min*format=dollar6.0 max*format=dollar6.0 n*format=4.0),subjgrp=""*income="";
run;

  	low income 	medium income 	high income
Min 	$12896 	$19400 	$32145
Max 	$18998 	$27419 	$78362
N 	9 	10 	10

PG

SAS code to create cutpoints

Re: SAS code to create cutpoints

Re: SAS code to create cutpoints

Re: SAS code to create cutpoints

Re: SAS code to create cutpoints

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away