I want to create tertiles to organize my data into categories. However, I want the tertiles to be made based off of the control group. I tried this code, but then when I run the proc freq the table only shows data when casecont_path=1. How do I create tertiles using one group, then apply it to both the cases and the controls?
proc rank data=work.adenovar groups=3;
where casecont_path=1;
var jointyears;
ranks jointyears_tert;
run;
/*categorical joint years*/
data work.adeno1;
set work.data1;
if hash=2 then jointyears_cat=0;
if jointyears_tert=0 then jointyears_cat=1;
if jointyears_tert=1 then jointyears_cat=2;
if jointyears_tert=2 then jointyears_cat=3;
run;
proc freq data=work.adeno1;
tables jointyears_cat*casecont_path/norow nopercent;
run;
Then run the PROC UNIVARITATE I showed, and do a PROC PRINT of the NEED dataset produced by it. If this were other, more standard percentiles, then you could drop the "NOPRINT" option and see the percentile reported by PROC UNIVARIATE.
Or, in the DATA step assigning tertiles, just add a
if _n_=1 then put (pctl:) (=);
statement after the "set need" statement.
============================== editted comments below ================================
This above was marked as a "solution", but it's really only the answer to a derivative question, and doesn't really connect with the subject line. So I've included below my original response, which will give context to the above:
proc univariate data=work.adenovar noprint;;
where casecont_path=1;
var jointyears;
output out=need pctlpts=33.33 66.67 pctlpre=pctl ;
run;
/*categorical joint years*/
data work.adeno1 (drop=pctl33_33 pctl_66_67);
set work.data1;
if _n_=1 then set need;
if _n_=1 then put (pctl:) (=); /* Added in response to follow-up*/
if hash=2 then jointyears_cat=0; else
jointyears=1 + (jointyears>pctl33_33) + (jointyears>pctl66_67);
run;
where casecont_path=1;
so when you get to PROC FREQ, it never finds any other value for variable CASECONT_PATH.
A useful debugging method is for you to actually look at the data sets with your own eyes and see what is in there.
Why not use proc univariate to make two cut-points, at the 33.33%ile and 66.67%ile values? This is a good way to avoid holes from the proc rank on a subset. (edit: failed to paste this additional text).
proc univariate data=work.adenovar noprint;;
where casecont_path=1;
var jointyears;
output out=need pctlpts=33.33 66.67 pctlpre=pctl ;
run;
/*categorical joint years*/
data work.adeno1 (drop=pctl33_33 pctl_66_67);
set work.data1;
if _n_=1 then set need;
if hash=2 then jointyears_cat=0; else
jointyears=1 + (jointyears>pctl33_33) + (jointyears>pctl66_67);
run;
This code
data work.adeno1; set work.data1; if hash=2 then jointyears_cat=0; if jointyears_tert=0 then jointyears_cat=1; if jointyears_tert=1 then jointyears_cat=2; if jointyears_tert=2 then jointyears_cat=3; run;
is very likely overwriting the value of jointyears_cat set when hash=2 unless "joint_years_tert" is missing.
You might be wanting
data work.adeno1; set work.data1; if hash=2 then jointyears_cat=0; Else Jointyears_cat = jointyears_tert +1; run;
Be prepared to provide example data and expected result for that example.
The "application" to other variables can be extremely data dependendent
After some more thought, I think I'm asking the wrong question. I know how to divide up my data into tertiles, and I now understand how to do it for just my control group. What I really need to know now is how to see what each tertile actually is, like it's numerical value, so I can use it in making categories.
Then run the PROC UNIVARITATE I showed, and do a PROC PRINT of the NEED dataset produced by it. If this were other, more standard percentiles, then you could drop the "NOPRINT" option and see the percentile reported by PROC UNIVARIATE.
Or, in the DATA step assigning tertiles, just add a
if _n_=1 then put (pctl:) (=);
statement after the "set need" statement.
============================== editted comments below ================================
This above was marked as a "solution", but it's really only the answer to a derivative question, and doesn't really connect with the subject line. So I've included below my original response, which will give context to the above:
proc univariate data=work.adenovar noprint;;
where casecont_path=1;
var jointyears;
output out=need pctlpts=33.33 66.67 pctlpre=pctl ;
run;
/*categorical joint years*/
data work.adeno1 (drop=pctl33_33 pctl_66_67);
set work.data1;
if _n_=1 then set need;
if _n_=1 then put (pctl:) (=); /* Added in response to follow-up*/
if hash=2 then jointyears_cat=0; else
jointyears=1 + (jointyears>pctl33_33) + (jointyears>pctl66_67);
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.