BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
gejoachim99
Fluorite | Level 6

I want to create tertiles to organize my data into categories. However, I want the tertiles to be made based off of the control group. I tried this code, but then when I run the proc freq the table only shows data when casecont_path=1. How do I create tertiles using one group, then apply it to both the cases and the controls?

 

proc rank data=work.adenovar groups=3;
where casecont_path=1;
var jointyears;
ranks jointyears_tert;
run;

 

/*categorical joint years*/
data work.adeno1;
set work.data1;
if hash=2 then jointyears_cat=0;
if jointyears_tert=0 then jointyears_cat=1;
if jointyears_tert=1 then jointyears_cat=2;
if jointyears_tert=2 then jointyears_cat=3;
run;

 

proc freq data=work.adeno1;
tables jointyears_cat*casecont_path/norow nopercent;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Then run the PROC UNIVARITATE I showed, and do a PROC PRINT of the NEED dataset produced by it.   If this were other, more standard percentiles, then you could drop the "NOPRINT" option and see the percentile reported by PROC UNIVARIATE.  

 

Or, in the DATA step assigning tertiles, just add a

  if _n_=1 then put (pctl:) (=);

statement after the "set need" statement.

============================== editted comments below ================================

This above was marked as a "solution", but it's really only the answer to a derivative question, and doesn't really connect with the subject line.  So I've included below my original response, which will give context to the above:

 

proc univariate data=work.adenovar noprint;;
  where casecont_path=1;
  var jointyears;
  output out=need pctlpts=33.33 66.67 pctlpre=pctl ;
run;

/*categorical joint years*/
data work.adeno1 (drop=pctl33_33 pctl_66_67);
  set work.data1;
  if _n_=1 then set need; 
  if _n_=1 then put (pctl:) (=);   /* Added in response to follow-up*/
  if hash=2 then jointyears_cat=0; else
  jointyears=1 + (jointyears>pctl33_33) + (jointyears>pctl66_67);
run;
  1. Dataset need will have one observation with two variables PCTL33_33, and PCTL66_67.

  2. The "if _n_=1 then set need;" statement reads the single NEED observation during only the first iteration of the data step.  Otherwise it would attempt to read beyond the end of NEED, and the data step would stop prematurely.   And all the variables read by that conditional SET NEED statement will be retained for use in all subsequent obs as you progress through work.adeno1.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26
where casecont_path=1;

so when you get to PROC FREQ, it never finds any other value for variable CASECONT_PATH.

 

A useful debugging method is for you to actually look at the data sets with your own eyes and see what is in there.

 

--
Paige Miller
gejoachim99
Fluorite | Level 6
I understand that the where statement is why the proc freq isn't what I need, but I'm more asking if there is a way to create tertiles based on only that group, then apply that categorization to both my cases and controls. My data set is very large.
mkeintz
PROC Star

Why not use proc univariate to make two cut-points, at the 33.33%ile and 66.67%ile values?  This is a good way to avoid holes from the proc rank on a subset.  (edit: failed to paste this additional text).

 

proc univariate data=work.adenovar noprint;;
  where casecont_path=1;
  var jointyears;
  output out=need pctlpts=33.33 66.67 pctlpre=pctl ;
run;

/*categorical joint years*/
data work.adeno1 (drop=pctl33_33 pctl_66_67);
  set work.data1;
  if _n_=1 then set need; 
  if hash=2 then jointyears_cat=0; else
  jointyears=1 + (jointyears>pctl33_33) + (jointyears>pctl66_67);
run;

  1. Dataset need will have one observation with two variables PCTL33_33, and PCTL66_67.

  2. The "if _n_=1 then set need;" statement reads the single NEED observation during only the first iteration of the data step.  Otherwise it would attempt to read beyond the end of NEED, and the data step would stop prematurely.   And all the variables read by that conditional SET NEED statement will be retained for use in all subsequent obs as you progress through work.adeno1.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ballardw
Super User

This code

data work.adeno1;
set work.data1;
if hash=2 then jointyears_cat=0;
if jointyears_tert=0 then jointyears_cat=1;
if jointyears_tert=1 then jointyears_cat=2;
if jointyears_tert=2 then jointyears_cat=3;
run;

is very likely overwriting the value of jointyears_cat set when hash=2 unless "joint_years_tert" is missing.

You might be wanting

data work.adeno1;
   set work.data1;
   if hash=2 then jointyears_cat=0;
   Else Jointyears_cat = jointyears_tert +1;
run;

Be prepared to provide example data and expected result for that example.

The "application" to other variables can be extremely data dependendent

gejoachim99
Fluorite | Level 6
That part of the code isn't what I had a question on, I know it is correct. I was just including it for some context.
gejoachim99
Fluorite | Level 6

After some more thought, I think I'm asking the wrong question. I know how to divide up my data into tertiles, and I now understand how to do it for just my control group. What I really need to know now is how to see what each tertile actually is, like it's numerical value, so I can use it in making categories. 

mkeintz
PROC Star

Then run the PROC UNIVARITATE I showed, and do a PROC PRINT of the NEED dataset produced by it.   If this were other, more standard percentiles, then you could drop the "NOPRINT" option and see the percentile reported by PROC UNIVARIATE.  

 

Or, in the DATA step assigning tertiles, just add a

  if _n_=1 then put (pctl:) (=);

statement after the "set need" statement.

============================== editted comments below ================================

This above was marked as a "solution", but it's really only the answer to a derivative question, and doesn't really connect with the subject line.  So I've included below my original response, which will give context to the above:

 

proc univariate data=work.adenovar noprint;;
  where casecont_path=1;
  var jointyears;
  output out=need pctlpts=33.33 66.67 pctlpre=pctl ;
run;

/*categorical joint years*/
data work.adeno1 (drop=pctl33_33 pctl_66_67);
  set work.data1;
  if _n_=1 then set need; 
  if _n_=1 then put (pctl:) (=);   /* Added in response to follow-up*/
  if hash=2 then jointyears_cat=0; else
  jointyears=1 + (jointyears>pctl33_33) + (jointyears>pctl66_67);
run;
  1. Dataset need will have one observation with two variables PCTL33_33, and PCTL66_67.

  2. The "if _n_=1 then set need;" statement reads the single NEED observation during only the first iteration of the data step.  Otherwise it would attempt to read beyond the end of NEED, and the data step would stop prematurely.   And all the variables read by that conditional SET NEED statement will be retained for use in all subsequent obs as you progress through work.adeno1.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
gejoachim99
Fluorite | Level 6
Thank you so much! That worked! 🙂

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 2613 views
  • 1 like
  • 4 in conversation