Hi,
I have data composed of an ID variable, a continuous variable (values 1-100), and a categorical variable (200 possible values), like this:
ID cont_var cat_var
1 30 1
2 25 2
3 42 2
4 97 1
5 55 1
6 12 2
The data are unique at the ID level.
What I want is to add a variable that summarizes the median of the cont_var by level of cat_var, like this:
ID cont_var cat_var new_var
1 30 1 55
2 25 2 25
3 42 2 25
4 97 1 55
5 55 1 55
6 12 2 25
I know how to do this by collapsing the data to levels of the categorical variable, but since I ultimately want to keep my dataset at the individual ID level, I'm hoping there's a way to do it without collapsing and rejoining.
Any help is much appreciated.
If you have 9.4, then:
data have;
input ID cont_var cat_var;
cards;
1 30 1
2 25 2
3 42 2
4 97 1
5 55 1
6 12 2
;
proc sql;
create table want as
select *, median(cont_var) as median from have group by cat_var
order by id;
quit;
If you have 9.4, then:
data have;
input ID cont_var cat_var;
cards;
1 30 1
2 25 2
3 42 2
4 97 1
5 55 1
6 12 2
;
proc sql;
create table want as
select *, median(cont_var) as median from have group by cat_var
order by id;
quit;
proc summary data=have nway;
class cat_var;
var cont_var;
output out= med median=;
run;
proc sql;
create table want as
select a.*, b.cont_var as new var
from have as a left join med as b on
a.cat_var=b.cat_var
order by ID;
quit;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.