Hi,
I have data composed of an ID variable, a continuous variable (values 1-100), and a categorical variable (200 possible values), like this:
ID cont_var cat_var
1 30 1
2 25 2
3 42 2
4 97 1
5 55 1
6 12 2
The data are unique at the ID level.
What I want is to add a variable that summarizes the median of the cont_var by level of cat_var, like this:
ID cont_var cat_var new_var
1 30 1 55
2 25 2 25
3 42 2 25
4 97 1 55
5 55 1 55
6 12 2 25
I know how to do this by collapsing the data to levels of the categorical variable, but since I ultimately want to keep my dataset at the individual ID level, I'm hoping there's a way to do it without collapsing and rejoining.
Any help is much appreciated.
If you have 9.4, then:
data have;
input ID cont_var cat_var;
cards;
1 30 1
2 25 2
3 42 2
4 97 1
5 55 1
6 12 2
;
proc sql;
create table want as
select *, median(cont_var) as median from have group by cat_var
order by id;
quit;
If you have 9.4, then:
data have;
input ID cont_var cat_var;
cards;
1 30 1
2 25 2
3 42 2
4 97 1
5 55 1
6 12 2
;
proc sql;
create table want as
select *, median(cont_var) as median from have group by cat_var
order by id;
quit;
proc summary data=have nway;
class cat_var;
var cont_var;
output out= med median=;
run;
proc sql;
create table want as
select a.*, b.cont_var as new var
from have as a left join med as b on
a.cat_var=b.cat_var
order by ID;
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.