Hi everyone,
My data is in this format:
category_1 category_2 result
1 a 10
2 a 20
1 b 30
2 b 40
I would like to create a variable that would give me a/b for each category_1. For instance if this new variable was called new_variable, the dataset would now look like this:
category_1 category_2 result new_variable
1 a 10 0.333333
2 a 20 0.5
1 b 30 .
2 b 40 .ariable
Sounds like a simple merge will do what you want. Just take the observation for the control group and merge it onto the rest of the observations.
category_1 category_2 result
1 a 10
2 a 20
1 b 30
2 b 40
CATEGORY_1 is your BY group for the merge. CATEGORY_2 is your treatment group and will be used to subset the observations .
data want ;
merge have (where=(category_2 ne 'b') )
have (rename=(category_2=trt result=control) where=(trt='b') )
;
by category_1 ;
new_variable = result / control ;
run;
Hi.
this code should get you somehow started.
It is a bit confusing to me how you want the new_variable to be displayed at the end, how did you chose it to show on the first two rows?
Anyway, take a look.
data in;
input category_1 category_2 $ result;
datalines;
1 a 10
2 a 20
1 b 30
2 b 40
;
run;
proc sort data = in;by category_1 category_2;
proc transpose data = in out = want(drop = _NAME_);
by category_1 ;id category_2;
var result;
run;
data really_want;
set want;
new_variable = a/b;
run;
data final;
merge in really_want(rename = (a = result) drop = b);
by category_1 result;
run;
Good luck!
Anca.
Given what you've explained a proc sql step will work, but order of the data matters and the a/b are hardcoded so Anca solution may be preferable, or you need to provide additional details.
data have;
input category_1 category_2 $ result;
cards;
1 a 10
2 a 20
1 b 30
2 b 40
;
run;
proc sql;
create table want as
select a.*, a.result/b.result as new_variable
from have a
left join have b
on a.category_1=b.category_1
and a.category_2='a'
and b.category_2='b'
order by category_2, category_1;
quit;
Hi,
To answer your question Anca, I would want new_variable to be a percentage of a single variable type (basically the control). I guess here's a simpler way to present it:
category_1 treatment result
1 a 10
2 a 20
1 control 30
2 control 40
1 c 50
2 c 60
The resulting dataset I would like to have would have percentages of the control for each category_1, so a/control and c/control for both 1 and 2:
category_1 treatment result new_variable
1 a 10 .333333
2 a 20 .5
1 control 30 1
2 control 40 1
1 c 50 1.666667
2 c 60 1.5
I tried your code and was able to rearrange the data, but not obtain the new variable (% of control)
This code worked, but I ran into problems when using it on my real dataset, in which both category_1 and category_2 are numerical variables (instead of category_2 having letters like a and b). Replacing a and b by the actual 1 and 2 seems to cause a bunch of errors. Is there an option that will read numbers as well as letters?
Here is what the code now looks like, where a is replaced by 3 and b is replaced by 4
data have;
input category_1 category_2 $ result;
cards;
1 3 10
2 3 20
1 4 30
2 4 40
;
run;
proc sql;
create table want as
select 3.*, 3.result/4.result as new_variable
from have 3
left join have 4
on 3.hour=4.hour
and 3.category_2='3'
and 4.category_2='4'
order by category_2, category_1;
quit;
Remove the quotation marks from the numbers.
I tried removing the quotation marks, but the problem seems to appear earlier, at the select 3.* phase (line 3 of proc sql). Here a.* is processed just fine but 3.* is not.
those a and b don't relate to the data, they're table alias, a shortcut way to reference tables. A/B are just convenient, though in this case confusing
proc sql;
create table want as
select a.*, a.result/b.result as new_variable
from have a
left join have b
on a.hour=b.hour
and a.category_2=3
and b.category_2=4
order by category_2, category_1;
quit;
Great! I just have one last question: how can I modify this code to work with more variables in category_2: for instance instead of 3 and 4 if I head 3 4 5 and 6 but still wanted the result of all of them divided by the result for variable 4? I imagine I would need to add:
and c.category_2=5
and d.category_2=6
but I can't get the syntax before that to work...
If you want to include the values where category_2 = "b" in your output with missing values for new_variable, the SQL solution must resort to the rarely used full join :
data have;
input category_1 category_2 $ result;
cards;
1 a 10
2 a 20
1 b 30
2 b 40
;
proc sql;
create table want as
select
a.*,
a.result/(b.result*(a.category_2 ne "b")) as new_variable
from
have as a full join
have as b on a.category_1=b.category_1
where b.category_2 = "b"
order by a.category_2, a.category_1;
select * from want;
quit;
PG
Message was edited by: PG Changed = "a" to ne "b"
Sounds like a simple merge will do what you want. Just take the observation for the control group and merge it onto the rest of the observations.
category_1 category_2 result
1 a 10
2 a 20
1 b 30
2 b 40
CATEGORY_1 is your BY group for the merge. CATEGORY_2 is your treatment group and will be used to subset the observations .
data want ;
merge have (where=(category_2 ne 'b') )
have (rename=(category_2=trt result=control) where=(trt='b') )
;
by category_1 ;
new_variable = result / control ;
run;
This works perfectly! Thanks!
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.