A sample dataset is listed below:
data _null_;
input V $ W $ X Y date;
datalines;
A 0 200 600 01/01/20
A 0 300 500 02/01/20
A 1 400 700 01/01/20
A 1 100 800 02/01/20
B 0 200 600 01/01/20
B 0 300 500 02/01/20
B 1 400 700 01/01/20
B 1 100 800 02/01/20
Z 0 80 500 01/01/20
Z 0 50 600 02/01/20
;
run;
I am trying to create a new variable by subtracting two variables from one another and then dividing by another variable in the first variables observation (i.e. [x1-x2]/y1). I also need to match the observations using the same date. I'll be using the same set of observations to subtract with but many more observations for the other two numbers in the equation. How can I apply the above equation to create a new variable for my entire dataset? If we look at the second line in the sample dataset, the new variables will equal 0.5 since the date matches the second observation of Z (300-50/500). I will not need the Z observations in the new dataset.
I think I see what you want here. Something like this:
data have;
input V $ W $ X Y date:mmddyy8.;
format date yymmddd10.;
datalines;
A 0 200 600 01/01/20
A 0 300 500 02/01/20
A 1 400 700 01/01/20
A 1 100 800 02/01/20
B 0 200 600 01/01/20
B 0 300 500 02/01/20
B 1 400 700 01/01/20
B 1 100 800 02/01/20
Z 0 80 500 01/01/20
Z 0 50 600 02/01/20
;
proc sql;
select a.*, round((a.x-z.x)/a.y,.1) as NewVar
from
(select * from have where v<>'Z') as a
left join
(select * from have where v='Z') as z
on a.Date=z.Date
;
quit;
V | W | X | Y | date | NewVar |
---|---|---|---|---|---|
A | 1 | 400 | 700 | 2020-01-01 | 0.5 |
B | 0 | 200 | 600 | 2020-01-01 | 0.2 |
B | 1 | 400 | 700 | 2020-01-01 | 0.5 |
A | 0 | 200 | 600 | 2020-01-01 | 0.2 |
B | 0 | 300 | 500 | 2020-02-01 | 0.5 |
A | 0 | 300 | 500 | 2020-02-01 | 0.5 |
A | 1 | 100 | 800 | 2020-02-01 | 0.1 |
B | 1 | 100 | 800 | 2020-02-01 | 0.1 |
This would have been easier if you had two separate tables to start with:
data a z;
input V $ W $ X Y date:mmddyy8.;
format date yymmddd10.;
if v='Z' then output z;
else output a;
datalines;
A 0 200 600 01/01/20
A 0 300 500 02/01/20
A 1 400 700 01/01/20
A 1 100 800 02/01/20
B 0 200 600 01/01/20
B 0 300 500 02/01/20
B 1 400 700 01/01/20
B 1 100 800 02/01/20
Z 0 80 500 01/01/20
Z 0 50 600 02/01/20
;
proc sql;
select a.*, round((a.x-z.x)/a.y,.1) as NewVar
from a left join z
on a.Date=z.Date
;
quit;
V | W | X | Y | date | NewVar |
---|---|---|---|---|---|
A | 1 | 400 | 700 | 2020-01-01 | 0.5 |
B | 0 | 200 | 600 | 2020-01-01 | 0.2 |
B | 1 | 400 | 700 | 2020-01-01 | 0.5 |
A | 0 | 200 | 600 | 2020-01-01 | 0.2 |
B | 0 | 300 | 500 | 2020-02-01 | 0.5 |
A | 0 | 300 | 500 | 2020-02-01 | 0.5 |
A | 1 | 100 | 800 | 2020-02-01 | 0.1 |
B | 1 | 100 | 800 | 2020-02-01 | 0.1 |
There is no x1 or x2 in your data, but your formula contains x1 and x2. There is no Z in your data set, but your description includes a variable Z.
You say you want to "match the observations using the same date", this doesn't really mean anything to me.
Please explain in a lot more detail. Please show us the desired output for this small data set.
First thing: you data set does not read the date as shown. You need to provide an informat that will treat the values as a date (and then assign a format so humans can read it as needed). Unfortunately the values you picked for an example we cannot tell if that is month/day/year or day/month/ year (and having worked on Y2K problems I detest any value with 2 digit years). And since the name of the set you provide is _null_ then no set is made anyway so nothing can be used for later steps.
This gives a usable set:
data have; input V $ W $ X Y date :mmddyy8.; format date date9.; datalines; A 0 200 600 01/01/20 A 0 300 500 02/01/20 A 1 400 700 01/01/20 A 1 100 800 02/01/20 B 0 200 600 01/01/20 B 0 300 500 02/01/20 B 1 400 700 01/01/20 B 1 100 800 02/01/20 Z 0 80 500 01/01/20 Z 0 50 600 02/01/20 ; run;
Now, very carefully explain exactly what an X1 X2 and Y1 are as those are not variables in the data as shown.
Does "first" have anything to do with the values of V or W?
Match what on date? That you will have to show some worked example with some very detailed explanation as you only provided 2 dates across 10 observations. The date on your "second line" also matches the date on the 4th line, 6th line and 8th line as well as the one on Z. So the "example" calculation does not really make any sense without a lot more details.
I think I see what you want here. Something like this:
data have;
input V $ W $ X Y date:mmddyy8.;
format date yymmddd10.;
datalines;
A 0 200 600 01/01/20
A 0 300 500 02/01/20
A 1 400 700 01/01/20
A 1 100 800 02/01/20
B 0 200 600 01/01/20
B 0 300 500 02/01/20
B 1 400 700 01/01/20
B 1 100 800 02/01/20
Z 0 80 500 01/01/20
Z 0 50 600 02/01/20
;
proc sql;
select a.*, round((a.x-z.x)/a.y,.1) as NewVar
from
(select * from have where v<>'Z') as a
left join
(select * from have where v='Z') as z
on a.Date=z.Date
;
quit;
V | W | X | Y | date | NewVar |
---|---|---|---|---|---|
A | 1 | 400 | 700 | 2020-01-01 | 0.5 |
B | 0 | 200 | 600 | 2020-01-01 | 0.2 |
B | 1 | 400 | 700 | 2020-01-01 | 0.5 |
A | 0 | 200 | 600 | 2020-01-01 | 0.2 |
B | 0 | 300 | 500 | 2020-02-01 | 0.5 |
A | 0 | 300 | 500 | 2020-02-01 | 0.5 |
A | 1 | 100 | 800 | 2020-02-01 | 0.1 |
B | 1 | 100 | 800 | 2020-02-01 | 0.1 |
This would have been easier if you had two separate tables to start with:
data a z;
input V $ W $ X Y date:mmddyy8.;
format date yymmddd10.;
if v='Z' then output z;
else output a;
datalines;
A 0 200 600 01/01/20
A 0 300 500 02/01/20
A 1 400 700 01/01/20
A 1 100 800 02/01/20
B 0 200 600 01/01/20
B 0 300 500 02/01/20
B 1 400 700 01/01/20
B 1 100 800 02/01/20
Z 0 80 500 01/01/20
Z 0 50 600 02/01/20
;
proc sql;
select a.*, round((a.x-z.x)/a.y,.1) as NewVar
from a left join z
on a.Date=z.Date
;
quit;
V | W | X | Y | date | NewVar |
---|---|---|---|---|---|
A | 1 | 400 | 700 | 2020-01-01 | 0.5 |
B | 0 | 200 | 600 | 2020-01-01 | 0.2 |
B | 1 | 400 | 700 | 2020-01-01 | 0.5 |
A | 0 | 200 | 600 | 2020-01-01 | 0.2 |
B | 0 | 300 | 500 | 2020-02-01 | 0.5 |
A | 0 | 300 | 500 | 2020-02-01 | 0.5 |
A | 1 | 100 | 800 | 2020-02-01 | 0.1 |
B | 1 | 100 | 800 | 2020-02-01 | 0.1 |
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.