Hi,
I have two datasets that have different b values. I want the nearest c value from the second dataset to be used for my first dataset when using b between two dataset
For example:
Dataset 1
a | b |
1 | 702 |
1 | 788 |
1 | 1148 |
2 | 417 |
2 | 441 |
2 | 458 |
2 | 465 |
2 | 495 |
2 | 502 |
2 | 707 |
2 | 766 |
2 | 863 |
3 | 1547 |
4 | 977 |
4 | 1547 |
Dataset 2
a | b | c |
1 | 598 | 13.1 |
1 | 652 | 12.9 |
1 | 701 | 12.5 |
1 | 766 | 12.3 |
1 | 807 | 14.6 |
1 | 1095 | 10.1 |
1 | 1142 | . |
1 | 1219 | 9.7 |
2 | 295 | 16.1 |
2 | 305 | 18.1 |
2 | 362 | 16.4 |
2 | 565 | 16.9 |
2 | 642 | 18.4 |
2 | 740 | 22.5 |
2 | 834 | 15.5 |
2 | 922 | 17.8 |
2 | 1007 | 27.6 |
2 | 1090 | 13.9 |
2 | 1286 | 14.7 |
2 | 1568 | 15.1 |
2 | 1561 | 11.9 |
3 | 903 | 10.5 |
3 | 1011 | 7.8 |
3 | 1099 | 8 |
3 | 1561 | . |
3 | 1656 | 5.4 |
4 | 862 | 1.7 |
4 | 906 | 1.1 |
4 | 990 | 1.8 |
Final Dataset:
a | b | c |
1 | 702 | 12.5 |
1 | 788 | 14.6 |
1 | 1148 | 10.1 |
2 | 417 | 16.4 |
2 | 441 | 16.4 |
2 | 458 | 16.4 |
2 | 465 | 16.9 |
2 | 495 | 16.9 |
2 | 502 | 16.9 |
2 | 707 | 22.5 |
2 | 766 | 22.5 |
2 | 863 | 15.5 |
3 | 1547 | 5.4 |
4 | 977 | 1.8 |
4 | 1547 | 1.8 |
Any suggestions as to how I can create something similar to the final dataset given the values in dataset 1 and dataset 2?
Thank you in advance
This can get you started but doesn't handle the ties yet. Unfortunately I don't have time to resolve that, but I think you could figure it out or hopefully someone else can chime in. This only looks at absolute difference so it works for exactly the question asked but if you have ties you'll end up with duplicates in the output.
proc sql;
create table want as
select t1.a, t1.b, t2.c, abs(t1.b - t2.b) as diff
from data1 as t1
left join data2 as t2
on t1.a=t2.a
and not missing(t2.c)
group by t1.a, t1.b
having diff=min(diff)
;
quit;
I want the nearest c value from the second dataset to be used for my first dataset when using b between two dataset
What are the rules for nearest? Does it need to be above or below or does it not matter? If it doesn't matter, what happens if there's a tie and you have one with a value exactly below and above?
Depending on the criteria this could be a simple merge or something more complicated.
@Dregerator wrote:
Hi,
I have two datasets that have different b values. I want the nearest c value from the second dataset to be used for my first dataset when using b between two dataset
For example:
Dataset 1
a b 1 702 1 788 1 1148 2 417 2 441 2 458 2 465 2 495 2 502 2 707 2 766 2 863 3 1547 4 977 4 1547
Dataset 2
a b c 1 598 13.1 1 652 12.9 1 701 12.5 1 766 12.3 1 807 14.6 1 1095 10.1 1 1142 . 1 1219 9.7 2 295 16.1 2 305 18.1 2 362 16.4 2 565 16.9 2 642 18.4 2 740 22.5 2 834 15.5 2 922 17.8 2 1007 27.6 2 1090 13.9 2 1286 14.7 2 1568 15.1 2 1561 11.9 3 903 10.5 3 1011 7.8 3 1099 8 3 1561 . 3 1656 5.4 4 862 1.7 4 906 1.1 4 990 1.8
Final Dataset:
a b c 1 702 12.5 1 788 14.6 1 1148 10.1 2 417 16.4 2 441 16.4 2 458 16.4 2 465 16.9 2 495 16.9 2 502 16.9 2 707 22.5 2 766 22.5 2 863 15.5 3 1547 5.4 4 977 1.8 4 1547 1.8
Any suggestions as to how I can create something similar to the final dataset given the values in dataset 1 and dataset 2?
Thank you in advance
It does not matter...I would just like the closest value to dataset b because I'm trying to superimpose values in dataset a on a graph where dataset b is already displayed, but need the nearest/closest values of c so they are on top of each other. Not sure if this answers your question
Unfortunately not. What happens if there's a tie for the smallest difference and one is above and one is below? Then which value do you take?
@Dregerator wrote:
It does not matter...I would just like the closest value to dataset b because I'm trying to superimpose values in dataset a on a graph where dataset b is already displayed, but need the nearest/closest values of c so they are on top of each other. Not sure if this answers your question
Can we do one above for example purposes?
Data set A
a | b |
1 | 702 |
Dataset 2
a | b | c |
1 | 598 | 13.1 |
1 | 652 | 12.9 |
1 | 701 | 12.5 |
1 | 703 | 12.3 |
Now what should data set 3 look like?
It will be 12.5
a | b | c |
1 | 702 | 12.5 |
This can get you started but doesn't handle the ties yet. Unfortunately I don't have time to resolve that, but I think you could figure it out or hopefully someone else can chime in. This only looks at absolute difference so it works for exactly the question asked but if you have ties you'll end up with duplicates in the output.
proc sql;
create table want as
select t1.a, t1.b, t2.c, abs(t1.b - t2.b) as diff
from data1 as t1
left join data2 as t2
on t1.a=t2.a
and not missing(t2.c)
group by t1.a, t1.b
having diff=min(diff)
;
quit;
Here is an alternative
data one;
input a b;
datalines;
1 702
1 788
1 1148
2 417
2 441
2 458
2 465
2 495
2 502
2 707
2 766
2 863
3 1547
4 977
4 1547
;
data two;
input a b c;
datalines;
1 598 13.1
1 652 12.9
1 701 12.5
1 766 12.3
1 807 14.6
1 1095 10.1
1 1142 .
1 1219 9.7
2 295 16.1
2 305 18.1
2 362 16.4
2 565 16.9
2 642 18.4
2 740 22.5
2 834 15.5
2 922 17.8
2 1007 27.6
2 1090 13.9
2 1286 14.7
2 1568 15.1
2 1561 11.9
3 903 10.5
3 1011 7.8
3 1099 8
3 1561 .
3 1656 5.4
4 862 1.7
4 906 1.1
4 990 1.8
;
data want(keep = a b c);
if _N_ = 1 then do;
dcl hash h(dataset : "two(rename = b = bb where = (c))", multidata : "Y");
h.definekey("a");
h.definedata("bb", "c");
h.definedone();
end;
set one;
bb = .; c = .; con = constant("big");
do while (h.do_over() = 0);
if abs(bb - b) < con then do;
con = abs(bb - b);
cc = c;
end;
end;
c = cc;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.