- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have two datasets that have different b values. I want the nearest c value from the second dataset to be used for my first dataset when using b between two dataset
For example:
Dataset 1
a | b |
1 | 702 |
1 | 788 |
1 | 1148 |
2 | 417 |
2 | 441 |
2 | 458 |
2 | 465 |
2 | 495 |
2 | 502 |
2 | 707 |
2 | 766 |
2 | 863 |
3 | 1547 |
4 | 977 |
4 | 1547 |
Dataset 2
a | b | c |
1 | 598 | 13.1 |
1 | 652 | 12.9 |
1 | 701 | 12.5 |
1 | 766 | 12.3 |
1 | 807 | 14.6 |
1 | 1095 | 10.1 |
1 | 1142 | . |
1 | 1219 | 9.7 |
2 | 295 | 16.1 |
2 | 305 | 18.1 |
2 | 362 | 16.4 |
2 | 565 | 16.9 |
2 | 642 | 18.4 |
2 | 740 | 22.5 |
2 | 834 | 15.5 |
2 | 922 | 17.8 |
2 | 1007 | 27.6 |
2 | 1090 | 13.9 |
2 | 1286 | 14.7 |
2 | 1568 | 15.1 |
2 | 1561 | 11.9 |
3 | 903 | 10.5 |
3 | 1011 | 7.8 |
3 | 1099 | 8 |
3 | 1561 | . |
3 | 1656 | 5.4 |
4 | 862 | 1.7 |
4 | 906 | 1.1 |
4 | 990 | 1.8 |
Final Dataset:
a | b | c |
1 | 702 | 12.5 |
1 | 788 | 14.6 |
1 | 1148 | 10.1 |
2 | 417 | 16.4 |
2 | 441 | 16.4 |
2 | 458 | 16.4 |
2 | 465 | 16.9 |
2 | 495 | 16.9 |
2 | 502 | 16.9 |
2 | 707 | 22.5 |
2 | 766 | 22.5 |
2 | 863 | 15.5 |
3 | 1547 | 5.4 |
4 | 977 | 1.8 |
4 | 1547 | 1.8 |
Any suggestions as to how I can create something similar to the final dataset given the values in dataset 1 and dataset 2?
Thank you in advance
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This can get you started but doesn't handle the ties yet. Unfortunately I don't have time to resolve that, but I think you could figure it out or hopefully someone else can chime in. This only looks at absolute difference so it works for exactly the question asked but if you have ties you'll end up with duplicates in the output.
proc sql;
create table want as
select t1.a, t1.b, t2.c, abs(t1.b - t2.b) as diff
from data1 as t1
left join data2 as t2
on t1.a=t2.a
and not missing(t2.c)
group by t1.a, t1.b
having diff=min(diff)
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I want the nearest c value from the second dataset to be used for my first dataset when using b between two dataset
What are the rules for nearest? Does it need to be above or below or does it not matter? If it doesn't matter, what happens if there's a tie and you have one with a value exactly below and above?
Depending on the criteria this could be a simple merge or something more complicated.
@Dregerator wrote:
Hi,
I have two datasets that have different b values. I want the nearest c value from the second dataset to be used for my first dataset when using b between two dataset
For example:
Dataset 1
a b 1 702 1 788 1 1148 2 417 2 441 2 458 2 465 2 495 2 502 2 707 2 766 2 863 3 1547 4 977 4 1547
Dataset 2
a b c 1 598 13.1 1 652 12.9 1 701 12.5 1 766 12.3 1 807 14.6 1 1095 10.1 1 1142 . 1 1219 9.7 2 295 16.1 2 305 18.1 2 362 16.4 2 565 16.9 2 642 18.4 2 740 22.5 2 834 15.5 2 922 17.8 2 1007 27.6 2 1090 13.9 2 1286 14.7 2 1568 15.1 2 1561 11.9 3 903 10.5 3 1011 7.8 3 1099 8 3 1561 . 3 1656 5.4 4 862 1.7 4 906 1.1 4 990 1.8
Final Dataset:
a b c 1 702 12.5 1 788 14.6 1 1148 10.1 2 417 16.4 2 441 16.4 2 458 16.4 2 465 16.9 2 495 16.9 2 502 16.9 2 707 22.5 2 766 22.5 2 863 15.5 3 1547 5.4 4 977 1.8 4 1547 1.8
Any suggestions as to how I can create something similar to the final dataset given the values in dataset 1 and dataset 2?
Thank you in advance
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It does not matter...I would just like the closest value to dataset b because I'm trying to superimpose values in dataset a on a graph where dataset b is already displayed, but need the nearest/closest values of c so they are on top of each other. Not sure if this answers your question
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Unfortunately not. What happens if there's a tie for the smallest difference and one is above and one is below? Then which value do you take?
@Dregerator wrote:
It does not matter...I would just like the closest value to dataset b because I'm trying to superimpose values in dataset a on a graph where dataset b is already displayed, but need the nearest/closest values of c so they are on top of each other. Not sure if this answers your question
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can we do one above for example purposes?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Data set A
a | b |
1 | 702 |
Dataset 2
a | b | c |
1 | 598 | 13.1 |
1 | 652 | 12.9 |
1 | 701 | 12.5 |
1 | 703 | 12.3 |
Now what should data set 3 look like?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It will be 12.5
a | b | c |
1 | 702 | 12.5 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This can get you started but doesn't handle the ties yet. Unfortunately I don't have time to resolve that, but I think you could figure it out or hopefully someone else can chime in. This only looks at absolute difference so it works for exactly the question asked but if you have ties you'll end up with duplicates in the output.
proc sql;
create table want as
select t1.a, t1.b, t2.c, abs(t1.b - t2.b) as diff
from data1 as t1
left join data2 as t2
on t1.a=t2.a
and not missing(t2.c)
group by t1.a, t1.b
having diff=min(diff)
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is an alternative
data one;
input a b;
datalines;
1 702
1 788
1 1148
2 417
2 441
2 458
2 465
2 495
2 502
2 707
2 766
2 863
3 1547
4 977
4 1547
;
data two;
input a b c;
datalines;
1 598 13.1
1 652 12.9
1 701 12.5
1 766 12.3
1 807 14.6
1 1095 10.1
1 1142 .
1 1219 9.7
2 295 16.1
2 305 18.1
2 362 16.4
2 565 16.9
2 642 18.4
2 740 22.5
2 834 15.5
2 922 17.8
2 1007 27.6
2 1090 13.9
2 1286 14.7
2 1568 15.1
2 1561 11.9
3 903 10.5
3 1011 7.8
3 1099 8
3 1561 .
3 1656 5.4
4 862 1.7
4 906 1.1
4 990 1.8
;
data want(keep = a b c);
if _N_ = 1 then do;
dcl hash h(dataset : "two(rename = b = bb where = (c))", multidata : "Y");
h.definekey("a");
h.definedata("bb", "c");
h.definedone();
end;
set one;
bb = .; c = .; con = constant("big");
do while (h.do_over() = 0);
if abs(bb - b) < con then do;
con = abs(bb - b);
cc = c;
end;
end;
c = cc;
run;