I have two datasets A and B with a common column severity. I need to identify which row of B has the severity closest to the severity of the each row of A. It is called the nearest neighbor problem.
I am reading in both A and B and then for each row of A search for the nearest neighbor in B using a do loop. However, I am not sure how to create index for B so that a do loop can be done.
Please NOTE : There are no common variables and the subjects are in different studies. The purpose of the analysis is to figure out who in B is the closest subject to each one in A. .
In terms of disease severity.
The final output should be below:-- please HELP AND URGENT :-
Severity_A | subject_A | Severity_B | subject_B |
1.2 | 1 | 1.5 | 6 |
1.2 | 1 | 2.6 | 7 |
1.2 | 1 | 4.5 | 8 |
1.2 | 1 | 5.1 | 9 |
1.2 | 1 | 5.7 | 10 |
2.3 | 2 | 1.5 | 6 |
2.3 | 2 | 2.6 | 7 |
2.3 | 2 | 4.5 | 8 |
2.3 | 2 | 5.1 | 9 |
2.3 | 2 | 5.7 | 10 |
… | … |
|
|
Any help will be greatly appreciated. Thank you very much!
Without knowing your expected output we can only have our best guess. Try this.
To get to the final results uncomment the commented code.
Data A;
input Severity subject;
cards;
1.2 1
2.3 2
3.4 3
5.3 4
6.7 5
;
run;
Data B;
input Severity subject;
cards;
1.5 6
2.6 7
4.5 8
5.1 9
5.7 10
;
run;
proc sql;
create table c as
select a.Severity as A_Severity
, a.Subject as A_Subject
, b.Severity as B_Severity
, b.Subject as B_Subject
, abs(A.Severity - B.Severity) as Distance
from work.A as A
, work.B as B
/* group by a.Subject*/
/* having calculated Distance = min(Calculated Distance)*/
order by A_Subject, Calculated Distance
;
quit;
Actual data to go with your desired results would be helpful.
Do you need unique nearest neighbors? Meaning that if in Set A we have severity values of 1.3 and 1.5, and in Set B we have a 1.4 and a 2.7. Can the 1.3 and the 1.5 both be matched to the 1.4 or do we need to use unique values?
They can both be matched since we want to know whom in B are similar to those in A.
can you help me ?
Without knowing your expected output we can only have our best guess. Try this.
To get to the final results uncomment the commented code.
Data A;
input Severity subject;
cards;
1.2 1
2.3 2
3.4 3
5.3 4
6.7 5
;
run;
Data B;
input Severity subject;
cards;
1.5 6
2.6 7
4.5 8
5.1 9
5.7 10
;
run;
proc sql;
create table c as
select a.Severity as A_Severity
, a.Subject as A_Subject
, b.Severity as B_Severity
, b.Subject as B_Subject
, abs(A.Severity - B.Severity) as Distance
from work.A as A
, work.B as B
/* group by a.Subject*/
/* having calculated Distance = min(Calculated Distance)*/
order by A_Subject, Calculated Distance
;
quit;
Thanks a lot , I will let you know if this is matching with requirement. But really appreciate your response and help
the answer is perfect, Thanks
Please clarify the apparent inconsistency in your text:
@GeetVish wrote:
I have two datasets A and B with a common column severity. I need to identify which row of B has the severity closest to the severity of the each row of A. It is called the nearest neighbor problem.
I am reading in both A and B and then for each row of A search for the nearest neighbor in B using a do loop. However, I am not sure how to create index for B so that a do loop can be done.
Please NOTE : There are no common variables and the subjects are in different studies. The purpose of the analysis is to figure out who in B is the closest subject to each one in A. .In terms of disease severity.
Please provide examples of appropriate variables from BOTH input data sets, best as data step code to ensure we have the correct names and types. Then provide example output expected from that example input. Your example output as shown also does not show a "nearest neighbor" result but a "five nearest neighbors" result. If the desire is 5 nearest then please say so as that is a significantly different problem.
Since Proc Sql does not support do loops I am having a hard time imagining what you may be contemplating.
Dataset A:
Severity subject
1.2 1
2.3 2
3.4 3
5.3 4
6.7 5
Dataset B:
Severity subject
1.5 6
2.6 7
4.5 8
5.1 9
5.7 10
can you help me ?
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.