Hi All,
I have two data sets as below. I'd like to match/find equivalent of a score in data set1 from data set2. I know I need vlookup function but I need to do this for per domain. So, basically I need to use vlookup for matched data sets based on domains.
Data set1 Data set2 data_want
paper_s domain paper_s EQ_paper domain paper_s EQ_paper domain
101 Speaking 101 151 Speaking 101 151 Speaking
102 Listening 102 180 Reading 102 181 Listening
103 Reading 102 181 Listening 103 180 Reading
104 160 Listening
103 180 Reading
.... ....
I'll appreciate for any help, any comments!
Thanks
@dustychair wrote:
Hi @PeterClemmensen,
Thank you so much for your response. I didn't get any error with your code but it doesn't match them correctly.
Here is the code includes the original data set names and variable names. You may be confused with same variable names which I provided in my question.
proc sql;
create table all_domain_paper_eq1 as
select all_domain_paper.*,
EQ_paper_scale_score
from all_domain_paper left join access_p_eq
on all_domain_paper.scale_score=access_p_eq.paper_scale_score
group by access_p_eq.paper_scale_score
having EQ_paper_scale_score=max(EQ_paper_scale_score);
quit;
Are you referring to use of data other than shown above? Then provide examples such that the code @PeterClemmensen provided does not "match correctly" and provide an example of the correct result for the new provided example inputs.
You likely need to provide an explicit rule for which value of EQ_Paper is selected. Perhaps the example you provided only looked like you wanted the largest value of EQ_PAPER but something else is needed. (that is where the =max() came from, apparent want of largest score)
Perhaps:
proc sql; create table want as select data1.*, EQ_paper from data1 left join data2 on data1.paper_s=data2.paper_s and data1.domain=data2.domain; quit;
However if you have multiple domain scores for a given paper_s value then you need to provide a rule for which score is wanted.
data data1;
input paper_s domain :$20.;
datalines;
101 Speaking
102 Listening
103 Reading
;
data data2;
input paper_s EQ_paper domain :$20.;
datalines;
101 151 Speaking
102 180 Reading
102 181 Listening
104 160 Listening
103 180 Reading
;
proc sql;
create table want as
select data1.*,
EQ_paper
from data1 left join data2
on data1.paper_s=data2.paper_s
group by data2.paper_s
having EQ_paper=max(EQ_paper);
quit;
Hi @PeterClemmensen,
Thank you so much for your response. I didn't get any error with your code but it doesn't match them correctly.
Here is the code includes the original data set names and variable names. You may be confused with same variable names which I provided in my question.
proc sql;
create table all_domain_paper_eq1 as
select all_domain_paper.*,
EQ_paper_scale_score
from all_domain_paper left join access_p_eq
on all_domain_paper.scale_score=access_p_eq.paper_scale_score
group by access_p_eq.paper_scale_score
having EQ_paper_scale_score=max(EQ_paper_scale_score);
quit;
@dustychair wrote:
Hi @PeterClemmensen,
Thank you so much for your response. I didn't get any error with your code but it doesn't match them correctly.
Here is the code includes the original data set names and variable names. You may be confused with same variable names which I provided in my question.
proc sql;
create table all_domain_paper_eq1 as
select all_domain_paper.*,
EQ_paper_scale_score
from all_domain_paper left join access_p_eq
on all_domain_paper.scale_score=access_p_eq.paper_scale_score
group by access_p_eq.paper_scale_score
having EQ_paper_scale_score=max(EQ_paper_scale_score);
quit;
Are you referring to use of data other than shown above? Then provide examples such that the code @PeterClemmensen provided does not "match correctly" and provide an example of the correct result for the new provided example inputs.
You likely need to provide an explicit rule for which value of EQ_Paper is selected. Perhaps the example you provided only looked like you wanted the largest value of EQ_PAPER but something else is needed. (that is where the =max() came from, apparent want of largest score)
Perhaps:
proc sql; create table want as select data1.*, EQ_paper from data1 left join data2 on data1.paper_s=data2.paper_s and data1.domain=data2.domain; quit;
However if you have multiple domain scores for a given paper_s value then you need to provide a rule for which score is wanted.
Hi @ballardw and @PeterClemmensen ,
Let me rewrite everything. You guys are right. It may be my fault since there is another variable to be matched.
Let me clarify it.
data1= all_domain_paper
domain scale_score grade
listening 100 1
listening 121 1
reading 121 1
reading 121 2
data2= access_p_eq
paper_scale_score EQ_paper_scale_score domain grade
100 104 listening 1
121 129 listening 1
121 227 reading 1
121 380 reading 2
So, as you see that the scale_score in data1 should match with paper_scale_score in the data2 based on domain and grade. Then I need it's equivalent in EQ_paper_scale_score. So matching criteria are two which are domain and grade.
Thanks for your help
There we go 🙂 And from data1 and data2, what does your desired data set look like?
Hi @PeterClemmensen,
The data I want has the same idea as before. I want the equivalent of scale_score in data1 from data2. But once again they have to be matched based on the domain and grade. If you noticed that there are two 121 reading scale scores in the data1, however their equivalent in data2 are different since they have different grades.
Thank you
data_want=data_want
domain scale_score grade EQ_paper_scale_score
listening 100 1 104
listening 121 1 129
reading 121 1 227
reading 121 2 380
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.