BookmarkSubscribeRSS Feed
dustychair
Pyrite | Level 9

Hi all,

Thanks to @ballardw  and @PeterClemmensen for helpful vlookup sql code that you guys provided in one of my former questions. 

I have a new question. Let me explain my data first.

Data 1

raw_score_s scale_score_s tier domain grade

0 118 A Speaking 1

1 118 A Speaking 1

2 118 A Speaking 1

3 118 A Speaking 1

 

Data2

raw_score_a scale_score_a tier domain grade

2 118 A Speaking 1

1 118 A Speaking 1

0 118 A Speaking 1

3 118 A Speaking 1

 

want

raw_score_s scale_score_s tier domain grade raw_score_a

0 118 A Speaking 1 0

1 118 A Speaking 1 1

2 118 A Speaking 1 2

3 118 A Speaking 1 3

 

I want to use vlookup function to pull data2's raw_score_a variable. However, since there are variables that have exactly the same values, the function doesn't match them correctly. I tried to assign ids to both data sets and then match them based on grade, tier, domain, scale score, and id but it is not as easy as I thought. Or any other method?

 Could you help me with that?

Than you,

 

5 REPLIES 5
Satish_Parida
Lapis Lazuli | Level 10

The Question is not clearly defining the problem.

 

The equivalent of vlookup in SAS is a format or a hash map.

 

I have created an example, see if it helps. Please edit the question to add more detail.

 

data data1;
input raw_score_s scale_score_s tier $ domain $ grade;
cards;
0 118 A Speaking 1
1 118 A Speaking 1
2 118 A Speaking 1
3 118 A Speaking 1
;
run;
 

data Data2;
input raw_score_a scale_score_a tier $ domain $ grade;
cards;
2 118 A Speaking 1
1 118 A Speaking 1
0 118 A Speaking 1
3 118 A Speaking 1
;
run;

data Data2 (rename=(scale_score_a=scale_score_s));
set Data2;
run;

data want;
length raw_score_a 8.;
set Data1;
if _n_=1 then do;
	DECLARE HASH rc(dataset:'Data2'); 
	rc.DEFINEKEY ('scale_score_s', 'tier', 'domain', 'grade'); 
	rc.DEFINEDATA('raw_score_a');
	rc.DEFINEDONE();
end;
if rc.find() ne 0 then raw_score_a=.;
run;
dustychair
Pyrite | Level 9

Thanks for your response @Satish_Parida.

I'm still getting differences between raw_score_a and raw_score_s. When I calculate the difference between these two variables there are some differences (should be 0 for the exact matching). Also, my data set has 4004 rows, with this code I got 13376 rows. I think it is populating rows.

 

Thanks

plevcek
Fluorite | Level 6

Maybe I am not understanding the problem but it seems that what you are describing and what you want is not matching.

 

What is your matching criteria for the tables? scale_score, tier, domain, grade?

If that is the case then you will get more rows than original 4004 otherwise it seems like you are matching on all columns including raw_score.

 

Here are 2 options because I'm not sure which one are you after:

PROC SQL; CREATE TABLE want AS 
SELECT a.*, b.raw_score_a 
FROM data1 a LEFT JOIN data2 b 
ON a.scale_score_s = b.scale_score_a 
AND a.tier = b.tier 
AND a.domain = b.domain 
AND a.grade = b.grade
AND a.raw_score_s = b.raw_score_a;
PROC SQL; CREATE TABLE want AS 
SELECT a.*, b.raw_score_a 
FROM data1 a LEFT JOIN data2 b 
ON a.scale_score_s = b.scale_score_a 
AND a.tier = b.tier 
AND a.domain = b.domain 
AND a.grade = b.grade;

Also do you want specifically DATA step or is SQL fine? If data step is needed then Hash tables are the way to go like you have it in previous post.

dustychair
Pyrite | Level 9

Thank you so much @plevcek.

SQL is fine and it worked. But I didn't understand why I'm populating rows. I think I'm thinking of the idea in the excel. In excel, you just pull the variable values that you want, you don't populate any other values. Also, I didn't get the idea that using the raw_score_s and/or raw_score_a variables to match raw_score_a and raw_score_s (I'm referring to the first SQL that you provided).

 

Thanks,

Tom
Super User Tom
Super User

Please explain the logic you are using in more detail. 

 

It sounds like you are starting with dataset DATA1.  Then trying to match it to some row in DATA2 (using what criteria?) and pull RAW_SCORE_A from that row in DATA2.

 

But for the first row it looks like you decided to use the third row of DATA2 to get zero for RAW_SCORE_A.  Why did you match it to the third row?  Why not the one of the other rows?  If you are just matching on RAW_SCORE_C then why use the lookup idea at all and instead just copy the variable?

 

So in addition to explaining your logic you need to post better example data that more completely shows what you are doing.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 3740 views
  • 0 likes
  • 4 in conversation