Re: "VLookup" in SAS

deleted_user · Posted 02-01-2011 12:38 PM

Hi all,

Is there a procedure/function in SAS that is equivalent to "Vlookup" or "Index....Match" function in Excel? Here is what I need to implement in SAS:

Dataset Policy has the following fields:
Key AttainedAge

Dataset Rates has the following fields:
Key Rates_20 Rates_21 Rates_22 ..... Rates_99

"Key" is the common variable in the above 2 datasets. I need to extract rates information from dataset "Rates" based on the attained age in the first dataset. For example, if the first record in "Policy" has attained age 34, then I need to pull Rates_34 from the second dataset and put it under dataset "Policy". Eventually, my data should look like this:

Key AttainedAge Rates

Is there any easy way to accomplish what I need to do? Your suggestions are greatly appreciated!

P.S. I have a huge set of data so I am not able to do this in the spreadsheet.

Thanks!
Katherine

CurtisMack · Posted 02-01-2011 01:17 PM

I would do it with a merge and arrays.

data Policy;
Key = 1;
AttainedAge = 34;
output;
Key = 2;
AttainedAge = 40;
output;
run;

data Rates;
Key =1;
array RateArray (20:99) Rates_20 - Rates_99;
do i = 20 to 99;
RateArray(i) = i * 10000;
end;
output;
Key =2;
do i = 20 to 99;
RateArray(i) = i * 11111;
end;
output;
run;

*(Sort the datasets by Key if needed);

data mergedInfo(keep = Key AttainedAge Rate);
merge Policy Rates;
by Key;
array RateArray (20:99) Rates_20 - Rates_99;
Rate = RateArray(AttainedAge);
run;

chang_y_chung_hotmail_com · Posted 02-01-2011 02:14 PM

It gets much easier if you have the rates in a long-shaped dataset. You can even do this temporarily by creating a view.

/* test data */

data rates;

input key rates_20 rates_21 rates_22;

cards;

1 0.10 0.11 0.12

2 0.20 0.21 0.22

;

run;

data policies;

input key age;

cards;

1 21

2 20

;

run;

/* "lookup" rates by key and age by merging */

/* first we delete existing/create a new view of rates long-shaped */

data ratesLong/view=ratesLong;

set rates;

array rates(20:22) rates_20 - rates_22;

do age = lbound(rates) to hbound(rates);

rate = rates(age);

output;

end;

keep key age rate;

run;

/* merge by key and age */

data policiesWithRate;

merge policies(in=_main) ratesLong;

by key age;

if _main;

run;

/* check */

proc print data=policiesWithRate;

run;

/* on lst

Obs key age rate

1 1 21 0.11

2 2 20 0.20

*/

deleted_user · Posted 02-01-2011 02:55 PM

I tried what you suggested and it worked! Thank you both for your help!

Ksharp · Posted 02-22-2011 04:42 AM

[pre]
/* test data */
data rates;
input key rates_20 rates_21 rates_22;
cards;
1 0.10 0.11 0.12
2 0.20 0.21 0.22
;
run;
data policies;
input key age;
cards;
1 21
2 20
;
run;

data hash;
set rates;
array rate{*} rates:;
do i=1 to dim(rate);
rate_v =rate{i};
age =input(scan(vname(rate{i}),-1,'_'),8.);
output;
end;
keep key rate_v age;
run;

data result;
declare hash find(hashexp: 10);
find.definekey('key','age');
find.definedata('rate_v');
find.definedone();

do until(last);
set hash end=last;
find.add();
end;
do until(_last);
set policies end=_last;
call missing(rate_v);
find.find();
output;
end;
run;

[/pre]

Ksharp

SuperGreenie · Posted 09-06-2012 08:48 PM

I am interested in setting up a similar hash, also to replace four vlookups, but I would like to get the lookup, or find, to repeat four times, for four different values. Here is what I need to do... I have a sas data set containing customer information, called custinfo, that contains four code fields (code1, code2, code3, code4) and I have a separate sas data set containing the code descriptions, with just two fields code and description. By the way, also, the code1-code4 fields can be blank, so I need to allow for that. I was hoping I could use a sas hash to load the descriptions data into four new variables (code1desc, code2desc, code3desc, code4desc) in the custinfo data file. This has been done at the end of a reporting process, in Excel using a VLookup, once for each of the four codes. That works, but it is very inefficient.

I am very new to using the SAS hash and I can think of many ways to use this... once I figure it out! But, for now, I am intimidated by the code for the hash and I am not sure how to make the hash repeat four times for each code in the customer record. I think that the find method is what I need to use, but I am not certain. I think I would use find instead of add because the key values are already in the custinfo file, as code1-4. I don't really understand the find syntax at all. This is what I have... I am stuck though, because I don't know how to tell SAS that there are key codes in more than just one field, and then how to put the description value into more than just one description field.

data newcustinfo;

length code1desc code2desc code3desc code4desc $200;

if _n_=1 then do;

declare hash h();

h.defineKey('code');

h.defineData('descriptions');

h.defineDone();

end;

set custinfo;

*/this is where I start trying to follow examples in my sas textbook, and frankly have no idea from here;

rc1=h.find(key=code1);

if rc1=0 then _________;

rc2=h.find(key=code2);

if rc2=0 then _________;

rc3=h.find(key=code3);

if rc3=0 then _________;

rc4=h.find(key=code4);

if rc4=0 then _________;

run;

Any help will be appreciated! Thank you very much!

Reeza · Posted 09-06-2012 09:08 PM

For your situation I think a format would be better.

Look up how to create a format using cntlin and then apply it 4 time with put statements. Much easier to understand/maintain.

assuming you've created the format your data step will look like, assuming number codes.

data new;

set have;

desc1=put(code1, my_fmt.);

desc2=put(code2, my_fmt.);

desc3=put(code3, my_fmt.);

desc4=put(code4, my_fmt.);

run;

SuperGreenie · Posted 09-06-2012 09:11 PM

I thought that format was good for short lists and ranges, my problem is that there are about 18000 possible code descriptions.

Reeza · Posted 09-06-2012 11:03 PM

I don't think it matters how many codes you have.

See this paper:

http://www.lexjansen.com/wuss/2008/ess/ess04.pdf

Ksharp · Posted 09-06-2012 11:10 PM

That is easy for Hash Table. But you need to post some sample data and the output you need to explain your question more detail.

BTW, Reeza 's proposal is also good. Why not use proc format ? it adopts binary search algorithm which is also very fast.

Ksharp

SuperGreenie · Posted 09-07-2012 11:14 AM

Thank you, Ksharp and Reeza. This is a sample of the custinfo data:

custid,name,addr,city,st,zip,code1,code2,code3,code4

29348,Frands,101 Main,Centerville,AZ,85522,1,5,,

29074,Frenchs,108 South,Midvale,CA,92141,5,6,7,

23947,Alberts,52 Grant,Jefferson,MI,50852,4a,5,3,4a

92347,Ralphs,786 Hanley,Lewis,KS,60503,4a,2,2,1

32047,Kroger,567 Frost,Scottsdale,TX,45221,6,4a,7,6

29347,AP,678 Barton,Phoenix,AZ,84502,9,B8,4a,

94594,Aldi,901 Wilson,Mesa,AZ,86522,2,1,5,4a

93475,Meijer,345 12th,San Dimas,CA,92622,3,3,,

93479,Bass,354 Appletree,Bernard,WA,90255,1,6,6,

47349,Culvers,987 Peartree,Manzanita,NV,81511,1,5,7,

34978,Mr B,498 Figtree,Leonard,PA,20611,5,4a,,

This is an sample of the descriptions that go with the codes; some of the codes do contain characters:

code,description

1,Broadway

2,Lindell

3,Chouteau

4a,Peachtree

5,Del Rey

6,Grand

7,Pershing

B8,Adobe

9,Revillo

10,Menard

The end result I want is to create a dataset with the following fields:

custid,name,addr,city,st,zip,code1,code2,code3,code4,code1desc,code2desc,code3desc,code4desc

I am working my way through the lexjansen.com paper as Reeza suggested, particularly the cntlin statement and using formats to create new variables. I really appreciate your help. I will work on this over the weekend.

shivas · Posted 09-07-2012 11:35 AM

Hi,

Try this...Hope it helps

data lookup;

infile cards dlm=',';

input code $ description $ 30.;

cards;

1,Broadway

2,Lindell

3,Chouteau

4a,Peachtree

5,Del Rey

6,Grand

7,Pershing

B8,Adobe

9,Revillo

10,Menard

;

run;

data fmt (keep=FMTNAME START END LABEL TYPE);

length FMTNAME $30. START END $256.;

set WORK.lookup;

FMTNAME = 'desc' ;

START = code;

END = START;

LABEL = description;

TYPE = 'C' ;

run;

proc format cntlin=fmt lib=work; run;

data want;

infile cards dsd dlm=',';

length name addr city st zip $ 20.;

input custid name $ addr $ city $st $ zip $ code1 $ code2 $ code3 $ code4 $;

codedesc1=put(code1,$desc.);

codedesc2=put(code2,$desc.);

codedesc3=put(code3,$desc.);

codedesc4=put(code4,$desc.);

cards;

29348,Frands,101 Main,Centerville,AZ,85522,1,5,,

29074,Frenchs,108 South,Midvale,CA,92141,5,6,7,

23947,Alberts,52 Grant,Jefferson,MI,50852,4a,5,3,4a

92347,Ralphs,786 Hanley,Lewis,KS,60503,4a,2,2,1

32047,Kroger,567 Frost,Scottsdale,TX,45221,6,4a,7,6

29347,AP,678 Barton,Phoenix,AZ,84502,9,B8,4a,

94594,Aldi,901 Wilson,Mesa,AZ,86522,2,1,5,4a

93475,Meijer,345 12th,San Dimas,CA,92622,3,3,,

93479,Bass,354 Appletree,Bernard,WA,90255,1,6,6,

47349,Culvers,987 Peartree,Manzanita,NV,81511,1,5,7,

34978,Mr B,498 Figtree,Leonard,PA,20611,5,4a,,

;

run;

Thanks,

Shiva

Ksharp · Posted 09-10-2012 12:56 AM

OK. Here is a way of Hash table.

data custinfo ;
infile datalines dsd truncover;
input (custid name addr city st zip code1 code2 code3 code4 ) (: $40.);
datalines;
29348,Frands,101 Main,Centerville,AZ,85522,1,5,,
29074,Frenchs,108 South,Midvale,CA,92141,5,6,7,
23947,Alberts,52 Grant,Jefferson,MI,50852,4a,5,3,4a
92347,Ralphs,786 Hanley,Lewis,KS,60503,4a,2,2,1
32047,Kroger,567 Frost,Scottsdale,TX,45221,6,4a,7,6
29347,AP,678 Barton,Phoenix,AZ,84502,9,B8,4a,
94594,Aldi,901 Wilson,Mesa,AZ,86522,2,1,5,4a
93475,Meijer,345 12th,San Dimas,CA,92622,3,3,,
93479,Bass,354 Appletree,Bernard,WA,90255,1,6,6,
47349,Culvers,987 Peartree,Manzanita,NV,81511,1,5,7,
34978,Mr B,498 Figtree,Leonard,PA,20611,5,4a,,
;
run;
data descriptions ;
infile datalines dsd truncover;
input (code description) ( : $40.) ;
datalines;
1,Broadway
2,Lindell
3,Chouteau
4a,Peachtree
5,Del Rey
6,Grand
7,Pershing
B8,Adobe
9,Revillo
10,Menard
;
run;

data want(drop=i code description);
 if _n_ eq 1 then do;
  if 0 then set descriptions;
  declare hash ha(hashexp:20,dataset:'descriptions');
   ha.definekey('code');
   ha.definedata('description');
   ha.definedone();
 end;
 set  custinfo;
 array _c{*} $ code1-code4;
 array _d{*} $ 50 code_desc1-code_desc4;
 do i=1 to dim(_c);
  if ha.find(key: _c{i}) eq 0 then _d{i}=description;
 end;
run;

Ksharp

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away