BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi all,

Is there a procedure/function in SAS that is equivalent to "Vlookup" or "Index....Match" function in Excel? Here is what I need to implement in SAS:

Dataset Policy has the following fields:
Key AttainedAge

Dataset Rates has the following fields:
Key Rates_20 Rates_21 Rates_22 ..... Rates_99

"Key" is the common variable in the above 2 datasets. I need to extract rates information from dataset "Rates" based on the attained age in the first dataset. For example, if the first record in "Policy" has attained age 34, then I need to pull Rates_34 from the second dataset and put it under dataset "Policy". Eventually, my data should look like this:

Key AttainedAge Rates

Is there any easy way to accomplish what I need to do? Your suggestions are greatly appreciated!

P.S. I have a huge set of data so I am not able to do this in the spreadsheet.

Thanks!
Katherine
12 REPLIES 12
CurtisMack
Fluorite | Level 6
I would do it with a merge and arrays.


data Policy;
Key = 1;
AttainedAge = 34;
output;
Key = 2;
AttainedAge = 40;
output;
run;

data Rates;
Key =1;
array RateArray (20:99) Rates_20 - Rates_99;
do i = 20 to 99;
RateArray(i) = i * 10000;
end;
output;
Key =2;
do i = 20 to 99;
RateArray(i) = i * 11111;
end;
output;
run;

*(Sort the datasets by Key if needed);

data mergedInfo(keep = Key AttainedAge Rate);
merge Policy Rates;
by Key;
array RateArray (20:99) Rates_20 - Rates_99;
Rate = RateArray(AttainedAge);
run;
chang_y_chung_hotmail_com
Obsidian | Level 7
It gets much easier if you have the rates in a long-shaped dataset. You can even do this temporarily by creating a view.



   /* test data */


   data rates;


     input key rates_20 rates_21 rates_22;


   cards;


   1 0.10 0.11 0.12


   2 0.20 0.21 0.22


   ;


   run;


 


   data policies;


     input key age;


   cards;


   1 21


   2 20


   ;


   run;


 


   /* "lookup" rates by key and age by merging */


 


   /* first we delete existing/create a new view of rates long-shaped */


   data ratesLong/view=ratesLong;


     set rates;


     array rates(20:22) rates_20 - rates_22;


     do age = lbound(rates) to hbound(rates);


       rate = rates(age);


       output;


     end;


     keep key age rate;


   run;


 


   /* merge by key and age */


   data policiesWithRate;


       merge policies(in=_main) ratesLong;


       by key age;


       if _main;


   run;


 


   /* check */


   proc print data=policiesWithRate;


   run;


   /* on lst


   Obs    key    age    rate


   


    1      1      21    0.11


    2      2      20    0.20


   */

deleted_user
Not applicable
I tried what you suggested and it worked! Thank you both for your help!
Ksharp
Super User
[pre]
/* test data */
data rates;
input key rates_20 rates_21 rates_22;
cards;
1 0.10 0.11 0.12
2 0.20 0.21 0.22
;
run;
data policies;
input key age;
cards;
1 21
2 20
;
run;

data hash;
set rates;
array rate{*} rates:;
do i=1 to dim(rate);
rate_v =rate{i};
age =input(scan(vname(rate{i}),-1,'_'),8.);
output;
end;
keep key rate_v age;
run;

data result;
declare hash find(hashexp: 10);
find.definekey('key','age');
find.definedata('rate_v');
find.definedone();

do until(last);
set hash end=last;
find.add();
end;
do until(_last);
set policies end=_last;
call missing(rate_v);
find.find();
output;
end;
run;


[/pre]



Ksharp
SuperGreenie
Calcite | Level 5

I am interested in setting up a similar hash, also to replace four vlookups, but I would like to get the lookup, or find, to repeat four times, for four different values. Here is what I need to do... I have a sas data set containing customer information, called custinfo, that contains four code fields (code1, code2, code3, code4) and I have a separate sas data set containing the code descriptions, with just two fields code and description. By the way, also, the code1-code4 fields can be blank, so I need to allow for that.  I was hoping I could use a sas hash to load the descriptions data into four new variables (code1desc, code2desc, code3desc, code4desc) in the custinfo data file. This has been done at the end of a reporting process, in Excel using a VLookup, once for each of the four codes. That works, but it is very inefficient.

I am very new to using the SAS hash and I can think of many ways to use this... once I figure it out! But, for now, I am intimidated by the code for the hash and I am not sure how to make the hash repeat four times for each code in the customer record.  I think that the find method is what I need to use, but I am not certain. I think I would use find instead of add because the key values are already in the custinfo file, as code1-4. I don't really understand the find syntax at all. This is what I have... I am stuck though, because I don't know how to tell SAS that there are key codes in more than just one field, and then how to put the description value into more than just one description field.

data newcustinfo;

     length code1desc code2desc code3desc code4desc $200;

     if _n_=1 then do;

     declare hash h();

     h.defineKey('code');

     h.defineData('descriptions');

     h.defineDone();

     end;

set custinfo;

*/this is where I start trying to follow examples in my sas textbook, and frankly have no idea from here;

rc1=h.find(key=code1);

if rc1=0 then _________;

rc2=h.find(key=code2);

if rc2=0 then _________;

rc3=h.find(key=code3);

if rc3=0 then _________;

rc4=h.find(key=code4);

if rc4=0 then _________;

run;

Any help will be appreciated! Thank you very much!

Reeza
Super User

For your situation I think a format would be better.

Look up how to create a format using cntlin and then apply it 4 time with put statements. Much easier to understand/maintain.

assuming you've created the format your data step will look like, assuming number codes.

data new;

set have;

desc1=put(code1, my_fmt.);

desc2=put(code2, my_fmt.);

desc3=put(code3, my_fmt.);

desc4=put(code4, my_fmt.);

run;

SuperGreenie
Calcite | Level 5

I thought that format was good for short lists and ranges, my problem is that there are about 18000 possible code descriptions.

Reeza
Super User

I don't think it matters how many codes you have.

See this paper:

http://www.lexjansen.com/wuss/2008/ess/ess04.pdf

Ksharp
Super User

That is easy for Hash Table. But you need to post some sample data and the output you need to explain your question more detail.

BTW, Reeza 's proposal is also good. Why not use proc format ? it adopts binary search algorithm which is also very fast.

Ksharp

SuperGreenie
Calcite | Level 5

Thank you, Ksharp and Reeza. This is a sample of the custinfo data:

custid,name,addr,city,st,zip,code1,code2,code3,code4

29348,Frands,101 Main,Centerville,AZ,85522,1,5,,

29074,Frenchs,108 South,Midvale,CA,92141,5,6,7,

23947,Alberts,52 Grant,Jefferson,MI,50852,4a,5,3,4a

92347,Ralphs,786 Hanley,Lewis,KS,60503,4a,2,2,1

32047,Kroger,567 Frost,Scottsdale,TX,45221,6,4a,7,6

29347,AP,678 Barton,Phoenix,AZ,84502,9,B8,4a,

94594,Aldi,901 Wilson,Mesa,AZ,86522,2,1,5,4a

93475,Meijer,345 12th,San Dimas,CA,92622,3,3,,

93479,Bass,354 Appletree,Bernard,WA,90255,1,6,6,

47349,Culvers,987 Peartree,Manzanita,NV,81511,1,5,7,

34978,Mr B,498 Figtree,Leonard,PA,20611,5,4a,,


This is an sample of the descriptions that go with the codes; some of the codes do contain characters:

code,description

1,Broadway

2,Lindell

3,Chouteau

4a,Peachtree

5,Del Rey

6,Grand

7,Pershing

B8,Adobe

9,Revillo

10,Menard

The end result I want is to create a dataset with the following fields:

custid,name,addr,city,st,zip,code1,code2,code3,code4,code1desc,code2desc,code3desc,code4desc

I am working my way through the lexjansen.com paper as Reeza suggested, particularly the cntlin statement and using formats to create new variables. I really appreciate your help. I will work on this over the weekend. 

shivas
Pyrite | Level 9

Hi,

Try this...Hope it helps

data lookup;

infile cards dlm=',';

input code $ description $ 30.;

cards;

1,Broadway

2,Lindell

3,Chouteau

4a,Peachtree

5,Del Rey

6,Grand

7,Pershing

B8,Adobe

9,Revillo

10,Menard

;

run;

data fmt (keep=FMTNAME START END LABEL TYPE);

length FMTNAME $30. START END $256.;

set WORK.lookup;

      FMTNAME = 'desc' ;

      START   = code;

      END     = START;

      LABEL   = description;

      TYPE    = 'C' ;

run;

proc format cntlin=fmt lib=work; run;

data want;

infile cards dsd dlm=',';

length name addr city st zip $ 20.;

input custid name $ addr $ city $st $ zip $ code1 $ code2 $ code3 $ code4 $;

codedesc1=put(code1,$desc.);

codedesc2=put(code2,$desc.);

codedesc3=put(code3,$desc.);

codedesc4=put(code4,$desc.);

cards;

29348,Frands,101 Main,Centerville,AZ,85522,1,5,,

29074,Frenchs,108 South,Midvale,CA,92141,5,6,7,

23947,Alberts,52 Grant,Jefferson,MI,50852,4a,5,3,4a

92347,Ralphs,786 Hanley,Lewis,KS,60503,4a,2,2,1

32047,Kroger,567 Frost,Scottsdale,TX,45221,6,4a,7,6

29347,AP,678 Barton,Phoenix,AZ,84502,9,B8,4a,

94594,Aldi,901 Wilson,Mesa,AZ,86522,2,1,5,4a

93475,Meijer,345 12th,San Dimas,CA,92622,3,3,,

93479,Bass,354 Appletree,Bernard,WA,90255,1,6,6,

47349,Culvers,987 Peartree,Manzanita,NV,81511,1,5,7,

34978,Mr B,498 Figtree,Leonard,PA,20611,5,4a,,

;

run;

Thanks,

Shiva

Ksharp
Super User

OK. Here is a way of Hash table.

data custinfo ;
infile datalines dsd truncover;
input (custid name addr city st zip code1 code2 code3 code4 ) (: $40.);
datalines;
29348,Frands,101 Main,Centerville,AZ,85522,1,5,,
29074,Frenchs,108 South,Midvale,CA,92141,5,6,7,
23947,Alberts,52 Grant,Jefferson,MI,50852,4a,5,3,4a
92347,Ralphs,786 Hanley,Lewis,KS,60503,4a,2,2,1
32047,Kroger,567 Frost,Scottsdale,TX,45221,6,4a,7,6
29347,AP,678 Barton,Phoenix,AZ,84502,9,B8,4a,
94594,Aldi,901 Wilson,Mesa,AZ,86522,2,1,5,4a
93475,Meijer,345 12th,San Dimas,CA,92622,3,3,,
93479,Bass,354 Appletree,Bernard,WA,90255,1,6,6,
47349,Culvers,987 Peartree,Manzanita,NV,81511,1,5,7,
34978,Mr B,498 Figtree,Leonard,PA,20611,5,4a,,
;
run;
data descriptions ;
infile datalines dsd truncover;
input (code description) ( : $40.) ;
datalines;
1,Broadway
2,Lindell
3,Chouteau
4a,Peachtree
5,Del Rey
6,Grand
7,Pershing
B8,Adobe
9,Revillo
10,Menard
;
run;

data want(drop=i code description);
 if _n_ eq 1 then do;
  if 0 then set descriptions;
  declare hash ha(hashexp:20,dataset:'descriptions');
   ha.definekey('code');
   ha.definedata('description');
   ha.definedone();
 end;
 set  custinfo;
 array _c{*} $ code1-code4;
 array _d{*} $ 50 code_desc1-code_desc4;
 do i=1 to dim(_c);
  if ha.find(key: _c{i}) eq 0 then _d{i}=description;
 end;
run;

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 21732 views
  • 1 like
  • 7 in conversation