/* test data */
data rates;
input key rates_20 rates_21 rates_22;
cards;
1 0.10 0.11 0.12
2 0.20 0.21 0.22
;
run;
data policies;
input key age;
cards;
1 21
2 20
;
run;
/* "lookup" rates by key and age by merging */
/* first we delete existing/create a new view of rates long-shaped */
data ratesLong/view=ratesLong;
set rates;
array rates(20:22) rates_20 - rates_22;
do age = lbound(rates) to hbound(rates);
rate = rates(age);
output;
end;
keep key age rate;
run;
/* merge by key and age */
data policiesWithRate;
merge policies(in=_main) ratesLong;
by key age;
if _main;
run;
/* check */
proc print data=policiesWithRate;
run;
/* on lst
Obs key age rate
1 1 21 0.11
2 2 20 0.20
*/
I am interested in setting up a similar hash, also to replace four vlookups, but I would like to get the lookup, or find, to repeat four times, for four different values. Here is what I need to do... I have a sas data set containing customer information, called custinfo, that contains four code fields (code1, code2, code3, code4) and I have a separate sas data set containing the code descriptions, with just two fields code and description. By the way, also, the code1-code4 fields can be blank, so I need to allow for that. I was hoping I could use a sas hash to load the descriptions data into four new variables (code1desc, code2desc, code3desc, code4desc) in the custinfo data file. This has been done at the end of a reporting process, in Excel using a VLookup, once for each of the four codes. That works, but it is very inefficient.
I am very new to using the SAS hash and I can think of many ways to use this... once I figure it out! But, for now, I am intimidated by the code for the hash and I am not sure how to make the hash repeat four times for each code in the customer record. I think that the find method is what I need to use, but I am not certain. I think I would use find instead of add because the key values are already in the custinfo file, as code1-4. I don't really understand the find syntax at all. This is what I have... I am stuck though, because I don't know how to tell SAS that there are key codes in more than just one field, and then how to put the description value into more than just one description field.
data newcustinfo;
length code1desc code2desc code3desc code4desc $200;
if _n_=1 then do;
declare hash h();
h.defineKey('code');
h.defineData('descriptions');
h.defineDone();
end;
set custinfo;
*/this is where I start trying to follow examples in my sas textbook, and frankly have no idea from here;
rc1=h.find(key=code1);
if rc1=0 then _________;
rc2=h.find(key=code2);
if rc2=0 then _________;
rc3=h.find(key=code3);
if rc3=0 then _________;
rc4=h.find(key=code4);
if rc4=0 then _________;
run;
Any help will be appreciated! Thank you very much!
For your situation I think a format would be better.
Look up how to create a format using cntlin and then apply it 4 time with put statements. Much easier to understand/maintain.
assuming you've created the format your data step will look like, assuming number codes.
data new;
set have;
desc1=put(code1, my_fmt.);
desc2=put(code2, my_fmt.);
desc3=put(code3, my_fmt.);
desc4=put(code4, my_fmt.);
run;
I thought that format was good for short lists and ranges, my problem is that there are about 18000 possible code descriptions.
I don't think it matters how many codes you have.
See this paper:
That is easy for Hash Table. But you need to post some sample data and the output you need to explain your question more detail.
BTW, Reeza 's proposal is also good. Why not use proc format ? it adopts binary search algorithm which is also very fast.
Ksharp
Thank you, Ksharp and Reeza. This is a sample of the custinfo data:
custid,name,addr,city,st,zip,code1,code2,code3,code4
29348,Frands,101 Main,Centerville,AZ,85522,1,5,,
29074,Frenchs,108 South,Midvale,CA,92141,5,6,7,
23947,Alberts,52 Grant,Jefferson,MI,50852,4a,5,3,4a
92347,Ralphs,786 Hanley,Lewis,KS,60503,4a,2,2,1
32047,Kroger,567 Frost,Scottsdale,TX,45221,6,4a,7,6
29347,AP,678 Barton,Phoenix,AZ,84502,9,B8,4a,
94594,Aldi,901 Wilson,Mesa,AZ,86522,2,1,5,4a
93475,Meijer,345 12th,San Dimas,CA,92622,3,3,,
93479,Bass,354 Appletree,Bernard,WA,90255,1,6,6,
47349,Culvers,987 Peartree,Manzanita,NV,81511,1,5,7,
34978,Mr B,498 Figtree,Leonard,PA,20611,5,4a,,
This is an sample of the descriptions that go with the codes; some of the codes do contain characters:
code,description
1,Broadway
2,Lindell
3,Chouteau
4a,Peachtree
5,Del Rey
6,Grand
7,Pershing
B8,Adobe
9,Revillo
10,Menard
The end result I want is to create a dataset with the following fields:
custid,name,addr,city,st,zip,code1,code2,code3,code4,code1desc,code2desc,code3desc,code4desc
I am working my way through the lexjansen.com paper as Reeza suggested, particularly the cntlin statement and using formats to create new variables. I really appreciate your help. I will work on this over the weekend.
Hi,
Try this...Hope it helps
data lookup;
infile cards dlm=',';
input code $ description $ 30.;
cards;
1,Broadway
2,Lindell
3,Chouteau
4a,Peachtree
5,Del Rey
6,Grand
7,Pershing
B8,Adobe
9,Revillo
10,Menard
;
run;
data fmt (keep=FMTNAME START END LABEL TYPE);
length FMTNAME $30. START END $256.;
set WORK.lookup;
FMTNAME = 'desc' ;
START = code;
END = START;
LABEL = description;
TYPE = 'C' ;
run;
proc format cntlin=fmt lib=work; run;
data want;
infile cards dsd dlm=',';
length name addr city st zip $ 20.;
input custid name $ addr $ city $st $ zip $ code1 $ code2 $ code3 $ code4 $;
codedesc1=put(code1,$desc.);
codedesc2=put(code2,$desc.);
codedesc3=put(code3,$desc.);
codedesc4=put(code4,$desc.);
cards;
29348,Frands,101 Main,Centerville,AZ,85522,1,5,,
29074,Frenchs,108 South,Midvale,CA,92141,5,6,7,
23947,Alberts,52 Grant,Jefferson,MI,50852,4a,5,3,4a
92347,Ralphs,786 Hanley,Lewis,KS,60503,4a,2,2,1
32047,Kroger,567 Frost,Scottsdale,TX,45221,6,4a,7,6
29347,AP,678 Barton,Phoenix,AZ,84502,9,B8,4a,
94594,Aldi,901 Wilson,Mesa,AZ,86522,2,1,5,4a
93475,Meijer,345 12th,San Dimas,CA,92622,3,3,,
93479,Bass,354 Appletree,Bernard,WA,90255,1,6,6,
47349,Culvers,987 Peartree,Manzanita,NV,81511,1,5,7,
34978,Mr B,498 Figtree,Leonard,PA,20611,5,4a,,
;
run;
Thanks,
Shiva
OK. Here is a way of Hash table.
data custinfo ; infile datalines dsd truncover; input (custid name addr city st zip code1 code2 code3 code4 ) (: $40.); datalines; 29348,Frands,101 Main,Centerville,AZ,85522,1,5,, 29074,Frenchs,108 South,Midvale,CA,92141,5,6,7, 23947,Alberts,52 Grant,Jefferson,MI,50852,4a,5,3,4a 92347,Ralphs,786 Hanley,Lewis,KS,60503,4a,2,2,1 32047,Kroger,567 Frost,Scottsdale,TX,45221,6,4a,7,6 29347,AP,678 Barton,Phoenix,AZ,84502,9,B8,4a, 94594,Aldi,901 Wilson,Mesa,AZ,86522,2,1,5,4a 93475,Meijer,345 12th,San Dimas,CA,92622,3,3,, 93479,Bass,354 Appletree,Bernard,WA,90255,1,6,6, 47349,Culvers,987 Peartree,Manzanita,NV,81511,1,5,7, 34978,Mr B,498 Figtree,Leonard,PA,20611,5,4a,, ; run; data descriptions ; infile datalines dsd truncover; input (code description) ( : $40.) ; datalines; 1,Broadway 2,Lindell 3,Chouteau 4a,Peachtree 5,Del Rey 6,Grand 7,Pershing B8,Adobe 9,Revillo 10,Menard ; run; data want(drop=i code description); if _n_ eq 1 then do; if 0 then set descriptions; declare hash ha(hashexp:20,dataset:'descriptions'); ha.definekey('code'); ha.definedata('description'); ha.definedone(); end; set custinfo; array _c{*} $ code1-code4; array _d{*} $ 50 code_desc1-code_desc4; do i=1 to dim(_c); if ha.find(key: _c{i}) eq 0 then _d{i}=description; end; run;
Ksharp
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.