turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Requesting efficent look up solution for a large d...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-20-2015 11:57 AM

Hi, Requesting your help for an efficient look up code either hashes, format or whatever for the following problem.

I have two Data sets with the following variables and values:

Dataset1:

ID | Rating1 | Rating2 | Rating3 |

900129 | |||

2000425 | |||

900135 | Baa1 | A- | |

2000328 | |||

900152 | Baa1 | NR | A |

900265 | Ba1 | BBB- | |

900005 | Baa1 | A+ | |

900108 | Baa2 | BBB+ | A |

900121 | A1 | A | A+ |

900008 | A2 | A+ | |

6964444 | Baa3 | BB+ | |

900149 | BBB+ | ||

900124 | |||

000100-502 | |||

900153 | A2 | A+ | |

2000315 | |||

900136 | Baa1 | BBB+ | |

900122 | Aa3 | AA- | AA- |

900155 | A3 | BBB+ | BBB+ |

1103127 | |||

900185 | A1 | A+ | |

900151 | Baa1 | ||

900172 | A2 | A- | A |

900007 | A- | A | |

99900096 | Aa3 | A+ | A+ |

900216 | A2 | NR | A+ |

99900169 | Baa1 | BBB+ | A+ |

900016 | A1 | A- | A+ |

900385 | Aa3 | A+ | AA- |

Dataset2:

Rank | Rating_LKUP | Q-RATING | R_RATING | S_RATING | R% |

23 | 1 | Aaa | AAA | AAA | 20% |

22 | 2+ | Aa1 | AA+ | AA+ | 20% |

21 | 2 | Aa2 | AA | AA | 20% |

20 | 2- | Aa3 | AA- | AA- | 20% |

19 | 3+ | A1 | A+ | A+ | 50% |

18 | 3 | A2 | A | A | 50% |

17 | 3- | A3 | A- | A- | 50% |

16 | 4+ | Baa1 | BBB+ | BBB+ | 100% |

15 | 4 | Baa2 | BBB | BBB | 100% |

14 | 4- | Baa3 | BBB- | BBB- | 100% |

13 | 5+ | Ba1 | BB+ | BB+ | 100% |

12 | 5 | Ba2 | BB | BB | 100% |

11 | 5- | Ba3 | BB- | BB- | 100% |

10 | 6+ | B1 | B+ | B+ | 150% |

9 | 6 | B2 | B | B | 150% |

8 | 6- | B3 | B- | B- | 150% |

7 | 7+ | Caa1 | CCC+ | CCC+ | 150% |

6 | 7 | Caa2 | CCC | CCC | 150% |

5 | 7- | Caa3 | CCC- | CCC- | 150% |

4 | 8 | ||||

3 | 9 | ||||

2 | 10 | ||||

1 | UR | Unrated | Unrated | Unrated |

My objective is I need a solution to look up dataset1 to dataset 2 to populate the correct R% from dataset2 in dataset1 for each ID based on the following logic:-

1. For each ID there can either one or 3 ratings or none in dataset1. If there is one or more rating in dataset1 the corresponding look up columns in dataset2 are Q-Rating, R-rating and S-rating as you can see above.

2. For example in row3 of dataset, you would notice for ID:900135 there are 2 ratings Baa1 and A- in dataset1. The corresponding look up value i.e A- and Baa1 can be found in 7th and 8th row of dataset2 respectively.

3. So yeah, if there are 2 ratings in dataset1, there would be 2 records with corresponding look up values in dataset2 and likewise if 3 there are 3 ratings, there would be 3 records of corresponding look up values, and at max there can be only 3 in dataset2.

4. I need to pick the R% for each ID in dataset1 using the lowest rating or in other words with an example for ID:900135 i need to pick the R% from dataset2 using the obs that has the lowest rating, which in this case is 4+ rather than 3- considering the highest to lowest scale happens to be from 1 to 10 and so the R% would be 100% comparing 50% and 100%.

5. Therefore for the example ID:900135, I want the final result(**WANT**) as -

**ID Rating1 Rating2 Rating3 R%**

900135 Baa1 A- 100 /*this has to be derived with 4+ being lower than 3- in look up rating when comparing Baa1 and A- , if that makes sense*/

**I'd appreciate any help I can receive and that would really mean a lot**.

Many Thanks,

Charlotte

Accepted Solutions

Solution

01-20-2015
04:38 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-20-2015 04:38 PM

I had a mistake one like of the code:

Change:

_d[_n_]=divide(find(_z[_n_], _c[_n_], 't'), 7);

To:

_d[_n_]=floor(divide(findw(_z[_n_], _c[_n_], ,'st'), 7))+1;

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-20-2015 12:34 PM

You are knowing the usage with SAS formats?

That dataset2 is seeming to me pretty small.

---->-- ja karman --<-----

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-20-2015 02:36 PM

data one;

infile cards truncover;

input id $ (rating1-rating3) (:$7.);

cards;

900129

2000425

900135 Baa1 A-

2000328

900152 Baa1 NR A

900265 Ba1 BBB-

900005 Baa1 A+

900108 Baa2 BBB+ A

900121 A1 A A+

900008 A2 A+

6964444 Baa3 BB+

900149 BBB+

900124

000100-502

900153 A2 A+

2000315

900136 Baa1 BBB+

900122 Aa3 AA- AA-

900155 A3 BBB+ BBB+

1103127

900185 A1 A+

900151 Baa1

900172 A2 A- A

900007 A- A

99900096 Aa3 A+ A+

900216 A2 NR A+

99900169 Baa1 BBB+ A+

900016 A1 A- A+

900385 Aa3 A+ AA-

;

data two;

infile cards truncover;

input Rank (Rating_LKUP Q_RATING R_RATING S_RATING R_PCT) ($);

cards;

23 1 Aaa AAA AAA 20%

22 2+ Aa1 AA+ AA+ 20%

21 2 Aa2 AA AA 20%

20 2- Aa3 AA- AA- 20%

19 3+ A1 A+ A+ 50%

18 3 A2 A A 50%

17 3- A3 A- A- 50%

16 4+ Baa1 BBB+ BBB+ 100%

15 4 Baa2 BBB BBB 100%

14 4- Baa3 BBB- BBB- 100%

13 5+ Ba1 BB+ BB+ 100%

12 5 Ba2 BB BB 100%

11 5- Ba3 BB- BB- 100%

10 6+ B1 B+ B+ 150%

9 6 B2 B B 150%

8 6- B3 B- B- 150%

7 7+ Caa1 CCC+ CCC+ 150%

6 7 Caa2 CCC CCC 150%

5 7- Caa3 CCC- CCC- 150%

4 8

3 9

2 10

1 UR Unrated Unrated Unrated

;

data x;

do _n_=1 by 1 until(p1done);

set two end=p1done;

by descending rank;

array _p[0:23] $7 _temporary_;

_p[_n_]=R_PCT;

array _z_[3,23] $7 _temporary_;

_z_[1,_n_]=Q_RATING;

_z_[2,_n_]=R_RATING;

_z_[3,_n_]=S_RATING;

end; /*p1done*/

array _z[3] $161 _temporary_;

do _n_=1 to 3;

_z[_n_]=peekclong(addrlong(_z_[_n_,1]), 161);

end;

do until(p2done);

set one end=p2done;

array _c[3] rating1-rating3;

array _d[3] _temporary_;

do _n_=1 to 3;

if missing(_c[_n_]) then continue;

_d[_n_]=floor(divide(findw(_z[_n_], _c[_n_], ,'st'), 7))+1;

end;

r_pct=_p[max(of _d

output;

call missing(of _d

end; /*p2done*/

stop;

keep id rating1-rating3 r_pct;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-20-2015 03:28 PM

Sir, First off thank you so much for your great response and for the **marvelous** and **classy approach!**..Just a small clarification. I just ran your code and it produced the following result

R_PCT | id | rating1 | rating2 | rating3 |

900129 | ||||

2000425 | ||||

50% | 900135 | Baa1 | A- |

/*for **id 900135**, the r_pct should be 100% instead of 50% for the reason 4+rating_lkup of bba1 is lower than 3- rating_lkup, as we need to pick r_pct for the lowest rating and as the scale goes 1-10 with 1 being the highest rating and 10 being the lowest.*/

May i request that slight change?

Please accept my apologies for the bother and I must say I can't thank you enough.

Sincerely,

Charlotte

Solution

01-20-2015
04:38 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-20-2015 04:38 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-21-2015 01:39 AM

I use Rank as rate level . i.e. pick up the min of Rank.

data one; infile cards truncover; input id : $20. (rating1-rating3) ($); cards; 900129 . . . 2000425. . . 900135 Baa1 . A- 2000328 . . . 900152 Baa1 NR A 900265 Ba1 BBB- . 900005 Baa1 . A+ 900108 Baa2 BBB+ A 900121 A1 A A+ 900008 A2 . A+ 6964444 Baa3 BB+ . 900149 . . BBB+ 900124 . . . 000100-502 . . . 900153 A2 . A+ 2000315 . . . 900136 Baa1 BBB+ . 900122 Aa3 AA- AA- 900155 A3 BBB+ BBB+ 1103127 . . . 900185 A1 . A+ 900151 Baa1 . . 900172 A2 A- A 900007 . A- A 99900096 Aa3 A+ A+ 900216 A2 NR A+ 99900169 Baa1 BBB+ A+ 900016 A1 A- A+ 900385 Aa3 A+ AA- ; data two; infile cards truncover; input Rank (Rating_LKUP Q_RATING R_RATING S_RATING R_PCT) ($); cards; 23 1 Aaa AAA AAA 20% 22 2+ Aa1 AA+ AA+ 20% 21 2 Aa2 AA AA 20% 20 2- Aa3 AA- AA- 20% 19 3+ A1 A+ A+ 50% 18 3 A2 A A 50% 17 3- A3 A- A- 50% 16 4+ Baa1 BBB+ BBB+ 100% 15 4 Baa2 BBB BBB 100% 14 4- Baa3 BBB- BBB- 100% 13 5+ Ba1 BB+ BB+ 100% 12 5 Ba2 BB BB 100% 11 5- Ba3 BB- BB- 100% 10 6+ B1 B+ B+ 150% 9 6 B2 B B 150% 8 6- B3 B- B- 150% 7 7+ Caa1 CCC+ CCC+ 150% 6 7 Caa2 CCC CCC 150% 5 7- Caa3 CCC- CCC- 150% 4 8 . . . . 3 9 . . . . 2 10 . . . . 1 UR Unrated Unrated Unrated . ; run; data want; if _n_ eq 1 then do; if 0 then set two; declare hash h1(dataset:'two'); h1.definekey('Q_RATING'); h1.definedata('Rank','R_PCT'); h1.definedone(); declare hash h2(dataset:'two'); h2.definekey('R_RATING'); h2.definedata('Rank','R_PCT'); h2.definedone(); declare hash h3(dataset:'two'); h3.definekey('S_RATING'); h3.definedata('Rank','R_PCT'); h3.definedone(); end; set one; min_rank=9999; if h1.find(key:rating1)=0 and not missing(rating1) then do; if Rank lt min_rank then do;min_rank=Rank; R=R_PCT;end; end; if h2.find(key:rating2)=0 and not missing(rating2) then do; if Rank lt min_rank then do;min_rank=Rank; R=R_PCT;end; end; if h3.find(key:rating3)=0 and not missing(rating3) then do; if Rank lt min_rank then do;min_rank=Rank; R=R_PCT;end; end; keep id rating1-rating3 R; run;

Xia Keshan

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-21-2015 02:08 AM

Here is a different approach using formats:

data fmts;

set two end=done;

by descending rank;

array a[3] Q_RATING R_RATING S_RATING;

do i=1 to dim(a);

start=a*;*

if missing(start) then continue;

fmtname=cats(scan(vname(a*), 1, '_'), 'rank');*

label=put(rank, best. -l);

type='i';

output;

end;

if done then do fmtname='Qrank', 'Rrank', 'Srank';

hlo='o';

label='';

output;

end;

hlo='';

fmtname='rank2pct';

start=put(rank, best. -l);

label=r_pct;

type='n';

output;

if done then do;

hlo='o';

label='';

output;

end;

run;

proc sort data=fmts; by fmtname start; run;

proc format cntlin=fmts; run;

data x;

set one;

array a[3] rating1-rating3;

array fmt[3] $ 6 _temporary_ ('qrank.' 'rrank.' 'srank.');

array b[3] _temporary_;

do _n_=1 to 3;

if missing(a[_n_]) then continue;

b[_n_]=inputn(a[_n_], fmt[_n_]);

end;

r_pct=put(min(of b

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-21-2015 06:06 AM

Hi Magnificent Greats, Please do not doubt my courtesy to have not marked the answer correct yet as I am yet to test the same on my office real work environment. Like I said earlier, I can't thank you enough for the help and will certainly respond soon once I am done testing on my work environment and cheer a green. I am sure yours/Xia's will definitely work, but i am just holding to not yet close the question for the reason if I may have any further clarifications.

Above all, Thank you for your invaluable time and that is something I could never pay back.

Sincere and utmost Thanks,

Charlotte from England

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-21-2015 06:46 AM

You can do a similar thing in SQL (not optimized as not much time to look at this):

data dataset1;

id=900135; rating1="Baa1"; rating2=""; rating3="A-"; output;

run;

data dataset2;

length rating q_rating r_rating s_rating $20;

rank=17; rating="-3"; q_rating="A3"; r_rating="A-"; s_rating="A-"; r_pcent=50; output;

rank=16; rating="4+"; q_rating="Baa1"; r_rating="BBB+"; s_rating="BBB+"; r_pcent=100; output;

run;

proc sql;

create table INTER as

select A.*,

(select min(RANK) from DATASET2 where A.RATING1=Q_RATING or A.RATING2=R_RATING or A.RATING3=S_RATING) as MIN_RANK

from DATASET1 A;

create table WANT as

select A.*,

B.R_PCENT

from INTER A

left join DATASET2 B

on A.MIN_RANK=B.RANK;

quit;

Although it would be easier if the data was normalized.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-21-2015 10:34 AM

I appreciate your well constructed question. It was entertaining to solve. Had Xia not posted the solution using hash, I would have also. Always good to see the different methods.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-22-2015 03:35 PM

Sir/Xia/RW, Awesome, elegant or eloquent are not just enough to describe your solutions but my vocabulary is limited and they worked amazingly well. I am sure you are tired of and are used to too many appreciations from many, but i sincerely want to thank you for your generosity by contributing your invaluable time by helping me with this problem.

Thanks a ton and Cheers,

Charlotte from England!

Oops, by the way, may i ask how long have you been doing SAS programming?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-22-2015 04:37 PM

Professionally, for about 8-9 years and 2-3 years as a student prior to that.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-04-2015 03:48 PM

Sir, Good evening from England, Sorry for the bother yet again, May i request your help in modifying your code that brilliantly uses the multi dim temp array/*my comment- that was a class act of yours*/ or even the formats approach to work for a slightly altered requirement:-

1. The Dataset1 has another variable by the name **Market_type**, that has 3 kinds of values namely **"Retail", "Wholesale" and "SME**". What it means is that the ID, being a business entity belongs to one of the 3 market types.

2. Dataset 2, being the rating look up Data set remains the same, however there is one separate "Rating look up table" for Retail, and one for wholesale and likewise one for SME. So In essence there are 3 different rating look up data sets.

3. Now the point is that, the look up needs to be done from Dataset 1 <---> Retail look up dataset for id's that are retail in market_type exclusively. And similarly from Dataset1 <---> Wholesale look up table for id's that are wholesale and Dataset1<--->SME look up table. */*Sounds like a filter to be applied in dataset1 for my novice level of understanding*/*

4. The logic to get the R% remains same, expect *" when there are 3 ratings for an ID, I need to pick R% for ID's using the middle of the 3 lkup_ratings instead of the lowest as you did before otherwise for remaining occurances such as 1 or 2 ratings the same lowest lkup ratings applies as before."*

5. If there are no ratings for an ID with blank values, that means the ID falls into the category of Unrated, which will have to pick the default R% from respective Ratings datasets from the unrated observation. /**The weird thing here is that in some rating look up table, the unrated observation is represented as blank unlike what you see in the last row in Dataset2 where it is clearly mentioned*/*

1 | UR | Unrated | Unrated | Unrated |

I fear whether I have explained well enough to make it easy enough to understand. If not, please let me know if I can explain better. I'd sincerely appreciate your help. Many thanks as always!

With Pleasant Regards,

Charlotte

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-04-2015 04:21 PM

Charlotte,

I would challenge you to take one of these examples and attempt to modify it to meet this additional criteria. The hash method presented by would not be difficult to add additional ratings for with the new Market_type class and the same goes for the method using formats. The first method I posted would probably be the most complex to add this new functionality to, which, if that is the method you chose, may be why you are finding this difficult.

For the hash method you could do something like the following:

select(market_type);

when ('retail') do;

...get the min rating;

end;

when ('wholesale') do;

...

otherwise;

end;

You will also want to change from looking from the MIN_RANK to MAX_RANK, if I remember you problem correctly and set the default for MAX_RANK to 1 to get the UR result you want.

For modifying the Format method:

create the 3 data sets to 9 unique formats

add a second dimension to the array fmt, this dimension will be the market_type

array fmt[3,0:4] $ 32 _temporary_ ('retail' 'retailQrank.' 'retailRrank.' 'retailStype.' 'retailrank2pct.' ...);

and then lookup with help from whichc function

b[_n_]=inputn(a[_n_], fmt[whichc(market_type, fmt[1,0], fmt[2,0], fmt[3,0])_n_]);

then modify the rank2pct lookup in suit and add logic to set to UR when needed.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-05-2015 02:00 AM

Good morning,Thank you and really sorry, I was a bit too overambitious and really fancied your first approach before reality bit me hard. :smileycry:To keep it simple, what I did was I filtered the dataset1 to output three different datasets based on the market_type condition, thus having 3 subsets, one for retail, one for wholesale and one for SME. This makes it easier for me to get back to using the original formats method code you helped me with and I can run three times. No problems there as I don't have to worry about the additional criteria 1,2 and 3.

But this condition dumps me though-

4. The logic to get the R% remains same, expect *" when there are 3 ratings for an ID, I need to pick R% for ID's using the middle of the 3 lkup_ratings instead of the lowest as you did before otherwise for remaining occurances such as 1 or 2 ratings the same lowest lkup ratings applies as before."*

Middle rating??? makes it very difficult for me whereas min or max is rather clear and easy to understand though

In the last part of your format code:

do _n_=1 to 3;

if missing(a[_n_]) then continue;

b[_n_]=inputn(a[_n_], fmt[_n_]);

end;

r_pct=put(min(of b

Thanks so much,

Charlotte