Solved: Lookup problem, with multiple categories

MKS2204 · Posted 10-14-2019 05:14 PM

Hi All,

I am facing a programming logic roadblock. Tried a lot of ways but not successful.

I have a SAS data set as below :

Testcode	score1	score2	score120	Final_score1	Final_score2	Final_score120	Prof_level
41	1	4.5	2.3.4	74	230	555	1 Low 2.2 Medium
41	.	5.6	.	450	130	250	2.2 Medium 2 High
51	6.6	2.1	2.3.4	450	130	555	2.1 Medium 1 Low

And i also have an excel sheet containing below data :

Testcode	Proficiency	Score	Lowerlimit	Upperlimit
41	Low	1	50	100
41	Medium	1	101	150
41	High	1	151	200
41	Low	2	75	125
41	Medium	2	126	175
41	High	2	176	200
51	Low	1	50	100
51	Medium	1	101	150
51	High	1	151	200
51	Low	2.1	75	125
51	Medium	2.1	126	175
51	High	2.1	176	200

In the dataset there are 120 columns named "score1 - score120" also Final score1-finalscore120.

Now I have to check each unique "score" ( from the excel) for the testcode in the dataset columns Score1 - Score120 , If there is a hit, I need to take corresponding "Finalscore" column ( from 120 columns in dataset) and lookup that "Finalscore" against the excel for the same testcode and score , and find out what range of upperlimit and lowerlimit it falls in., Once I have the hit, I need to take the "Proficiency" value from the excel and check that value in dataset "Prof_Level" column , if that value is present in that string along with the Score value in process.

Example :

From excel sheet, we take Score = 1 for testcode 41 and try to find it in the dataset column Score1-Score120 , We have "Score1" in dataset =1, so thats a hit, Now we take corresponding Final Score from dataset, in this case "Final_score1" = 74, Now we take 74 and check in excel sheet against testcode 41 and score=1 and see what range it falls in , It falls in row 1 in excel sheet, Now We take value "Low" from there and check "PROF_Level" in dataset if it has the "Low" next to "1". and we have Low next to "1" so we are good, else we flag that record as error.

I know its a lot, But I am not able to get the solution to it, and I am pretty sure Its not a straight forward joins or lookup.

I am trying and would have tried more myself, I have to complete something on very urgent bases.

Please let me know if you need more information.

Appreciate all your help.

Thanks

Manoj

Patrick · Posted 10-14-2019 10:41 PM

@MKS2204 wrote:

....

Now I have to check each unique "score" ( from the excel) for the testcode in the dataset columns Score1 - Score120 , If there is a hit, I need to take corresponding "Finalscore" column ( from 120 columns in dataset) and lookup that "Finalscore" against the excel for the same testcode and score , and find out what range of upperlimit and lowerlimit it falls in., Once I have the hit, I need to take the "Proficiency" value from the excel and check that value in dataset "Prof_Level" column , if that value is present in that string along with the Score value in process.

Example :

From excel sheet, we take Score = 1 for testcode 41 and try to find it in the dataset column Score1-Score120 , We have "Score1" in dataset =1, so thats a hit, Now we take corresponding Final Score from dataset, in this case "Final_score1" = 74, Now we take 74 and check in excel sheet against testcode 41 and score=1 and see what range it falls in , It falls in row 1 in excel sheet, Now We take value "Low" from there and check "PROF_Level" in dataset if it has the "Low" next to "1". and we have Low next to "1" so we are good, else we flag that record as error.

Below a solution for how I understand your requirement.

data ds;
  infile datalines truncover dlm='|';
  input Testcode score1 score2 score120 Final_score1  Final_score2 Final_score120 Prof_level $20. ;
  datalines;
41|1|4.5|23.4|74|230|555|1 Low 2.2 Medium
41|.|5.6|.|450|130|250|2.2 Medium 2 High
51|6.6|2.1|23.4|450|130|555|2.1 Medium 1 Low
;

data Excel;
  infile datalines truncover dlm='|';
  input Testcode Proficiency :$10. MatchScore Lowerlimit Upperlimit $20.;
  datalines;
41|Low|1|50|100
41|Medium|1|101|150
41|High|1|151|200
41|Low|2|75|125
41|Medium|2|126|175
41|High|2|176|200
51|Low|1|50|100
51|Medium|1|101|150
51|High|1|151|200
51|Low|2.1|75|125
51|Medium|2.1|126|175
51|High|2.1|176|200
;

data want(drop=_:);
  if _n_=1 then
    do;
      if 0 then set ds excel;
      dcl hash h1(dataset:'excel', multidata:'y');
      h1.defineKey('Testcode','MatchScore');
      h1.defineData(all:'y');
      h1.defineDone();
    end;
  call missing(of _all_);

  set ds;
  array scores {*} score:;
  array Final_scores {*} Final_score:;

  ScoreMatch_flg=0;
  do _i=1 to dim(scores);
    do while(h1.do_over(key:Testcode,key:scores[_i]) eq 0);
      if lowerlimit<=Final_scores[_i]<=upperlimit then 
        do;
          ScoreMatch_flg= (scores[_i]=input(scan(Prof_level,findw(Prof_level, strip(Proficiency), ' ', 'EI')-1,' '),?? best32.));
          /* once we've found a match don't test any further scores in the observation */
          _i=dim(scores)+1;
          leave;
        end;
    end;
  end;
run;

proc print;
run;

View solution in original post

Reeza · Posted 10-14-2019 08:28 PM

Based on above inputs, please show what you want as output, exactly. Makes sure the input and output lines up based on your sample data. If you can provide your data as a full data step you're more likely to get a response as well, rather than as embedded in an Excel file or as HTML text (it carries weird formatting when you copy/paste into SAS).

Here are instructions on how to provide sample data as a data step:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

Good Luck.

@MKS2204 wrote:

Hi All,

I am facing a programming logic roadblock. Tried a lot of ways but not successful.

I have a SAS data set as below :

Testcode score1 score2 score120 Final_score1 Final_score2 Final_score120 Prof_level

41 1 4.5 2.3.4 74 230 555 1 Low 2.2 Medium

41 . 5.6 . 450 130 250 2.2 Medium 2 High

51 6.6 2.1 2.3.4 450 130 555 2.1 Medium 1 Low

And i also have an excel sheet containing below data :

Testcode Proficiency Score Lowerlimit Upperlimit

41 Low 1 50 100

41 Medium 1 101 150

41 High 1 151 200

41 Low 2 75 125

41 Medium 2 126 175

41 High 2 176 200

51 Low 1 50 100

51 Medium 1 101 150

51 High 1 151 200

51 Low 2.1 75 125

51 Medium 2.1 126 175

51 High 2.1 176 200

In the dataset there are 120 columns named "score1 - score120" also Final score1-finalscore120.

Now I have to check each unique "score" ( from the excel) for the testcode in the dataset columns Score1 - Score120 , If there is a hit, I need to take corresponding "Finalscore" column ( from 120 columns in dataset) and lookup that "Finalscore" against the excel for the same testcode and score , and find out what range of upperlimit and lowerlimit it falls in., Once I have the hit, I need to take the "Proficiency" value from the excel and check that value in dataset "Prof_Level" column , if that value is present in that string along with the Score value in process.

Example :

From excel sheet, we take Score = 1 for testcode 41 and try to find it in the dataset column Score1-Score120 , We have "Score1" in dataset =1, so thats a hit, Now we take corresponding Final Score from dataset, in this case "Final_score1" = 74, Now we take 74 and check in excel sheet against testcode 41 and score=1 and see what range it falls in , It falls in row 1 in excel sheet, Now We take value "Low" from there and check "PROF_Level" in dataset if it has the "Low" next to "1". and we have Low next to "1" so we are good, else we flag that record as error.

I know its a lot, But I am not able to get the solution to it, and I am pretty sure Its not a straight forward joins or lookup.

I am trying and would have tried more myself, I have to complete something on very urgent bases.

Please let me know if you need more information.

Appreciate all your help.

Thanks

Manoj

Reeza · Posted 10-14-2019 08:32 PM

FYI - I updated your subject line to be more reflective of your problem.

Patrick · Posted 10-14-2019 10:41 PM

@MKS2204 wrote:

....

Now I have to check each unique "score" ( from the excel) for the testcode in the dataset columns Score1 - Score120 , If there is a hit, I need to take corresponding "Finalscore" column ( from 120 columns in dataset) and lookup that "Finalscore" against the excel for the same testcode and score , and find out what range of upperlimit and lowerlimit it falls in., Once I have the hit, I need to take the "Proficiency" value from the excel and check that value in dataset "Prof_Level" column , if that value is present in that string along with the Score value in process.

Example :

From excel sheet, we take Score = 1 for testcode 41 and try to find it in the dataset column Score1-Score120 , We have "Score1" in dataset =1, so thats a hit, Now we take corresponding Final Score from dataset, in this case "Final_score1" = 74, Now we take 74 and check in excel sheet against testcode 41 and score=1 and see what range it falls in , It falls in row 1 in excel sheet, Now We take value "Low" from there and check "PROF_Level" in dataset if it has the "Low" next to "1". and we have Low next to "1" so we are good, else we flag that record as error.

Below a solution for how I understand your requirement.

data ds;
  infile datalines truncover dlm='|';
  input Testcode score1 score2 score120 Final_score1  Final_score2 Final_score120 Prof_level $20. ;
  datalines;
41|1|4.5|23.4|74|230|555|1 Low 2.2 Medium
41|.|5.6|.|450|130|250|2.2 Medium 2 High
51|6.6|2.1|23.4|450|130|555|2.1 Medium 1 Low
;

data Excel;
  infile datalines truncover dlm='|';
  input Testcode Proficiency :$10. MatchScore Lowerlimit Upperlimit $20.;
  datalines;
41|Low|1|50|100
41|Medium|1|101|150
41|High|1|151|200
41|Low|2|75|125
41|Medium|2|126|175
41|High|2|176|200
51|Low|1|50|100
51|Medium|1|101|150
51|High|1|151|200
51|Low|2.1|75|125
51|Medium|2.1|126|175
51|High|2.1|176|200
;

data want(drop=_:);
  if _n_=1 then
    do;
      if 0 then set ds excel;
      dcl hash h1(dataset:'excel', multidata:'y');
      h1.defineKey('Testcode','MatchScore');
      h1.defineData(all:'y');
      h1.defineDone();
    end;
  call missing(of _all_);

  set ds;
  array scores {*} score:;
  array Final_scores {*} Final_score:;

  ScoreMatch_flg=0;
  do _i=1 to dim(scores);
    do while(h1.do_over(key:Testcode,key:scores[_i]) eq 0);
      if lowerlimit<=Final_scores[_i]<=upperlimit then 
        do;
          ScoreMatch_flg= (scores[_i]=input(scan(Prof_level,findw(Prof_level, strip(Proficiency), ' ', 'EI')-1,' '),?? best32.));
          /* once we've found a match don't test any further scores in the observation */
          _i=dim(scores)+1;
          leave;
        end;
    end;
  end;
run;

proc print;
run;

MKS2204 · Posted 10-15-2019 03:22 PM

@Reeza Thank you for updating the post.

@Patrick Thanks Patrick, For taking time to look at it. I am sorry if i was not clear and complete with my requirements, As Reeza mentioned I should have put all information ( I apologize as I am new to posting questions, I will learn) 🙂

I have added few more rows to input dataset, and also attached the desired output.

Complete Requirements ( added bold text to it 😞

I have to check each unique "score" for each testcode ( from the excel) in the dataset columns Score1 - Score120 , If there is a hit, I need to take corresponding "Final score" column ( from 120 columns in dataset) and lookup that "Final score" against the excel for the same testcode and score , and find out what range of upperlimit and lowerlimit it falls in, ( If it doesn't fall in any of the range in excel sheet for that test and score, we create a flag called error_Final Score(_i) = 1, also we create a counter effor_flag, ) , If I have the hit, I need to take the "Proficiency" value from the excel and check that value in dataset "Prof_Level" column, we check the proficiency from excel in Prof_Level only when that "score" is present in the Prof_Level , else we move to next score from excel, if the "score" is there in Prof_Level then we check Proficiency value from excel next to the matched score in Prof_level and if the value doesn't match we generate error on Prof_Level ( Error_Prof_Level =1 ) and increment effor_flag+1, if we do not have the "score" from excel that we are checking in this iteration not there in Prof_Level at all , we do not do anything, just move to the next "score" from excel for that test and repeat.

If we have another mismatch like this for that record , we generate error on Prof_Level1 and increment error_flag+1;

Note - Because there will be max two proficiency in "Prof_level" column in DS , I created one more column Prof_level1 to capture both errors if two matches for a testcode fails.

Example:

From excel sheet, we take Score = 1 for testcode 41 and try to find it in the dataset column Score1-Score120 , We have "Score1" in dataset =1, so thats a hit, Now we take corresponding Final Score from dataset, in this case "Final_score1" = 74, Now we take 74 and check in excel sheet against testcode 41 and score=1 and see what range it falls in ( If it doesn't fall in any of the range in excel sheet for that test and score, we create a flag called error_Final Score(_i) = 1, also we create a counter say effor_flag, ) , now in our data It falls in row 1 in excel sheet, Now We take value "Low" from there and check "PROF_Level" in dataset if it has the "Low" next to "1". and we have Low next to "1" so we are good, else If we have "score" from excel sheet in "PROF_Level" in dataset but the "proficiency from excel doesn't match with the string next to the score then we generate error, and increment effor_flag+1, if we do not have the "score" from excel that we are checking in this iteration not there in Prof_Level at all , we do not do anything, just move to the next "score" from excel and repeat.

data ds;
infile datalines truncover dlm='|';
input Testcode score1 score2 score120 Final_score1 Final_score2 Final_score120 Prof_level ~ $20. Prof_level1 ~ $20. ;
datalines;
41|1|4.5|23.4|74|230|555|1 Low 2.2 Medium|1 Low 2.2 Medium
41|.|5.6|.|450|130|250|2.2 Medium 2 High|2.2 Medium 2 High
51|6.6|2.1|23.4|450|130|555|2.1 Medium 1 Low|2.1 Medium 1 Low
51|6.6|2.9|1|450|130|175|1 Medium 1 Low|2.1 Medium 1 Low
51|1|2|2.1|175|130|175|1 Medium 2 Low|1 Medium 2 Low
51|1|2|2.1|175|130|175|1 High 2.1 Low|1 High 2.1 Low
;

data Excel;
infile datalines truncover dlm='|';
input Testcode Proficiency :$10. MatchScore Lowerlimit Upperlimit $20.;
datalines;
41|Low|1|50|100
41|Medium|1|101|150
41|High|1|151|200
41|Low|2|75|125
41|Medium|2|126|175
41|High|2|176|200
51|Low|1|50|100
51|Medium|1|101|150
51|High|1|151|200
51|Low|2.1|75|125
51|Medium|2.1|126|175
51|High|2.1|176|200
;

Example below should not flag an error as there is no match of the score ( 2.1 ) in the "Prof_Level" column (1 & 2.2 ).

data ds;
infile datalines truncover dlm='|';
input Testcode score1 score2 score120 Final_score1 Final_score2 Final_score120 Prof_level ~ $20. Prof_level ~ $20. ;
datalines;
51|6.6|2.1|23.4|450|130|555|1 Low 2.2 Medium|1 Low 2.2 Medium
;

data Excel;
infile datalines truncover dlm='|';
input Testcode Proficiency :$10. MatchScore Lowerlimit Upperlimit $20.;
datalines;
51|Low|2.1|75|125
51|Medium|2.1|126|175
51|High|2.1|176|200
;

Desired output:

It would be great if you can please help on this. Appreciate all your help, have a nice day.

Thanks

Manoj

MKS2204 · Posted 10-17-2019 01:11 AM

Hi, Request everyone to please have a look at the problem, Appreciate your help.

Patrick · Posted 10-17-2019 03:29 AM

@MKS2204 wrote:

Hi, Request everyone to please have a look at the problem, Appreciate your help.

Just to explain to you why I'm not answering: There is just too much in your question for me to get my head around within reasonable time. You get normally more answers quicker if you break a problem apart, then try to first solve the steps yourself and ask targeted questions for where you get stuck.

It's also normally a good idea to not only post sample data and desired output but to also post your code (fully working or not) and explain where you got stuck. Posting your own code also lets us better understand on what level you are so we can provide answers which you can understand.

andreas_lds · Posted 10-17-2019 03:42 AM

@MKS2204 wrote:

Hi, Request everyone to please have a look at the problem, Appreciate your help.

Fully agree to @Patricks comment. The problem is to big to be worked on in my spare time.

MKS2204 · Posted 10-17-2019 01:02 PM

@andreas_lds @Patrick I understand what you guys are saying, and I understand it, I tried myself a lot, was trying very lengthy methods, of transposing and then joins etc but could not join horizontal data in DS to vertical scores in Excel. I never used hash which seems to be the solution here . I have a deliverable based on this requirement so I got desperate to get the solution, and wrote the whole big requirement, which I know doesn't seem right.

I know everyone has time constraints, But thank you for taking time out, I got the lead , working on this, If I get stuck somewhere I will ask for your opinion, if you find time, It will be highly appreciated.

Have a Good One.

Manoj

Reeza · Posted 10-17-2019 04:20 PM

Hash is a solution that someone posted but not necessarily required. Your format is part of the issue for sure, I would recommend switching your first file to a long format rather than wide format and trying a join at that point.

MKS2204 · Posted 11-04-2019 04:34 PM

Thank you All.

Lookup problem, with multiple categories

Re: Lookup problem, with multiple categories

Re: A little Complex SAS Problem

Re: Lookup problem, with multiple categories

Re: Lookup problem, with multiple categories

Re: Lookup problem, with multiple categories

Re: Lookup problem, with multiple categories

Re: Lookup problem, with multiple categories

Re: Lookup problem, with multiple categories

Re: Lookup problem, with multiple categories

Re: Lookup problem, with multiple categories

Re: Lookup problem, with multiple categories

Registration is open

SAS Training: Just a Click Away