BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MKS2204
Obsidian | Level 7

Hi All,

 

I am facing a programming logic roadblock. Tried a lot of ways but not successful.

 

I have a SAS data set as below :

 

Testcode score1  score2 score120 Final_score1  Final_score2 Final_score120 Prof_level
41 1 4.5 2.3.4 74 230 555 1 Low 2.2 Medium
41 . 5.6 . 450 130 250 2.2 Medium 2 High
51 6.6 2.1 2.3.4 450 130 555 2.1 Medium 1 Low

 

 

And i also have an excel sheet containing below data :

Testcode Proficiency Score Lowerlimit Upperlimit
41 Low 1 50 100
41 Medium 1 101 150
41 High 1 151 200
41 Low 2 75 125
41 Medium 2 126 175
41 High 2 176 200
51 Low 1 50 100
51 Medium 1 101 150
51 High 1 151 200
51 Low 2.1 75 125
51 Medium 2.1 126 175
51 High 2.1 176 200

 

In the dataset there are 120 columns named  "score1 - score120" also Final score1-finalscore120.

 

Now I have to check each unique "score" ( from the excel) for the testcode  in the dataset columns Score1 - Score120 , If there is a hit, I need to take corresponding "Finalscore" column ( from 120 columns in dataset) and lookup that "Finalscore" against the excel for the same testcode and score , and find out what range of upperlimit and lowerlimit  it falls in., Once I have the hit, I need to take the "Proficiency" value from the excel and check that value in dataset "Prof_Level" column , if that value is present in that string along with the Score value in process.

 

Example :

From excel sheet, we take Score = 1 for testcode 41 and try to find it in the dataset column  Score1-Score120 , We have "Score1" in dataset =1, so thats a hit, Now we take corresponding Final Score from dataset, in this case "Final_score1" = 74, Now we take 74 and check in excel sheet against testcode 41 and score=1 and see what range it falls in , It falls in row 1 in excel sheet, Now We take value  "Low" from there and check "PROF_Level" in dataset if it has the "Low" next to "1".  and we have Low next to "1" so  we are good, else we flag that record as error. 

 

I know its a lot, But I am not able to get the solution to it, and I am pretty sure Its not a straight forward joins or lookup.

I am trying and would have tried more myself, I have to complete something on very urgent bases.

 

Please let me know if you need more information.

 

Appreciate all your help.

 

Thanks

Manoj

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@MKS2204 wrote:

....

Now I have to check each unique "score" ( from the excel) for the testcode  in the dataset columns Score1 - Score120 , If there is a hit, I need to take corresponding "Finalscore" column ( from 120 columns in dataset) and lookup that "Finalscore" against the excel for the same testcode and score , and find out what range of upperlimit and lowerlimit  it falls in., Once I have the hit, I need to take the "Proficiency" value from the excel and check that value in dataset "Prof_Level" column , if that value is present in that string along with the Score value in process.

 

Example :

From excel sheet, we take Score = 1 for testcode 41 and try to find it in the dataset column  Score1-Score120 , We have "Score1" in dataset =1, so thats a hit, Now we take corresponding Final Score from dataset, in this case "Final_score1" = 74, Now we take 74 and check in excel sheet against testcode 41 and score=1 and see what range it falls in , It falls in row 1 in excel sheet, Now We take value  "Low" from there and check "PROF_Level" in dataset if it has the "Low" next to "1".  and we have Low next to "1" so  we are good, else we flag that record as error. 

 


Below a solution for how I understand your requirement.

data ds;
  infile datalines truncover dlm='|';
  input Testcode score1 score2 score120 Final_score1  Final_score2 Final_score120 Prof_level $20. ;
  datalines;
41|1|4.5|23.4|74|230|555|1 Low 2.2 Medium
41|.|5.6|.|450|130|250|2.2 Medium 2 High
51|6.6|2.1|23.4|450|130|555|2.1 Medium 1 Low
;

data Excel;
  infile datalines truncover dlm='|';
  input Testcode Proficiency :$10. MatchScore Lowerlimit Upperlimit $20.;
  datalines;
41|Low|1|50|100
41|Medium|1|101|150
41|High|1|151|200
41|Low|2|75|125
41|Medium|2|126|175
41|High|2|176|200
51|Low|1|50|100
51|Medium|1|101|150
51|High|1|151|200
51|Low|2.1|75|125
51|Medium|2.1|126|175
51|High|2.1|176|200
;

data want(drop=_:);
  if _n_=1 then
    do;
      if 0 then set ds excel;
      dcl hash h1(dataset:'excel', multidata:'y');
      h1.defineKey('Testcode','MatchScore');
      h1.defineData(all:'y');
      h1.defineDone();
    end;
  call missing(of _all_);

  set ds;
  array scores {*} score:;
  array Final_scores {*} Final_score:;

  ScoreMatch_flg=0;
  do _i=1 to dim(scores);
    do while(h1.do_over(key:Testcode,key:scores[_i]) eq 0);
      if lowerlimit<=Final_scores[_i]<=upperlimit then 
        do;
          ScoreMatch_flg= (scores[_i]=input(scan(Prof_level,findw(Prof_level, strip(Proficiency), ' ', 'EI')-1,' '),?? best32.));
          /* once we've found a match don't test any further scores in the observation */
          _i=dim(scores)+1;
          leave;
        end;
    end;
  end;
run;

proc print;
run;

Capture.JPG

 

View solution in original post

10 REPLIES 10
Reeza
Super User

Based on above inputs, please show what you want as output, exactly. Makes sure the input and output lines up based on your sample data. If you can provide your data as a full data step you're more likely to get a response as well, rather than as embedded in an Excel file or as HTML text (it carries weird formatting when you copy/paste into SAS).

Here are instructions on how to provide sample data as a data step:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

Good Luck. 

 


@MKS2204 wrote:

Hi All,

 

I am facing a programming logic roadblock. Tried a lot of ways but not successful.

 

I have a SAS data set as below :

 

Testcode score1  score2 score120 Final_score1  Final_score2 Final_score120 Prof_level
41 1 4.5 2.3.4 74 230 555 1 Low 2.2 Medium
41 . 5.6 . 450 130 250 2.2 Medium 2 High
51 6.6 2.1 2.3.4 450 130 555 2.1 Medium 1 Low

 

 

And i also have an excel sheet containing below data :

Testcode Proficiency Score Lowerlimit Upperlimit
41 Low 1 50 100
41 Medium 1 101 150
41 High 1 151 200
41 Low 2 75 125
41 Medium 2 126 175
41 High 2 176 200
51 Low 1 50 100
51 Medium 1 101 150
51 High 1 151 200
51 Low 2.1 75 125
51 Medium 2.1 126 175
51 High 2.1 176 200

 

In the dataset there are 120 columns named  "score1 - score120" also Final score1-finalscore120.

 

Now I have to check each unique "score" ( from the excel) for the testcode  in the dataset columns Score1 - Score120 , If there is a hit, I need to take corresponding "Finalscore" column ( from 120 columns in dataset) and lookup that "Finalscore" against the excel for the same testcode and score , and find out what range of upperlimit and lowerlimit  it falls in., Once I have the hit, I need to take the "Proficiency" value from the excel and check that value in dataset "Prof_Level" column , if that value is present in that string along with the Score value in process.

 

Example :

From excel sheet, we take Score = 1 for testcode 41 and try to find it in the dataset column  Score1-Score120 , We have "Score1" in dataset =1, so thats a hit, Now we take corresponding Final Score from dataset, in this case "Final_score1" = 74, Now we take 74 and check in excel sheet against testcode 41 and score=1 and see what range it falls in , It falls in row 1 in excel sheet, Now We take value  "Low" from there and check "PROF_Level" in dataset if it has the "Low" next to "1".  and we have Low next to "1" so  we are good, else we flag that record as error. 

 

I know its a lot, But I am not able to get the solution to it, and I am pretty sure Its not a straight forward joins or lookup.

I am trying and would have tried more myself, I have to complete something on very urgent bases.

 

Please let me know if you need more information.

 

Appreciate all your help.

 

Thanks

Manoj


 

Reeza
Super User
FYI - I updated your subject line to be more reflective of your problem.
Patrick
Opal | Level 21

@MKS2204 wrote:

....

Now I have to check each unique "score" ( from the excel) for the testcode  in the dataset columns Score1 - Score120 , If there is a hit, I need to take corresponding "Finalscore" column ( from 120 columns in dataset) and lookup that "Finalscore" against the excel for the same testcode and score , and find out what range of upperlimit and lowerlimit  it falls in., Once I have the hit, I need to take the "Proficiency" value from the excel and check that value in dataset "Prof_Level" column , if that value is present in that string along with the Score value in process.

 

Example :

From excel sheet, we take Score = 1 for testcode 41 and try to find it in the dataset column  Score1-Score120 , We have "Score1" in dataset =1, so thats a hit, Now we take corresponding Final Score from dataset, in this case "Final_score1" = 74, Now we take 74 and check in excel sheet against testcode 41 and score=1 and see what range it falls in , It falls in row 1 in excel sheet, Now We take value  "Low" from there and check "PROF_Level" in dataset if it has the "Low" next to "1".  and we have Low next to "1" so  we are good, else we flag that record as error. 

 


Below a solution for how I understand your requirement.

data ds;
  infile datalines truncover dlm='|';
  input Testcode score1 score2 score120 Final_score1  Final_score2 Final_score120 Prof_level $20. ;
  datalines;
41|1|4.5|23.4|74|230|555|1 Low 2.2 Medium
41|.|5.6|.|450|130|250|2.2 Medium 2 High
51|6.6|2.1|23.4|450|130|555|2.1 Medium 1 Low
;

data Excel;
  infile datalines truncover dlm='|';
  input Testcode Proficiency :$10. MatchScore Lowerlimit Upperlimit $20.;
  datalines;
41|Low|1|50|100
41|Medium|1|101|150
41|High|1|151|200
41|Low|2|75|125
41|Medium|2|126|175
41|High|2|176|200
51|Low|1|50|100
51|Medium|1|101|150
51|High|1|151|200
51|Low|2.1|75|125
51|Medium|2.1|126|175
51|High|2.1|176|200
;

data want(drop=_:);
  if _n_=1 then
    do;
      if 0 then set ds excel;
      dcl hash h1(dataset:'excel', multidata:'y');
      h1.defineKey('Testcode','MatchScore');
      h1.defineData(all:'y');
      h1.defineDone();
    end;
  call missing(of _all_);

  set ds;
  array scores {*} score:;
  array Final_scores {*} Final_score:;

  ScoreMatch_flg=0;
  do _i=1 to dim(scores);
    do while(h1.do_over(key:Testcode,key:scores[_i]) eq 0);
      if lowerlimit<=Final_scores[_i]<=upperlimit then 
        do;
          ScoreMatch_flg= (scores[_i]=input(scan(Prof_level,findw(Prof_level, strip(Proficiency), ' ', 'EI')-1,' '),?? best32.));
          /* once we've found a match don't test any further scores in the observation */
          _i=dim(scores)+1;
          leave;
        end;
    end;
  end;
run;

proc print;
run;

Capture.JPG

 

MKS2204
Obsidian | Level 7

@Reeza Thank you for updating the post.

 

@Patrick  Thanks Patrick, For taking time to look at it. I am sorry if i was not clear and complete with my requirements, As Reeza mentioned I should have put all information ( I apologize as I am new to posting questions, I will learn) 🙂

 

I have added few more rows to input dataset, and also attached the desired output.

 

Complete Requirements ( added bold text to it 😞

 

I have to check each unique "score" for each testcode ( from the excel)   in the dataset columns Score1 - Score120 , If there is a hit, I need to take corresponding "Final score" column ( from 120 columns in dataset) and lookup that "Final score" against the excel for the same testcode and score , and find out what range of upperlimit and lowerlimit  it falls in, ( If it doesn't fall in any of the range in excel sheet for that test and score, we create a flag called error_Final Score(_i)  = 1, also we create a counter effor_flag, ) , If I have the hit, I need to take the "Proficiency" value from the excel and check that value in dataset "Prof_Level" column, we check the proficiency  from excel in Prof_Level only when that "score" is present in the Prof_Level , else we move to next score from excel, if the "score" is there in Prof_Level then we check Proficiency value from excel next to the matched score in Prof_level and if the value doesn't match we generate error on Prof_Level ( Error_Prof_Level  =1 ) and increment effor_flag+1, if we do not have the "score" from excel that we are checking in this iteration not there in Prof_Level at all , we do not do anything, just move to the next "score" from excel for that test and repeat. 

If we have another mismatch like this for that record , we generate error on Prof_Level1 and increment error_flag+1;

 

Note - Because there will be max two proficiency in "Prof_level" column in DS , I created one more column Prof_level1  to capture both errors if  two matches for a testcode fails.

 

Example:

From excel sheet, we take Score = 1 for testcode 41 and try to find it in the dataset column  Score1-Score120 , We have "Score1" in dataset =1, so thats a hit, Now we take corresponding Final Score from dataset, in this case "Final_score1" = 74, Now we take 74 and check in excel sheet against testcode 41 and score=1 and see what range it falls in ( If it doesn't fall in any of the range in excel sheet for that test and score, we create a flag called error_Final Score(_i)  = 1, also we create a counter say effor_flag, ) , now in our data It falls in row 1 in excel sheet, Now We take value  "Low" from there and check "PROF_Level" in dataset if it has the "Low" next to "1".  and we have Low next to "1" so  we are good, else If we have "score" from excel sheet in "PROF_Level" in dataset but the "proficiency from excel doesn't match with the string next to the score then we generate error, and increment effor_flag+1, if we do not have the "score" from excel that we are checking in this iteration not there in Prof_Level at all , we do not do anything, just move to the next "score" from excel and repeat. 

 

 

 

data ds;
infile datalines truncover dlm='|';
input Testcode score1 score2 score120 Final_score1 Final_score2 Final_score120 Prof_level ~ $20. Prof_level1 ~ $20. ;
datalines;
41|1|4.5|23.4|74|230|555|1 Low 2.2 Medium|1 Low 2.2 Medium
41|.|5.6|.|450|130|250|2.2 Medium 2 High|2.2 Medium 2 High
51|6.6|2.1|23.4|450|130|555|2.1 Medium 1 Low|2.1 Medium 1 Low
51|6.6|2.9|1|450|130|175|1 Medium 1 Low|2.1 Medium 1 Low
51|1|2|2.1|175|130|175|1 Medium 2 Low|1 Medium 2 Low
51|1|2|2.1|175|130|175|1 High 2.1 Low|1 High 2.1 Low
;

data Excel;
infile datalines truncover dlm='|';
input Testcode Proficiency :$10. MatchScore Lowerlimit Upperlimit $20.;
datalines;
41|Low|1|50|100
41|Medium|1|101|150
41|High|1|151|200
41|Low|2|75|125
41|Medium|2|126|175
41|High|2|176|200
51|Low|1|50|100
51|Medium|1|101|150
51|High|1|151|200
51|Low|2.1|75|125
51|Medium|2.1|126|175
51|High|2.1|176|200
;

 

 

Example below should not flag an error as there is no match of the score ( 2.1 ) in the "Prof_Level" column (1 & 2.2 ).

data ds;
infile datalines truncover dlm='|';
input Testcode score1 score2 score120 Final_score1 Final_score2 Final_score120 Prof_level ~ $20. Prof_level ~ $20. ;
datalines;
51|6.6|2.1|23.4|450|130|555|1 Low 2.2 Medium|1 Low 2.2 Medium
;

data Excel;
infile datalines truncover dlm='|';
input Testcode Proficiency :$10. MatchScore Lowerlimit Upperlimit $20.;
datalines;
51|Low|2.1|75|125
51|Medium|2.1|126|175
51|High|2.1|176|200
;

 

 

Desired output:

Desired_Result.JPG

 

It would be great if you can please help on this. Appreciate all your help, have a nice day.

 

Thanks

Manoj

MKS2204
Obsidian | Level 7

Hi, Request everyone to please have a look at the problem, Appreciate your help.

Patrick
Opal | Level 21

@MKS2204 wrote:

Hi, Request everyone to please have a look at the problem, Appreciate your help.


Just to explain to you why I'm not answering: There is just too much in your question for me to get my head around within reasonable time. You get normally more answers quicker if you break a problem apart, then try to first solve the steps yourself and ask targeted questions for where you get stuck.

It's also normally a good idea to not only post sample data and desired output but to also post your code (fully working or not) and explain where you got stuck. Posting your own code also lets us better understand on what level you are so we can provide answers which you can understand.

andreas_lds
Jade | Level 19

@MKS2204 wrote:

Hi, Request everyone to please have a look at the problem, Appreciate your help.


Fully agree to @Patricks comment. The problem is to big to be worked on in my spare time.

MKS2204
Obsidian | Level 7

@andreas_lds @Patrick  I understand what you guys are saying, and I understand it,  I tried myself a lot, was trying very lengthy methods, of transposing and then joins etc but could not join horizontal data in DS to vertical scores in Excel. I never used hash which seems to be the solution here . I have a deliverable based on this requirement so I got desperate to get the solution, and wrote the whole big requirement, which I know doesn't seem right.

 

I know everyone has time constraints, But thank you for taking time out, I got the lead , working on this, If I get stuck somewhere I will ask for your opinion, if you find time, It will be highly appreciated.

 

Have a Good One.

Manoj

Reeza
Super User
Hash is a solution that someone posted but not necessarily required. Your format is part of the issue for sure, I would recommend switching your first file to a long format rather than wide format and trying a join at that point.
MKS2204
Obsidian | Level 7

Thank you All.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 959 views
  • 8 likes
  • 4 in conversation