Hi All,
I am facing a programming logic roadblock. Tried a lot of ways but not successful.
I have a SAS data set as below :
Testcode | score1 | score2 | score120 | Final_score1 | Final_score2 | Final_score120 | Prof_level |
41 | 1 | 4.5 | 2.3.4 | 74 | 230 | 555 | 1 Low 2.2 Medium |
41 | . | 5.6 | . | 450 | 130 | 250 | 2.2 Medium 2 High |
51 | 6.6 | 2.1 | 2.3.4 | 450 | 130 | 555 | 2.1 Medium 1 Low |
And i also have an excel sheet containing below data :
Testcode | Proficiency | Score | Lowerlimit | Upperlimit |
41 | Low | 1 | 50 | 100 |
41 | Medium | 1 | 101 | 150 |
41 | High | 1 | 151 | 200 |
41 | Low | 2 | 75 | 125 |
41 | Medium | 2 | 126 | 175 |
41 | High | 2 | 176 | 200 |
51 | Low | 1 | 50 | 100 |
51 | Medium | 1 | 101 | 150 |
51 | High | 1 | 151 | 200 |
51 | Low | 2.1 | 75 | 125 |
51 | Medium | 2.1 | 126 | 175 |
51 | High | 2.1 | 176 | 200 |
In the dataset there are 120 columns named "score1 - score120" also Final score1-finalscore120.
Now I have to check each unique "score" ( from the excel) for the testcode in the dataset columns Score1 - Score120 , If there is a hit, I need to take corresponding "Finalscore" column ( from 120 columns in dataset) and lookup that "Finalscore" against the excel for the same testcode and score , and find out what range of upperlimit and lowerlimit it falls in., Once I have the hit, I need to take the "Proficiency" value from the excel and check that value in dataset "Prof_Level" column , if that value is present in that string along with the Score value in process.
Example :
From excel sheet, we take Score = 1 for testcode 41 and try to find it in the dataset column Score1-Score120 , We have "Score1" in dataset =1, so thats a hit, Now we take corresponding Final Score from dataset, in this case "Final_score1" = 74, Now we take 74 and check in excel sheet against testcode 41 and score=1 and see what range it falls in , It falls in row 1 in excel sheet, Now We take value "Low" from there and check "PROF_Level" in dataset if it has the "Low" next to "1". and we have Low next to "1" so we are good, else we flag that record as error.
I know its a lot, But I am not able to get the solution to it, and I am pretty sure Its not a straight forward joins or lookup.
I am trying and would have tried more myself, I have to complete something on very urgent bases.
Please let me know if you need more information.
Appreciate all your help.
Thanks
Manoj
@MKS2204 wrote:
....
Now I have to check each unique "score" ( from the excel) for the testcode in the dataset columns Score1 - Score120 , If there is a hit, I need to take corresponding "Finalscore" column ( from 120 columns in dataset) and lookup that "Finalscore" against the excel for the same testcode and score , and find out what range of upperlimit and lowerlimit it falls in., Once I have the hit, I need to take the "Proficiency" value from the excel and check that value in dataset "Prof_Level" column , if that value is present in that string along with the Score value in process.
Example :
From excel sheet, we take Score = 1 for testcode 41 and try to find it in the dataset column Score1-Score120 , We have "Score1" in dataset =1, so thats a hit, Now we take corresponding Final Score from dataset, in this case "Final_score1" = 74, Now we take 74 and check in excel sheet against testcode 41 and score=1 and see what range it falls in , It falls in row 1 in excel sheet, Now We take value "Low" from there and check "PROF_Level" in dataset if it has the "Low" next to "1". and we have Low next to "1" so we are good, else we flag that record as error.
Below a solution for how I understand your requirement.
data ds;
infile datalines truncover dlm='|';
input Testcode score1 score2 score120 Final_score1 Final_score2 Final_score120 Prof_level $20. ;
datalines;
41|1|4.5|23.4|74|230|555|1 Low 2.2 Medium
41|.|5.6|.|450|130|250|2.2 Medium 2 High
51|6.6|2.1|23.4|450|130|555|2.1 Medium 1 Low
;
data Excel;
infile datalines truncover dlm='|';
input Testcode Proficiency :$10. MatchScore Lowerlimit Upperlimit $20.;
datalines;
41|Low|1|50|100
41|Medium|1|101|150
41|High|1|151|200
41|Low|2|75|125
41|Medium|2|126|175
41|High|2|176|200
51|Low|1|50|100
51|Medium|1|101|150
51|High|1|151|200
51|Low|2.1|75|125
51|Medium|2.1|126|175
51|High|2.1|176|200
;
data want(drop=_:);
if _n_=1 then
do;
if 0 then set ds excel;
dcl hash h1(dataset:'excel', multidata:'y');
h1.defineKey('Testcode','MatchScore');
h1.defineData(all:'y');
h1.defineDone();
end;
call missing(of _all_);
set ds;
array scores {*} score:;
array Final_scores {*} Final_score:;
ScoreMatch_flg=0;
do _i=1 to dim(scores);
do while(h1.do_over(key:Testcode,key:scores[_i]) eq 0);
if lowerlimit<=Final_scores[_i]<=upperlimit then
do;
ScoreMatch_flg= (scores[_i]=input(scan(Prof_level,findw(Prof_level, strip(Proficiency), ' ', 'EI')-1,' '),?? best32.));
/* once we've found a match don't test any further scores in the observation */
_i=dim(scores)+1;
leave;
end;
end;
end;
run;
proc print;
run;
Based on above inputs, please show what you want as output, exactly. Makes sure the input and output lines up based on your sample data. If you can provide your data as a full data step you're more likely to get a response as well, rather than as embedded in an Excel file or as HTML text (it carries weird formatting when you copy/paste into SAS).
Here are instructions on how to provide sample data as a data step:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
Good Luck.
@MKS2204 wrote:
Hi All,
I am facing a programming logic roadblock. Tried a lot of ways but not successful.
I have a SAS data set as below :
Testcode score1 score2 score120 Final_score1 Final_score2 Final_score120 Prof_level 41 1 4.5 2.3.4 74 230 555 1 Low 2.2 Medium 41 . 5.6 . 450 130 250 2.2 Medium 2 High 51 6.6 2.1 2.3.4 450 130 555 2.1 Medium 1 Low
And i also have an excel sheet containing below data :
Testcode Proficiency Score Lowerlimit Upperlimit 41 Low 1 50 100 41 Medium 1 101 150 41 High 1 151 200 41 Low 2 75 125 41 Medium 2 126 175 41 High 2 176 200 51 Low 1 50 100 51 Medium 1 101 150 51 High 1 151 200 51 Low 2.1 75 125 51 Medium 2.1 126 175 51 High 2.1 176 200
In the dataset there are 120 columns named "score1 - score120" also Final score1-finalscore120.
Now I have to check each unique "score" ( from the excel) for the testcode in the dataset columns Score1 - Score120 , If there is a hit, I need to take corresponding "Finalscore" column ( from 120 columns in dataset) and lookup that "Finalscore" against the excel for the same testcode and score , and find out what range of upperlimit and lowerlimit it falls in., Once I have the hit, I need to take the "Proficiency" value from the excel and check that value in dataset "Prof_Level" column , if that value is present in that string along with the Score value in process.
Example :
From excel sheet, we take Score = 1 for testcode 41 and try to find it in the dataset column Score1-Score120 , We have "Score1" in dataset =1, so thats a hit, Now we take corresponding Final Score from dataset, in this case "Final_score1" = 74, Now we take 74 and check in excel sheet against testcode 41 and score=1 and see what range it falls in , It falls in row 1 in excel sheet, Now We take value "Low" from there and check "PROF_Level" in dataset if it has the "Low" next to "1". and we have Low next to "1" so we are good, else we flag that record as error.
I know its a lot, But I am not able to get the solution to it, and I am pretty sure Its not a straight forward joins or lookup.
I am trying and would have tried more myself, I have to complete something on very urgent bases.
Please let me know if you need more information.
Appreciate all your help.
Thanks
Manoj
@MKS2204 wrote:
....
Now I have to check each unique "score" ( from the excel) for the testcode in the dataset columns Score1 - Score120 , If there is a hit, I need to take corresponding "Finalscore" column ( from 120 columns in dataset) and lookup that "Finalscore" against the excel for the same testcode and score , and find out what range of upperlimit and lowerlimit it falls in., Once I have the hit, I need to take the "Proficiency" value from the excel and check that value in dataset "Prof_Level" column , if that value is present in that string along with the Score value in process.
Example :
From excel sheet, we take Score = 1 for testcode 41 and try to find it in the dataset column Score1-Score120 , We have "Score1" in dataset =1, so thats a hit, Now we take corresponding Final Score from dataset, in this case "Final_score1" = 74, Now we take 74 and check in excel sheet against testcode 41 and score=1 and see what range it falls in , It falls in row 1 in excel sheet, Now We take value "Low" from there and check "PROF_Level" in dataset if it has the "Low" next to "1". and we have Low next to "1" so we are good, else we flag that record as error.
Below a solution for how I understand your requirement.
data ds;
infile datalines truncover dlm='|';
input Testcode score1 score2 score120 Final_score1 Final_score2 Final_score120 Prof_level $20. ;
datalines;
41|1|4.5|23.4|74|230|555|1 Low 2.2 Medium
41|.|5.6|.|450|130|250|2.2 Medium 2 High
51|6.6|2.1|23.4|450|130|555|2.1 Medium 1 Low
;
data Excel;
infile datalines truncover dlm='|';
input Testcode Proficiency :$10. MatchScore Lowerlimit Upperlimit $20.;
datalines;
41|Low|1|50|100
41|Medium|1|101|150
41|High|1|151|200
41|Low|2|75|125
41|Medium|2|126|175
41|High|2|176|200
51|Low|1|50|100
51|Medium|1|101|150
51|High|1|151|200
51|Low|2.1|75|125
51|Medium|2.1|126|175
51|High|2.1|176|200
;
data want(drop=_:);
if _n_=1 then
do;
if 0 then set ds excel;
dcl hash h1(dataset:'excel', multidata:'y');
h1.defineKey('Testcode','MatchScore');
h1.defineData(all:'y');
h1.defineDone();
end;
call missing(of _all_);
set ds;
array scores {*} score:;
array Final_scores {*} Final_score:;
ScoreMatch_flg=0;
do _i=1 to dim(scores);
do while(h1.do_over(key:Testcode,key:scores[_i]) eq 0);
if lowerlimit<=Final_scores[_i]<=upperlimit then
do;
ScoreMatch_flg= (scores[_i]=input(scan(Prof_level,findw(Prof_level, strip(Proficiency), ' ', 'EI')-1,' '),?? best32.));
/* once we've found a match don't test any further scores in the observation */
_i=dim(scores)+1;
leave;
end;
end;
end;
run;
proc print;
run;
@Reeza Thank you for updating the post.
@Patrick Thanks Patrick, For taking time to look at it. I am sorry if i was not clear and complete with my requirements, As Reeza mentioned I should have put all information ( I apologize as I am new to posting questions, I will learn) 🙂
I have added few more rows to input dataset, and also attached the desired output.
Complete Requirements ( added bold text to it 😞
I have to check each unique "score" for each testcode ( from the excel) in the dataset columns Score1 - Score120 , If there is a hit, I need to take corresponding "Final score" column ( from 120 columns in dataset) and lookup that "Final score" against the excel for the same testcode and score , and find out what range of upperlimit and lowerlimit it falls in, ( If it doesn't fall in any of the range in excel sheet for that test and score, we create a flag called error_Final Score(_i) = 1, also we create a counter effor_flag, ) , If I have the hit, I need to take the "Proficiency" value from the excel and check that value in dataset "Prof_Level" column, we check the proficiency from excel in Prof_Level only when that "score" is present in the Prof_Level , else we move to next score from excel, if the "score" is there in Prof_Level then we check Proficiency value from excel next to the matched score in Prof_level and if the value doesn't match we generate error on Prof_Level ( Error_Prof_Level =1 ) and increment effor_flag+1, if we do not have the "score" from excel that we are checking in this iteration not there in Prof_Level at all , we do not do anything, just move to the next "score" from excel for that test and repeat.
If we have another mismatch like this for that record , we generate error on Prof_Level1 and increment error_flag+1;
Note - Because there will be max two proficiency in "Prof_level" column in DS , I created one more column Prof_level1 to capture both errors if two matches for a testcode fails.
Example:
From excel sheet, we take Score = 1 for testcode 41 and try to find it in the dataset column Score1-Score120 , We have "Score1" in dataset =1, so thats a hit, Now we take corresponding Final Score from dataset, in this case "Final_score1" = 74, Now we take 74 and check in excel sheet against testcode 41 and score=1 and see what range it falls in ( If it doesn't fall in any of the range in excel sheet for that test and score, we create a flag called error_Final Score(_i) = 1, also we create a counter say effor_flag, ) , now in our data It falls in row 1 in excel sheet, Now We take value "Low" from there and check "PROF_Level" in dataset if it has the "Low" next to "1". and we have Low next to "1" so we are good, else If we have "score" from excel sheet in "PROF_Level" in dataset but the "proficiency from excel doesn't match with the string next to the score then we generate error, and increment effor_flag+1, if we do not have the "score" from excel that we are checking in this iteration not there in Prof_Level at all , we do not do anything, just move to the next "score" from excel and repeat.
data ds;
infile datalines truncover dlm='|';
input Testcode score1 score2 score120 Final_score1 Final_score2 Final_score120 Prof_level ~ $20. Prof_level1 ~ $20. ;
datalines;
41|1|4.5|23.4|74|230|555|1 Low 2.2 Medium|1 Low 2.2 Medium
41|.|5.6|.|450|130|250|2.2 Medium 2 High|2.2 Medium 2 High
51|6.6|2.1|23.4|450|130|555|2.1 Medium 1 Low|2.1 Medium 1 Low
51|6.6|2.9|1|450|130|175|1 Medium 1 Low|2.1 Medium 1 Low
51|1|2|2.1|175|130|175|1 Medium 2 Low|1 Medium 2 Low
51|1|2|2.1|175|130|175|1 High 2.1 Low|1 High 2.1 Low
;
data Excel;
infile datalines truncover dlm='|';
input Testcode Proficiency :$10. MatchScore Lowerlimit Upperlimit $20.;
datalines;
41|Low|1|50|100
41|Medium|1|101|150
41|High|1|151|200
41|Low|2|75|125
41|Medium|2|126|175
41|High|2|176|200
51|Low|1|50|100
51|Medium|1|101|150
51|High|1|151|200
51|Low|2.1|75|125
51|Medium|2.1|126|175
51|High|2.1|176|200
;
Example below should not flag an error as there is no match of the score ( 2.1 ) in the "Prof_Level" column (1 & 2.2 ).
data ds;
infile datalines truncover dlm='|';
input Testcode score1 score2 score120 Final_score1 Final_score2 Final_score120 Prof_level ~ $20. Prof_level ~ $20. ;
datalines;
51|6.6|2.1|23.4|450|130|555|1 Low 2.2 Medium|1 Low 2.2 Medium
;
data Excel;
infile datalines truncover dlm='|';
input Testcode Proficiency :$10. MatchScore Lowerlimit Upperlimit $20.;
datalines;
51|Low|2.1|75|125
51|Medium|2.1|126|175
51|High|2.1|176|200
;
Desired output:
It would be great if you can please help on this. Appreciate all your help, have a nice day.
Thanks
Manoj
Hi, Request everyone to please have a look at the problem, Appreciate your help.
@MKS2204 wrote:
Hi, Request everyone to please have a look at the problem, Appreciate your help.
Just to explain to you why I'm not answering: There is just too much in your question for me to get my head around within reasonable time. You get normally more answers quicker if you break a problem apart, then try to first solve the steps yourself and ask targeted questions for where you get stuck.
It's also normally a good idea to not only post sample data and desired output but to also post your code (fully working or not) and explain where you got stuck. Posting your own code also lets us better understand on what level you are so we can provide answers which you can understand.
@andreas_lds @Patrick I understand what you guys are saying, and I understand it, I tried myself a lot, was trying very lengthy methods, of transposing and then joins etc but could not join horizontal data in DS to vertical scores in Excel. I never used hash which seems to be the solution here . I have a deliverable based on this requirement so I got desperate to get the solution, and wrote the whole big requirement, which I know doesn't seem right.
I know everyone has time constraints, But thank you for taking time out, I got the lead , working on this, If I get stuck somewhere I will ask for your opinion, if you find time, It will be highly appreciated.
Have a Good One.
Manoj
Thank you All.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.