In the data I have, exists the following -
Table 1
ID | Rule_1 | Rule_2 | Rule_3 | Rule_4 |
0001 | PCODE_27 | RCODE_1 | RCODE_2 | OCODE_1 |
0002 | RCODE_2 |
I have the task of validating that the codes being returned are correct and this had led me to produce data which looks like this;
Table 2a
ID | PCODE_27 | RCODE_1 | RCODE_2 | OCODE_1 | RCODE_21 |
0001 | 1 | 1 | 1 | 0 | 0 |
0002 | 0 | 0 | 1 | 0 | 0 |
'1' would indicate that rule was correct.
I would like to create an array which would be populated for each record but only where the value is 1. My expectation is that it would end up looking something like this
Table 2b
ID | EXP_CODE_1 | EXP_CODE_2 | EXP_CODE_3 | EXP_CODE_4 | EXP_CODE_5 |
0001 | PCODE_27 | RCODE_1 | RCODE_2 | ||
0002 | RCODE_2 |
This would then allow me to compare tables 1 and 2b directly
I would like your advice as to whether I am correct in my approach and if so how would I go about creating the array
Regards,
Why not just create a dataset that has the rule and the result in the same observation?
data want;
merge one two;
by id ;
array rules rule_: ;
do index=1 to dim(rules);
RULE = rules[index] ;
if rule ne ' ' then do;
RESULT=input(strip(vvaluex(rule)),1.);
output;
end;
end;
keep id RULE RESULT;
run;
Why are there 5 columns after ID in tables 2a and 2b, but only four different codes shown in table 1?
Do we know the complete list of codes that can appear? Or do we have to figure that out from the data?
there can be unto 99 different codes that can appear and we do know the full list.
the reason for the different columns was to highlight there could be rules which have not triggered yet in the data available
Data as text is good, but it is easier to reply if you post your data as datasets.
data one ;
input (id Rule_1-Rule_4) ($);
cards;
0001 PCODE_27 RCODE_1 RCODE_2 OCODE_1
0002 RCODE_2 . . .
;
data two;
input ID $ PCODE_27 RCODE_1 RCODE_2 OCODE_1 RCODE_21;
cards;
0001 1 1 1 0 0
0002 0 0 1 0 0
;
You should be able to test directly by merging the two datasets.
data _null_ ;
merge one two;
by id ;
array rules rule_: ;
do index=1 to dim(rules);
if rules[index] ne ' ' then do;
if strip(vvaluex(rules[index])) ne '1' then put 'Rule not satisfied. ' id= rules[index]= ;
end;
end;
run;
Rule not satisfied. id=0001 Rule_4=OCODE_1
Thank you @Tom
I did make a slight amendment to output the result into a new column
data _null_ ;
merge one two;
by id ;
array rules rule_: ;
do index=1 to dim(rules);
if rules[index] ne ' ' then do;
if rules_match = "" and strip(vvaluex(rules[index])) ne 1 then rules_match = "differences"
end;
end;
run;
This would allow the user to easily isolate records where there is not a 100% match
How would the reverse of this work where I need to identify records where there is a 1 but the code is not present in the array that is defined in the code above?
Why not just create a dataset that has the rule and the result in the same observation?
data want;
merge one two;
by id ;
array rules rule_: ;
do index=1 to dim(rules);
RULE = rules[index] ;
if rule ne ' ' then do;
RESULT=input(strip(vvaluex(rule)),1.);
output;
end;
end;
keep id RULE RESULT;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.