SAS Programming

twenty7 · Posted 01-23-2021 09:49 AM

In the data I have, exists the following -

Table 1

ID	Rule_1	Rule_2	Rule_3	Rule_4
0001	PCODE_27	RCODE_1	RCODE_2	OCODE_1
0002	RCODE_2

I have the task of validating that the codes being returned are correct and this had led me to produce data which looks like this;

Table 2a

ID	PCODE_27	RCODE_1	RCODE_2	OCODE_1	RCODE_21
0001	1	1	1	0	0
0002	0	0	1	0	0

'1' would indicate that rule was correct.

I would like to create an array which would be populated for each record but only where the value is 1. My expectation is that it would end up looking something like this

Table 2b

ID	EXP_CODE_1	EXP_CODE_2	EXP_CODE_3	EXP_CODE_4	EXP_CODE_5
0001	PCODE_27	RCODE_1	RCODE_2
0002	RCODE_2

This would then allow me to compare tables 1 and 2b directly

I would like your advice as to whether I am correct in my approach and if so how would I go about creating the array

Regards,

Tom · Posted 01-25-2021 02:32 PM

Why not just create a dataset that has the rule and the result in the same observation?

data want;
   merge one two;
   by id ;
   array rules rule_: ;
   do index=1 to dim(rules);
     RULE = rules[index] ;
     if rule ne ' ' then do;
       RESULT=input(strip(vvaluex(rule)),1.);
       output;
     end;
  end;
  keep id RULE RESULT;
run;

View solution in original post

PaigeMiller · Posted 01-23-2021 10:57 AM

Why are there 5 columns after ID in tables 2a and 2b, but only four different codes shown in table 1?

Do we know the complete list of codes that can appear? Or do we have to figure that out from the data?

--
Paige Miller

twenty7 · Posted 01-23-2021 12:24 PM

there can be unto 99 different codes that can appear and we do know the full list.

the reason for the different columns was to highlight there could be rules which have not triggered yet in the data available

Tom · Posted 01-23-2021 01:00 PM

Data as text is good, but it is easier to reply if you post your data as datasets.

data one ;
  input (id Rule_1-Rule_4) ($);
cards;
0001 PCODE_27 RCODE_1 RCODE_2 OCODE_1
0002 RCODE_2  .       .       .
;

data two;
  input ID $ PCODE_27 RCODE_1 RCODE_2 OCODE_1 RCODE_21;
cards;
0001 1 1 1 0 0
0002 0 0 1 0 0
;

You should be able to test directly by merging the two datasets.

data _null_ ;
   merge one two;
   by id ;
   array rules rule_: ;
   do index=1 to dim(rules);
     if rules[index] ne ' ' then do;
       if strip(vvaluex(rules[index])) ne '1' then put 'Rule not satisfied. ' id= rules[index]= ;
     end;
   end;
run;

Rule not satisfied. id=0001 Rule_4=OCODE_1

twenty7 · Posted 01-25-2021 01:54 PM

Thank you @Tom

I did make a slight amendment to output the result into a new column

data _null_ ;
   merge one two;
   by id ;
   array rules rule_: ;
   do index=1 to dim(rules);
     if rules[index] ne ' ' then do;
       if rules_match = "" and strip(vvaluex(rules[index])) ne 1 then rules_match = "differences"
     end;
   end;
run;

This would allow the user to easily isolate records where there is not a 100% match

How would the reverse of this work where I need to identify records where there is a 1 but the code is not present in the array that is defined in the code above?

Tom · Posted 01-25-2021 02:32 PM

Why not just create a dataset that has the rule and the result in the same observation?

data want;
   merge one two;
   by id ;
   array rules rule_: ;
   do index=1 to dim(rules);
     RULE = rules[index] ;
     if rule ne ' ' then do;
       RESULT=input(strip(vvaluex(rule)),1.);
       output;
     end;
  end;
  keep id RULE RESULT;
run;

SAS Programming

Arrays?

Re: Arrays?

Re: Arrays?

Re: Arrays?

Re: Arrays?

Re: Arrays?

Re: Arrays?

Follow Us

What is...

SAS Programming

Arrays?

Re: Arrays?

Re: Arrays?

Re: Arrays?

Re: Arrays?

Re: Arrays?

Re: Arrays?

Join us for our biggest event of the year!

SAS Training: Just a Click Away

Follow Us

What is...