Solved: Inter-observational checks of values for each ID in data set

nemez · Posted 08-07-2024 11:56 AM

Hi everybody,

I have a data set like the following one (which is created from three different data sets of similar structure).

data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;

Now, I want to perform some "intra-observation" checks to detect errors and also sometimes conditionally create new rows like in this example:

data want;
set have;
by ID;
if Value1 ^= Value2 then Err = 'E1';
output;
if Type in ('Blue','Orange') and find(Product,'E') = 0 then do;
Product = 'E';
Acc = '';
Type = '';
Value1 = .;
Value2 = .;
Value3 = .;
Err = 'E2';
if last.ID then output;
end;
run;

Besides, I also want to perform "inter-observation" checks for each ID, f. ex. whether for certain combinations of Product and Acc some condition is fulfilled, f. ex. (fake code):

do over ID;
if for comb('B','4') Value1 > 0 and for comb('D','2') Value1 = 0 then do;
...
Err = 'E3';
if last.ID then output;
end;
end;

(note that I am referring to obs 4 and 3 in the "have" data set).

I have not been able to do this within a data step and I am starting to think that this operation might only possible with "proc iml" or some other way. However, I have no experience in working with SAS/IML and thus, any suggestions are appreciated.

Ksharp · Posted 08-10-2024 02:46 AM

Assuming I understood what you mean.

data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;

data want;
do until(last.id);
 set have;
 by id;
 Err='  ';
 if Value1 ^= Value2 then Err = 'E1';

 if Product='B' and Acc='4' and value1>0 then flag31=1;
 if Product='D' and Acc='2' and value1=0 then flag32=1;
 if last.id and  flag31=1 and flag32=1 then Err='E3';
 output;

 if last.id and  Type in ('Blue','Orange') and find(Product,'E') = 0 then do;
   Product='E';call missing(of Acc--Value3);Err='E2';output;
 end;
end; 
drop  flag31 flag32;
run;

View solution in original post

mkeintz · Posted 08-07-2024 10:57 PM

Your description of your objective is very sparse. But you can do "inter" obs comparsons by reading through a single id group, in a DO UNTIL LAST.ID do-group. Something like:

data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;

data want (keep=ID ERR);
  do until (last.id);
    set have;
    by id;
    if catx(',',product,acc)='B,4' and value1>0 then _condition_a1=1; else 
    if catx(',',product,acc)='D,2' and value1=0 then _condition_a2=1; 

    length type_list $400;
    type_list=catx(',',type_list,type);

  end;
  length ERR $10;
  if n(of _condition_a:)=2 then do;
    err='E3';
    output;
  end;

  if findw(type_list,'Blue')>0 and findw(type_list,'Orange')>0 then do;
    err='Type 2 error';
    output;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 08-07-2024 10:59 PM

You could try PROC SQL.

And better to post your desired output to better explain your question.

data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;

proc sql;
select id, 

case
when sum(type in ('Blue','Orange')) and sum(Product='E')  then 'E'
else ' '
end as Product

 from have
  group by id;
quit;

ChrisNZ · Posted 08-08-2024 04:14 AM

Another way:

data HAVE(index=(A=(ID PRODUCT ACC)));
  input ID $ PRODUCT $ ACC $ TYPE $ VALUE1 VALUE2 VALUE3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
run;

data WANT;
  set HAVE;
  by ID;
  if last.ID then do;
     * save values of PRODUCT and ACC here if needed ;
 
    * if for comb('B','4') VALUE1 > 0  ;
    PRODUCT='B'; ACC='4'; 
    set HAVE key=A;
    _CONDITION1= (^_IORC_ & VALUE1 > 0); * keys found and value > 0 ; 
    
    * and for comb('D','2') VALUE1 = 0 ;
    PRODUCT='D'; ACC='2'; 
    set HAVE key=A;
    _CONDITION2= (^_IORC_ & VALUE1 = 0); * keys found and value = 0 ; 
    
    if _CONDITION1 & _CONDITION2 then FLAG='Y';
    * restore values of PRODUCT and ACC here if needed ;
    
    output;
    _IORC_=0; _ERROR_=0;
    putlog ID= FLAG=;
  end;
  drop _:;
run;

ID=1 FLAG=

ID=2 FLAG=Y

ID=3 FLAG=

High-Performance SAS Coding - Third Edition

nemez · Posted 08-09-2024 05:21 AM

Hello ChrisNZ,

thank you, your approach looks very promising. How would I have to change your piece of code to put the flag in the obs where condition 2 is checked (here obs=3, but in the real data set it would not have to be the row where first.ID = 1)?

nemez · Posted 08-09-2024 05:41 AM

Hello,

Thank you for your suggestions. Since the desired output was requested:

ID Product Acc Type Value1-Value3 Err
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0 E1
1 E   . . . E2
2 D 2 Orange 0 2 . E1
2 B 4 Yellow 4 4 2 E3
3 B 3 Blue 5 3 . E1
3 C 5 Orange 1 1 .
3 E  . . . E2

nemez · Posted 08-09-2024 05:50 AM

data out;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3 Err $;
datalines;
1 C 3 Red 3 3 1 .
1 A 1 Blue 2 1 0 E1
1 E . . . . . E2
2 D 2 Orange 0 2 . E1
2 B 4 Yellow 4 4 2 E3
3 B 3 Blue 5 3 . E1
3 C 5 Orange 1 1 . .
3 E . . . . . E2
;

To be precise.

ChrisNZ · Posted 08-10-2024 02:23 AM

The logic you want is still a bit unclear. This matches your output. Please adapt as needed.

data WANT;
  set HAVE point=_N_;
  by ID;
  if VALUE1 ^= VALUE2 then ERR = 'E1';
  output;
  if TYPE in ('Blue','Orange') & ^find(PRODUCT,'E') then do;
    call missing(ACC, TYPE, VALUE1, VALUE2, VALUE3);
    PRODUCT = 'E';
    ERR = 'E2';
    if last.ID then output;
  end;
  if PRODUCT='D' & ACC='2' then do;
    _PRODUCT=PRODUCT; _ACC=ACC;
    PRODUCT='B'; ACC='4'; 
    set HAVE key=A;
    if ^_IORC_ & VALUE1 > 0 then do; 
      ERR = 'E3';
      PRODUCT=_PRODUCT; ACC=_ACC;
      output;
      _IORC_=0; _ERROR_=0;
    end; 
  end;
  drop _:;
run;
proc print;run;

Obs	ID	Product	Acc	Type	Value1	Value2	Value3	Err
1	1	C	3	Red	3	3	1
2	1	A	1	Blue	2	1	0	E1
3	1	E			.	.	.	E2
4	2	D	2	Orange	0	2	.	E1
5	2	B	4	Yellow	4	4	2	E3
6	3	B	3	Blue	5	3	.	E1
7	3	C	5	Orange	1	1	.
8	3	E			.	.	.	E2

High-Performance SAS Coding - Third Edition

nemez · Posted 08-12-2024 08:36 AM

Hello ChrisNZ,

Unfortunately, the point-option does not work for me.

I got an error message, stating:

ERROR: The POINT= option is incompatible with the BY statement.

Ksharp · Posted 08-10-2024 02:46 AM

Assuming I understood what you mean.

data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;

data want;
do until(last.id);
 set have;
 by id;
 Err='  ';
 if Value1 ^= Value2 then Err = 'E1';

 if Product='B' and Acc='4' and value1>0 then flag31=1;
 if Product='D' and Acc='2' and value1=0 then flag32=1;
 if last.id and  flag31=1 and flag32=1 then Err='E3';
 output;

 if last.id and  Type in ('Blue','Orange') and find(Product,'E') = 0 then do;
   Product='E';call missing(of Acc--Value3);Err='E2';output;
 end;
end; 
drop  flag31 flag32;
run;

nemez · Posted 08-12-2024 08:44 AM

Hello Ksharp,

thank you for helpful suggestion! However, I need to have more flexibility in the code (regarding the "if last.ID ..." parts) and have found a solution that works fine for me with two do-until-loops, see below.

data want(drop=flag:);
do until (last.ID);
set have;
by ID;
if find(Product,'E') = 0 then flag2 = 1;
if Product = 'B' and Acc = '4' and Value1 > 0 then flag31 = 1;
if Product = 'D' and Acc = '2' and Value1 = 0 then flag32 = 1;
end;
do until (last.ID);
set out;
by ID;
output;
if Value1 ^= Value2 then Err = 'E1';
if Type in ('Blue','Orange') and flag2 = 1  then do;
	Product='E'; call missing(of Acc--Value3); Err='E2';
end;
if flag31 = 1 and flag32 = 1 then Err = 'E3';
end;
run;

@all: Thanks for your suggestions/help!

nemez · Posted 08-12-2024 09:11 AM

Just for the record: If using Ksharps code without the "last.ID"-condition regarding Err 'E3' it works fine.

I myself have discovered that the double do-until-loop (where I had a little mistake as I read in "out" instead of "have" in the second do-until-loop and therefore had to adjust some more lines) for some reason does not correctly detect Error 'E1' in one instance.

Inter-observational checks of values for each ID in data set

Re: Inter-observational checks of values for each ID in data set

Re: Inter-observational checks of values for each ID in data set

Re: Inter-observational checks of values for each ID in data set

Re: Inter-observational checks of values for each ID in data set

Re: Inter-observational checks of values for each ID in data set

Re: Inter-observational checks of values for each ID in data set

Re: Inter-observational checks of values for each ID in data set

Re: Inter-observational checks of values for each ID in data set

Re: Inter-observational checks of values for each ID in data set

Re: Inter-observational checks of values for each ID in data set

Re: Inter-observational checks of values for each ID in data set

Re: Inter-observational checks of values for each ID in data set

Registration is open

SAS Training: Just a Click Away