Hi everybody,
I have a data set like the following one (which is created from three different data sets of similar structure).
data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;
Now, I want to perform some "intra-observation" checks to detect errors and also sometimes conditionally create new rows like in this example:
data want;
set have;
by ID;
if Value1 ^= Value2 then Err = 'E1';
output;
if Type in ('Blue','Orange') and find(Product,'E') = 0 then do;
Product = 'E';
Acc = '';
Type = '';
Value1 = .;
Value2 = .;
Value3 = .;
Err = 'E2';
if last.ID then output;
end;
run;
Besides, I also want to perform "inter-observation" checks for each ID, f. ex. whether for certain combinations of Product and Acc some condition is fulfilled, f. ex. (fake code):
do over ID;
if for comb('B','4') Value1 > 0 and for comb('D','2') Value1 = 0 then do;
...
Err = 'E3';
if last.ID then output;
end;
end;
(note that I am referring to obs 4 and 3 in the "have" data set).
I have not been able to do this within a data step and I am starting to think that this operation might only possible with "proc iml" or some other way. However, I have no experience in working with SAS/IML and thus, any suggestions are appreciated.
Assuming I understood what you mean.
data have; input ID $ Product $ Acc $ Type $ Value1 Value2 Value3; datalines; 1 C 3 Red 3 3 1 1 A 1 Blue 2 1 0 2 D 2 Orange 0 2 . 2 B 4 Yellow 4 4 2 3 B 3 Blue 5 3 . 3 C 5 Orange 1 1 . ; data want; do until(last.id); set have; by id; Err=' '; if Value1 ^= Value2 then Err = 'E1'; if Product='B' and Acc='4' and value1>0 then flag31=1; if Product='D' and Acc='2' and value1=0 then flag32=1; if last.id and flag31=1 and flag32=1 then Err='E3'; output; if last.id and Type in ('Blue','Orange') and find(Product,'E') = 0 then do; Product='E';call missing(of Acc--Value3);Err='E2';output; end; end; drop flag31 flag32; run;
Your description of your objective is very sparse. But you can do "inter" obs comparsons by reading through a single id group, in a DO UNTIL LAST.ID do-group. Something like:
data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;
data want (keep=ID ERR);
do until (last.id);
set have;
by id;
if catx(',',product,acc)='B,4' and value1>0 then _condition_a1=1; else
if catx(',',product,acc)='D,2' and value1=0 then _condition_a2=1;
length type_list $400;
type_list=catx(',',type_list,type);
end;
length ERR $10;
if n(of _condition_a:)=2 then do;
err='E3';
output;
end;
if findw(type_list,'Blue')>0 and findw(type_list,'Orange')>0 then do;
err='Type 2 error';
output;
end;
run;
You could try PROC SQL.
And better to post your desired output to better explain your question.
data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;
proc sql;
select id,
case
when sum(type in ('Blue','Orange')) and sum(Product='E') then 'E'
else ' '
end as Product
from have
group by id;
quit;
Another way:
data HAVE(index=(A=(ID PRODUCT ACC)));
input ID $ PRODUCT $ ACC $ TYPE $ VALUE1 VALUE2 VALUE3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
run;
data WANT;
set HAVE;
by ID;
if last.ID then do;
* save values of PRODUCT and ACC here if needed ;
* if for comb('B','4') VALUE1 > 0 ;
PRODUCT='B'; ACC='4';
set HAVE key=A;
_CONDITION1= (^_IORC_ & VALUE1 > 0); * keys found and value > 0 ;
* and for comb('D','2') VALUE1 = 0 ;
PRODUCT='D'; ACC='2';
set HAVE key=A;
_CONDITION2= (^_IORC_ & VALUE1 = 0); * keys found and value = 0 ;
if _CONDITION1 & _CONDITION2 then FLAG='Y';
* restore values of PRODUCT and ACC here if needed ;
output;
_IORC_=0; _ERROR_=0;
putlog ID= FLAG=;
end;
drop _:;
run;
Hello ChrisNZ,
thank you, your approach looks very promising. How would I have to change your piece of code to put the flag in the obs where condition 2 is checked (here obs=3, but in the real data set it would not have to be the row where first.ID = 1)?
Hello,
Thank you for your suggestions. Since the desired output was requested:
ID Product Acc Type Value1-Value3 Err
1 C 3 Red 3 3 1 1 A 1 Blue 2 1 0 E1 1 E . . . E2 2 D 2 Orange 0 2 . E1 2 B 4 Yellow 4 4 2 E3 3 B 3 Blue 5 3 . E1 3 C 5 Orange 1 1 . 3 E . . . E2
data out;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3 Err $;
datalines;
1 C 3 Red 3 3 1 .
1 A 1 Blue 2 1 0 E1
1 E . . . . . E2
2 D 2 Orange 0 2 . E1
2 B 4 Yellow 4 4 2 E3
3 B 3 Blue 5 3 . E1
3 C 5 Orange 1 1 . .
3 E . . . . . E2
;
To be precise.
The logic you want is still a bit unclear. This matches your output. Please adapt as needed.
data WANT;
set HAVE point=_N_;
by ID;
if VALUE1 ^= VALUE2 then ERR = 'E1';
output;
if TYPE in ('Blue','Orange') & ^find(PRODUCT,'E') then do;
call missing(ACC, TYPE, VALUE1, VALUE2, VALUE3);
PRODUCT = 'E';
ERR = 'E2';
if last.ID then output;
end;
if PRODUCT='D' & ACC='2' then do;
_PRODUCT=PRODUCT; _ACC=ACC;
PRODUCT='B'; ACC='4';
set HAVE key=A;
if ^_IORC_ & VALUE1 > 0 then do;
ERR = 'E3';
PRODUCT=_PRODUCT; ACC=_ACC;
output;
_IORC_=0; _ERROR_=0;
end;
end;
drop _:;
run;
proc print;run;
Obs | ID | Product | Acc | Type | Value1 | Value2 | Value3 | Err |
---|---|---|---|---|---|---|---|---|
1 | 1 | C | 3 | Red | 3 | 3 | 1 | |
2 | 1 | A | 1 | Blue | 2 | 1 | 0 | E1 |
3 | 1 | E | . | . | . | E2 | ||
4 | 2 | D | 2 | Orange | 0 | 2 | . | E1 |
5 | 2 | B | 4 | Yellow | 4 | 4 | 2 | E3 |
6 | 3 | B | 3 | Blue | 5 | 3 | . | E1 |
7 | 3 | C | 5 | Orange | 1 | 1 | . | |
8 | 3 | E | . | . | . | E2 |
Hello ChrisNZ,
Unfortunately, the point-option does not work for me.
I got an error message, stating:
Assuming I understood what you mean.
data have; input ID $ Product $ Acc $ Type $ Value1 Value2 Value3; datalines; 1 C 3 Red 3 3 1 1 A 1 Blue 2 1 0 2 D 2 Orange 0 2 . 2 B 4 Yellow 4 4 2 3 B 3 Blue 5 3 . 3 C 5 Orange 1 1 . ; data want; do until(last.id); set have; by id; Err=' '; if Value1 ^= Value2 then Err = 'E1'; if Product='B' and Acc='4' and value1>0 then flag31=1; if Product='D' and Acc='2' and value1=0 then flag32=1; if last.id and flag31=1 and flag32=1 then Err='E3'; output; if last.id and Type in ('Blue','Orange') and find(Product,'E') = 0 then do; Product='E';call missing(of Acc--Value3);Err='E2';output; end; end; drop flag31 flag32; run;
Hello Ksharp,
thank you for helpful suggestion! However, I need to have more flexibility in the code (regarding the "if last.ID ..." parts) and have found a solution that works fine for me with two do-until-loops, see below.
data want(drop=flag:);
do until (last.ID);
set have;
by ID;
if find(Product,'E') = 0 then flag2 = 1;
if Product = 'B' and Acc = '4' and Value1 > 0 then flag31 = 1;
if Product = 'D' and Acc = '2' and Value1 = 0 then flag32 = 1;
end;
do until (last.ID);
set out;
by ID;
output;
if Value1 ^= Value2 then Err = 'E1';
if Type in ('Blue','Orange') and flag2 = 1 then do;
Product='E'; call missing(of Acc--Value3); Err='E2';
end;
if flag31 = 1 and flag32 = 1 then Err = 'E3';
end;
run;
@all: Thanks for your suggestions/help!
Just for the record: If using Ksharps code without the "last.ID"-condition regarding Err 'E3' it works fine.
I myself have discovered that the double do-until-loop (where I had a little mistake as I read in "out" instead of "have" in the second do-until-loop and therefore had to adjust some more lines) for some reason does not correctly detect Error 'E1' in one instance.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.