- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi everybody,
I have a data set like the following one (which is created from three different data sets of similar structure).
data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;
Now, I want to perform some "intra-observation" checks to detect errors and also sometimes conditionally create new rows like in this example:
data want;
set have;
by ID;
if Value1 ^= Value2 then Err = 'E1';
output;
if Type in ('Blue','Orange') and find(Product,'E') = 0 then do;
Product = 'E';
Acc = '';
Type = '';
Value1 = .;
Value2 = .;
Value3 = .;
Err = 'E2';
if last.ID then output;
end;
run;
Besides, I also want to perform "inter-observation" checks for each ID, f. ex. whether for certain combinations of Product and Acc some condition is fulfilled, f. ex. (fake code):
do over ID;
if for comb('B','4') Value1 > 0 and for comb('D','2') Value1 = 0 then do;
...
Err = 'E3';
if last.ID then output;
end;
end;
(note that I am referring to obs 4 and 3 in the "have" data set).
I have not been able to do this within a data step and I am starting to think that this operation might only possible with "proc iml" or some other way. However, I have no experience in working with SAS/IML and thus, any suggestions are appreciated.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming I understood what you mean.
data have; input ID $ Product $ Acc $ Type $ Value1 Value2 Value3; datalines; 1 C 3 Red 3 3 1 1 A 1 Blue 2 1 0 2 D 2 Orange 0 2 . 2 B 4 Yellow 4 4 2 3 B 3 Blue 5 3 . 3 C 5 Orange 1 1 . ; data want; do until(last.id); set have; by id; Err=' '; if Value1 ^= Value2 then Err = 'E1'; if Product='B' and Acc='4' and value1>0 then flag31=1; if Product='D' and Acc='2' and value1=0 then flag32=1; if last.id and flag31=1 and flag32=1 then Err='E3'; output; if last.id and Type in ('Blue','Orange') and find(Product,'E') = 0 then do; Product='E';call missing(of Acc--Value3);Err='E2';output; end; end; drop flag31 flag32; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your description of your objective is very sparse. But you can do "inter" obs comparsons by reading through a single id group, in a DO UNTIL LAST.ID do-group. Something like:
data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;
data want (keep=ID ERR);
do until (last.id);
set have;
by id;
if catx(',',product,acc)='B,4' and value1>0 then _condition_a1=1; else
if catx(',',product,acc)='D,2' and value1=0 then _condition_a2=1;
length type_list $400;
type_list=catx(',',type_list,type);
end;
length ERR $10;
if n(of _condition_a:)=2 then do;
err='E3';
output;
end;
if findw(type_list,'Blue')>0 and findw(type_list,'Orange')>0 then do;
err='Type 2 error';
output;
end;
run;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You could try PROC SQL.
And better to post your desired output to better explain your question.
data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;
proc sql;
select id,
case
when sum(type in ('Blue','Orange')) and sum(Product='E') then 'E'
else ' '
end as Product
from have
group by id;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Another way:
data HAVE(index=(A=(ID PRODUCT ACC)));
input ID $ PRODUCT $ ACC $ TYPE $ VALUE1 VALUE2 VALUE3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
run;
data WANT;
set HAVE;
by ID;
if last.ID then do;
* save values of PRODUCT and ACC here if needed ;
* if for comb('B','4') VALUE1 > 0 ;
PRODUCT='B'; ACC='4';
set HAVE key=A;
_CONDITION1= (^_IORC_ & VALUE1 > 0); * keys found and value > 0 ;
* and for comb('D','2') VALUE1 = 0 ;
PRODUCT='D'; ACC='2';
set HAVE key=A;
_CONDITION2= (^_IORC_ & VALUE1 = 0); * keys found and value = 0 ;
if _CONDITION1 & _CONDITION2 then FLAG='Y';
* restore values of PRODUCT and ACC here if needed ;
output;
_IORC_=0; _ERROR_=0;
putlog ID= FLAG=;
end;
drop _:;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello ChrisNZ,
thank you, your approach looks very promising. How would I have to change your piece of code to put the flag in the obs where condition 2 is checked (here obs=3, but in the real data set it would not have to be the row where first.ID = 1)?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Thank you for your suggestions. Since the desired output was requested:
ID Product Acc Type Value1-Value3 Err
1 C 3 Red 3 3 1 1 A 1 Blue 2 1 0 E1 1 E . . . E2 2 D 2 Orange 0 2 . E1 2 B 4 Yellow 4 4 2 E3 3 B 3 Blue 5 3 . E1 3 C 5 Orange 1 1 . 3 E . . . E2
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data out;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3 Err $;
datalines;
1 C 3 Red 3 3 1 .
1 A 1 Blue 2 1 0 E1
1 E . . . . . E2
2 D 2 Orange 0 2 . E1
2 B 4 Yellow 4 4 2 E3
3 B 3 Blue 5 3 . E1
3 C 5 Orange 1 1 . .
3 E . . . . . E2
;
To be precise.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The logic you want is still a bit unclear. This matches your output. Please adapt as needed.
data WANT;
set HAVE point=_N_;
by ID;
if VALUE1 ^= VALUE2 then ERR = 'E1';
output;
if TYPE in ('Blue','Orange') & ^find(PRODUCT,'E') then do;
call missing(ACC, TYPE, VALUE1, VALUE2, VALUE3);
PRODUCT = 'E';
ERR = 'E2';
if last.ID then output;
end;
if PRODUCT='D' & ACC='2' then do;
_PRODUCT=PRODUCT; _ACC=ACC;
PRODUCT='B'; ACC='4';
set HAVE key=A;
if ^_IORC_ & VALUE1 > 0 then do;
ERR = 'E3';
PRODUCT=_PRODUCT; ACC=_ACC;
output;
_IORC_=0; _ERROR_=0;
end;
end;
drop _:;
run;
proc print;run;
Obs | ID | Product | Acc | Type | Value1 | Value2 | Value3 | Err |
---|---|---|---|---|---|---|---|---|
1 | 1 | C | 3 | Red | 3 | 3 | 1 | |
2 | 1 | A | 1 | Blue | 2 | 1 | 0 | E1 |
3 | 1 | E | . | . | . | E2 | ||
4 | 2 | D | 2 | Orange | 0 | 2 | . | E1 |
5 | 2 | B | 4 | Yellow | 4 | 4 | 2 | E3 |
6 | 3 | B | 3 | Blue | 5 | 3 | . | E1 |
7 | 3 | C | 5 | Orange | 1 | 1 | . | |
8 | 3 | E | . | . | . | E2 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello ChrisNZ,
Unfortunately, the point-option does not work for me.
I got an error message, stating:
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming I understood what you mean.
data have; input ID $ Product $ Acc $ Type $ Value1 Value2 Value3; datalines; 1 C 3 Red 3 3 1 1 A 1 Blue 2 1 0 2 D 2 Orange 0 2 . 2 B 4 Yellow 4 4 2 3 B 3 Blue 5 3 . 3 C 5 Orange 1 1 . ; data want; do until(last.id); set have; by id; Err=' '; if Value1 ^= Value2 then Err = 'E1'; if Product='B' and Acc='4' and value1>0 then flag31=1; if Product='D' and Acc='2' and value1=0 then flag32=1; if last.id and flag31=1 and flag32=1 then Err='E3'; output; if last.id and Type in ('Blue','Orange') and find(Product,'E') = 0 then do; Product='E';call missing(of Acc--Value3);Err='E2';output; end; end; drop flag31 flag32; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello Ksharp,
thank you for helpful suggestion! However, I need to have more flexibility in the code (regarding the "if last.ID ..." parts) and have found a solution that works fine for me with two do-until-loops, see below.
data want(drop=flag:);
do until (last.ID);
set have;
by ID;
if find(Product,'E') = 0 then flag2 = 1;
if Product = 'B' and Acc = '4' and Value1 > 0 then flag31 = 1;
if Product = 'D' and Acc = '2' and Value1 = 0 then flag32 = 1;
end;
do until (last.ID);
set out;
by ID;
output;
if Value1 ^= Value2 then Err = 'E1';
if Type in ('Blue','Orange') and flag2 = 1 then do;
Product='E'; call missing(of Acc--Value3); Err='E2';
end;
if flag31 = 1 and flag32 = 1 then Err = 'E3';
end;
run;
@all: Thanks for your suggestions/help!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just for the record: If using Ksharps code without the "last.ID"-condition regarding Err 'E3' it works fine.
I myself have discovered that the double do-until-loop (where I had a little mistake as I read in "out" instead of "have" in the second do-until-loop and therefore had to adjust some more lines) for some reason does not correctly detect Error 'E1' in one instance.