BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
nemez
Fluorite | Level 6

Hi everybody,

 

I have a data set like the following one (which is created from three different data sets of similar structure).

data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;

Now, I want to perform some "intra-observation" checks to detect errors and also sometimes conditionally create new rows like in this example:

data want;
set have;
by ID;
if Value1 ^= Value2 then Err = 'E1';
output;
if Type in ('Blue','Orange') and find(Product,'E') = 0 then do;
Product = 'E';
Acc = '';
Type = '';
Value1 = .;
Value2 = .;
Value3 = .;
Err = 'E2';
if last.ID then output;
end;
run;

Besides, I also want to perform "inter-observation" checks for each ID, f. ex. whether for certain combinations of Product and Acc some condition is fulfilled, f. ex. (fake code):

do over ID;
if for comb('B','4') Value1 > 0 and for comb('D','2') Value1 = 0 then do;
... Err = 'E3'; if last.ID then output; end;
end;

(note that I am referring to obs 4 and 3 in the "have" data set).

I have not been able to do this within a data step and I am starting to think that this operation might only possible with "proc iml" or some other way. However, I have no experience in working with SAS/IML and thus, any suggestions are appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Assuming I understood what you mean.

 

data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;

data want;
do until(last.id);
 set have;
 by id;
 Err='  ';
 if Value1 ^= Value2 then Err = 'E1';

 if Product='B' and Acc='4' and value1>0 then flag31=1;
 if Product='D' and Acc='2' and value1=0 then flag32=1;
 if last.id and  flag31=1 and flag32=1 then Err='E3';
 output;

 if last.id and  Type in ('Blue','Orange') and find(Product,'E') = 0 then do;
   Product='E';call missing(of Acc--Value3);Err='E2';output;
 end;
end; 
drop  flag31 flag32;
run;

View solution in original post

11 REPLIES 11
mkeintz
PROC Star

Your description of your objective is very sparse.  But you can do "inter" obs comparsons by reading through a single id group, in a DO UNTIL LAST.ID  do-group.  Something like:

 

data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;

data want (keep=ID ERR);
  do until (last.id);
    set have;
    by id;
    if catx(',',product,acc)='B,4' and value1>0 then _condition_a1=1; else 
    if catx(',',product,acc)='D,2' and value1=0 then _condition_a2=1; 

    length type_list $400;
    type_list=catx(',',type_list,type);

  end;
  length ERR $10;
  if n(of _condition_a:)=2 then do;
    err='E3';
    output;
  end;

  if findw(type_list,'Blue')>0 and findw(type_list,'Orange')>0 then do;
    err='Type 2 error';
    output;
  end;
run;


--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

You could try PROC SQL.

And better to post your desired output to better explain your question.

 

data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;

proc sql;
select id, 

case
when sum(type in ('Blue','Orange')) and sum(Product='E')  then 'E'
else ' '
end as Product

 from have
  group by id;
quit;
ChrisNZ
Tourmaline | Level 20

Another way:

data HAVE(index=(A=(ID PRODUCT ACC)));
  input ID $ PRODUCT $ ACC $ TYPE $ VALUE1 VALUE2 VALUE3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
run;

data WANT;
  set HAVE;
  by ID;
  if last.ID then do;
     * save values of PRODUCT and ACC here if needed ;
 
    * if for comb('B','4') VALUE1 > 0  ;
    PRODUCT='B'; ACC='4'; 
    set HAVE key=A;
    _CONDITION1= (^_IORC_ & VALUE1 > 0); * keys found and value > 0 ; 
    
    * and for comb('D','2') VALUE1 = 0 ;
    PRODUCT='D'; ACC='2'; 
    set HAVE key=A;
    _CONDITION2= (^_IORC_ & VALUE1 = 0); * keys found and value = 0 ; 
    
    if _CONDITION1 & _CONDITION2 then FLAG='Y';
    * restore values of PRODUCT and ACC here if needed ;
    
    output;
    _IORC_=0; _ERROR_=0;
    putlog ID= FLAG=;
  end;
  drop _:;
run;
ID=1 FLAG=
ID=2 FLAG=Y
ID=3 FLAG=

 
nemez
Fluorite | Level 6

Hello ChrisNZ,

thank you, your approach looks very promising. How would I have to change your piece of code to put the flag in the obs where condition 2 is checked (here obs=3, but in the real data set it would not have to be the row where first.ID = 1)?

nemez
Fluorite | Level 6

Hello,

Thank you for your suggestions. Since the desired output was requested:

ID Product Acc Type Value1-Value3 Err
1 C 3 Red 3 3 1 1 A 1 Blue 2 1 0 E1 1 E . . . E2 2 D 2 Orange 0 2 . E1 2 B 4 Yellow 4 4 2 E3 3 B 3 Blue 5 3 . E1 3 C 5 Orange 1 1 . 3 E . . . E2
nemez
Fluorite | Level 6
data out;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3 Err $;
datalines;
1 C 3 Red 3 3 1 .
1 A 1 Blue 2 1 0 E1
1 E . . . . . E2
2 D 2 Orange 0 2 . E1
2 B 4 Yellow 4 4 2 E3
3 B 3 Blue 5 3 . E1
3 C 5 Orange 1 1 . .
3 E . . . . . E2
;

To be precise.

ChrisNZ
Tourmaline | Level 20

The logic you want is still a bit unclear. This matches your output. Please adapt as needed.

data WANT;
  set HAVE point=_N_;
  by ID;
  if VALUE1 ^= VALUE2 then ERR = 'E1';
  output;
  if TYPE in ('Blue','Orange') & ^find(PRODUCT,'E') then do;
    call missing(ACC, TYPE, VALUE1, VALUE2, VALUE3);
    PRODUCT = 'E';
    ERR = 'E2';
    if last.ID then output;
  end;
  if PRODUCT='D' & ACC='2' then do;
    _PRODUCT=PRODUCT; _ACC=ACC;
    PRODUCT='B'; ACC='4'; 
    set HAVE key=A;
    if ^_IORC_ & VALUE1 > 0 then do; 
      ERR = 'E3';
      PRODUCT=_PRODUCT; ACC=_ACC;
      output;
      _IORC_=0; _ERROR_=0;
    end; 
  end;
  drop _:;
run;
proc print;run;
Obs ID Product Acc Type Value1 Value2 Value3 Err
1 1 C 3 Red 3 3 1  
2 1 A 1 Blue 2 1 0 E1
3 1 E     . . . E2
4 2 D 2 Orange 0 2 . E1
5 2 B 4 Yellow 4 4 2 E3
6 3 B 3 Blue 5 3 . E1
7 3 C 5 Orange 1 1 .  
8 3 E     . . . E2

 

nemez
Fluorite | Level 6

Hello ChrisNZ,

Unfortunately, the point-option does not work for me.

I got an error message, stating:

ERROR: The POINT= option is incompatible with the BY statement.
Ksharp
Super User

Assuming I understood what you mean.

 

data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;

data want;
do until(last.id);
 set have;
 by id;
 Err='  ';
 if Value1 ^= Value2 then Err = 'E1';

 if Product='B' and Acc='4' and value1>0 then flag31=1;
 if Product='D' and Acc='2' and value1=0 then flag32=1;
 if last.id and  flag31=1 and flag32=1 then Err='E3';
 output;

 if last.id and  Type in ('Blue','Orange') and find(Product,'E') = 0 then do;
   Product='E';call missing(of Acc--Value3);Err='E2';output;
 end;
end; 
drop  flag31 flag32;
run;
nemez
Fluorite | Level 6

Hello Ksharp,

thank you for helpful suggestion! However, I need to have more flexibility in the code (regarding the "if last.ID ..." parts) and have found a solution that works fine for me with two do-until-loops, see below.

data want(drop=flag:);
do until (last.ID);
set have;
by ID;
if find(Product,'E') = 0 then flag2 = 1;
if Product = 'B' and Acc = '4' and Value1 > 0 then flag31 = 1;
if Product = 'D' and Acc = '2' and Value1 = 0 then flag32 = 1;
end;
do until (last.ID);
set out;
by ID;
output;
if Value1 ^= Value2 then Err = 'E1';
if Type in ('Blue','Orange') and flag2 = 1  then do;
	Product='E'; call missing(of Acc--Value3); Err='E2';
end;
if flag31 = 1 and flag32 = 1 then Err = 'E3';
end;
run;

@all: Thanks for your suggestions/help!

nemez
Fluorite | Level 6

Just for the record: If using Ksharps code without the "last.ID"-condition regarding Err 'E3' it works fine.

I myself have discovered that the double do-until-loop (where I had a little mistake as I read in "out" instead of "have" in the second do-until-loop and therefore had to adjust some more lines) for some reason does not correctly detect Error 'E1' in one instance.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 950 views
  • 2 likes
  • 4 in conversation