SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
nemez
Fluorite | Level 6

Hi everybody,

 

I have a data set like the following one (which is created from three different data sets of similar structure).

data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;

Now, I want to perform some "intra-observation" checks to detect errors and also sometimes conditionally create new rows like in this example:

data want;
set have;
by ID;
if Value1 ^= Value2 then Err = 'E1';
output;
if Type in ('Blue','Orange') and find(Product,'E') = 0 then do;
Product = 'E';
Acc = '';
Type = '';
Value1 = .;
Value2 = .;
Value3 = .;
Err = 'E2';
if last.ID then output;
end;
run;

Besides, I also want to perform "inter-observation" checks for each ID, f. ex. whether for certain combinations of Product and Acc some condition is fulfilled, f. ex. (fake code):

do over ID;
if for comb('B','4') Value1 > 0 and for comb('D','2') Value1 = 0 then do;
... Err = 'E3'; if last.ID then output; end;
end;

(note that I am referring to obs 4 and 3 in the "have" data set).

I have not been able to do this within a data step and I am starting to think that this operation might only possible with "proc iml" or some other way. However, I have no experience in working with SAS/IML and thus, any suggestions are appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Assuming I understood what you mean.

 

data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;

data want;
do until(last.id);
 set have;
 by id;
 Err='  ';
 if Value1 ^= Value2 then Err = 'E1';

 if Product='B' and Acc='4' and value1>0 then flag31=1;
 if Product='D' and Acc='2' and value1=0 then flag32=1;
 if last.id and  flag31=1 and flag32=1 then Err='E3';
 output;

 if last.id and  Type in ('Blue','Orange') and find(Product,'E') = 0 then do;
   Product='E';call missing(of Acc--Value3);Err='E2';output;
 end;
end; 
drop  flag31 flag32;
run;

View solution in original post

11 REPLIES 11
mkeintz
PROC Star

Your description of your objective is very sparse.  But you can do "inter" obs comparsons by reading through a single id group, in a DO UNTIL LAST.ID  do-group.  Something like:

 

data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;

data want (keep=ID ERR);
  do until (last.id);
    set have;
    by id;
    if catx(',',product,acc)='B,4' and value1>0 then _condition_a1=1; else 
    if catx(',',product,acc)='D,2' and value1=0 then _condition_a2=1; 

    length type_list $400;
    type_list=catx(',',type_list,type);

  end;
  length ERR $10;
  if n(of _condition_a:)=2 then do;
    err='E3';
    output;
  end;

  if findw(type_list,'Blue')>0 and findw(type_list,'Orange')>0 then do;
    err='Type 2 error';
    output;
  end;
run;


--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

You could try PROC SQL.

And better to post your desired output to better explain your question.

 

data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;

proc sql;
select id, 

case
when sum(type in ('Blue','Orange')) and sum(Product='E')  then 'E'
else ' '
end as Product

 from have
  group by id;
quit;
ChrisNZ
Tourmaline | Level 20

Another way:

data HAVE(index=(A=(ID PRODUCT ACC)));
  input ID $ PRODUCT $ ACC $ TYPE $ VALUE1 VALUE2 VALUE3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
run;

data WANT;
  set HAVE;
  by ID;
  if last.ID then do;
     * save values of PRODUCT and ACC here if needed ;
 
    * if for comb('B','4') VALUE1 > 0  ;
    PRODUCT='B'; ACC='4'; 
    set HAVE key=A;
    _CONDITION1= (^_IORC_ & VALUE1 > 0); * keys found and value > 0 ; 
    
    * and for comb('D','2') VALUE1 = 0 ;
    PRODUCT='D'; ACC='2'; 
    set HAVE key=A;
    _CONDITION2= (^_IORC_ & VALUE1 = 0); * keys found and value = 0 ; 
    
    if _CONDITION1 & _CONDITION2 then FLAG='Y';
    * restore values of PRODUCT and ACC here if needed ;
    
    output;
    _IORC_=0; _ERROR_=0;
    putlog ID= FLAG=;
  end;
  drop _:;
run;
ID=1 FLAG=
ID=2 FLAG=Y
ID=3 FLAG=

 
nemez
Fluorite | Level 6

Hello ChrisNZ,

thank you, your approach looks very promising. How would I have to change your piece of code to put the flag in the obs where condition 2 is checked (here obs=3, but in the real data set it would not have to be the row where first.ID = 1)?

nemez
Fluorite | Level 6

Hello,

Thank you for your suggestions. Since the desired output was requested:

ID Product Acc Type Value1-Value3 Err
1 C 3 Red 3 3 1 1 A 1 Blue 2 1 0 E1 1 E . . . E2 2 D 2 Orange 0 2 . E1 2 B 4 Yellow 4 4 2 E3 3 B 3 Blue 5 3 . E1 3 C 5 Orange 1 1 . 3 E . . . E2
nemez
Fluorite | Level 6
data out;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3 Err $;
datalines;
1 C 3 Red 3 3 1 .
1 A 1 Blue 2 1 0 E1
1 E . . . . . E2
2 D 2 Orange 0 2 . E1
2 B 4 Yellow 4 4 2 E3
3 B 3 Blue 5 3 . E1
3 C 5 Orange 1 1 . .
3 E . . . . . E2
;

To be precise.

ChrisNZ
Tourmaline | Level 20

The logic you want is still a bit unclear. This matches your output. Please adapt as needed.

data WANT;
  set HAVE point=_N_;
  by ID;
  if VALUE1 ^= VALUE2 then ERR = 'E1';
  output;
  if TYPE in ('Blue','Orange') & ^find(PRODUCT,'E') then do;
    call missing(ACC, TYPE, VALUE1, VALUE2, VALUE3);
    PRODUCT = 'E';
    ERR = 'E2';
    if last.ID then output;
  end;
  if PRODUCT='D' & ACC='2' then do;
    _PRODUCT=PRODUCT; _ACC=ACC;
    PRODUCT='B'; ACC='4'; 
    set HAVE key=A;
    if ^_IORC_ & VALUE1 > 0 then do; 
      ERR = 'E3';
      PRODUCT=_PRODUCT; ACC=_ACC;
      output;
      _IORC_=0; _ERROR_=0;
    end; 
  end;
  drop _:;
run;
proc print;run;
Obs ID Product Acc Type Value1 Value2 Value3 Err
1 1 C 3 Red 3 3 1  
2 1 A 1 Blue 2 1 0 E1
3 1 E     . . . E2
4 2 D 2 Orange 0 2 . E1
5 2 B 4 Yellow 4 4 2 E3
6 3 B 3 Blue 5 3 . E1
7 3 C 5 Orange 1 1 .  
8 3 E     . . . E2

 

nemez
Fluorite | Level 6

Hello ChrisNZ,

Unfortunately, the point-option does not work for me.

I got an error message, stating:

ERROR: The POINT= option is incompatible with the BY statement.
Ksharp
Super User

Assuming I understood what you mean.

 

data have;
input ID $ Product $ Acc $ Type $ Value1 Value2 Value3;
datalines;
1 C 3 Red 3 3 1
1 A 1 Blue 2 1 0
2 D 2 Orange 0 2 .
2 B 4 Yellow 4 4 2
3 B 3 Blue 5 3 .
3 C 5 Orange 1 1 .
;

data want;
do until(last.id);
 set have;
 by id;
 Err='  ';
 if Value1 ^= Value2 then Err = 'E1';

 if Product='B' and Acc='4' and value1>0 then flag31=1;
 if Product='D' and Acc='2' and value1=0 then flag32=1;
 if last.id and  flag31=1 and flag32=1 then Err='E3';
 output;

 if last.id and  Type in ('Blue','Orange') and find(Product,'E') = 0 then do;
   Product='E';call missing(of Acc--Value3);Err='E2';output;
 end;
end; 
drop  flag31 flag32;
run;
nemez
Fluorite | Level 6

Hello Ksharp,

thank you for helpful suggestion! However, I need to have more flexibility in the code (regarding the "if last.ID ..." parts) and have found a solution that works fine for me with two do-until-loops, see below.

data want(drop=flag:);
do until (last.ID);
set have;
by ID;
if find(Product,'E') = 0 then flag2 = 1;
if Product = 'B' and Acc = '4' and Value1 > 0 then flag31 = 1;
if Product = 'D' and Acc = '2' and Value1 = 0 then flag32 = 1;
end;
do until (last.ID);
set out;
by ID;
output;
if Value1 ^= Value2 then Err = 'E1';
if Type in ('Blue','Orange') and flag2 = 1  then do;
	Product='E'; call missing(of Acc--Value3); Err='E2';
end;
if flag31 = 1 and flag32 = 1 then Err = 'E3';
end;
run;

@all: Thanks for your suggestions/help!

nemez
Fluorite | Level 6

Just for the record: If using Ksharps code without the "last.ID"-condition regarding Err 'E3' it works fine.

I myself have discovered that the double do-until-loop (where I had a little mistake as I read in "out" instead of "have" in the second do-until-loop and therefore had to adjust some more lines) for some reason does not correctly detect Error 'E1' in one instance.

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1543 views
  • 2 likes
  • 4 in conversation