Calcite | Level 5

## find observations exist based on conditions

My data set is

data have;
input EFAMID \$ SvcDate :mmddyy10. OpCode :\$2. Pay;
format svcdate yymmdd10.;
datalines;
101 01/13/2013 19 30.7
101 01/14/2013 13 28.0
101 01/16/2013 12 60.2
101 01/16/2014 13 10.1
102 02/10/2013 12 9.2
102 02/11/2013 13 32.1
102 02/12/2013 14 17.5
102 02/13/2013 15 10.2
102 02/18/2013 14 12.3
;

At data step, I get its EFAMID and SvcDate values when process OpCode=12. Then I like to check whether the same EFAMID and SvcDate have OpCode = 13 observations. If yes, a new column AssociatedOp = 1. Otherwise, AssociatedOp = 0. For example, for EFAMID=101, its OpCode=12 has SvcDate='01/16/2013'. Using EFAMID=101 and SvcDate='01/16/2013', we find a observation (101 '01/16/2014' 13 10.1) OpCode=13 at the same day.

Thus, AssociatedOp=1. The final data set would be

EFAMID   SvcDate   OpCode   Pay   AssociatedOp

101        01/13/2013    19         30.7     0
101        01/14/2013    13         28.0     0

101        01/16/2013    12         60.2     1
101        01/16/2014    13         10.1     0
102        02/10/2013    12           9.2     0
102        02/11/2013     13        32.1     0
102        02/12/2013     14        17.5     0
102        02/13/2013     15        10.2     0
102        02/18/2013     14        12.3     0

How do I do that at data step? Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: find observations exist based on conditions

It should not be that bad.  Since you are reading the same disk file in the same order the blocks of data should already be cached. 10 Million is not a large number and a data step does not require that the whole dataset be loaded into memory.

11 REPLIES 11
Super User

## Re: find observations exist based on conditions

I don't get the algorithm.

It looks like your result is true only when OPCODE='12' and there was a previous observation for this EFAMID that had OPCODE='13'.

``````data want ;
any13=0;
do until(last.efamid);
set have;
by efamid svcdate ;
if opcode='13' then any13=1;
AssociatedOp = opcode='12' and any13 ;
output;
end;
run;``````
```                                         Op             Associated
Obs    any13    EFAMID       SvcDate    Code     Pay        Op

1       0       101      2013-01-13     19     30.7         0
2       1       101      2013-01-14     13     28.0         0
3       1       101      2013-01-16     12     60.2         1
4       1       101      2014-01-16     13     10.1         0
5       0       102      2013-02-10     12      9.2         0
6       1       102      2013-02-11     13     32.1         0
7       1       102      2013-02-12     14     17.5         0
8       1       102      2013-02-13     15     10.2         0
9       1       102      2013-02-18     14     12.3         0
```

What about OPCODE='12' and OPCODE='13' on the same date?  Should the sort order be:

``````    by efamid svcdate descending opcode;
``````

What if there were multiple OPCODE='12' observations and only one earlier OPCODE='13' observation? Are they all true?

What about if instead there was an OPCODE='13' observation on a future date?  Would that mean the earlier OPCODE='12' record is also flagged with true?

Calcite | Level 5

## Re: find observations exist based on conditions

I only care about the AssociatedOp column when OpCode=12. For each EFAMID, there is only one observation with OpCode=12. When OpCode=12, I need to use its EFAMID and SvcDate at that observation and then find whether there are OpCode=13. If observations with OpCode=13 has different EFAMID or SvcDate, the AssociatedOp should be zero.

Super User

## Re: find observations exist based on conditions

Then your current example answer is wrong.  The one OPCODE='12' you have flagged as true has an OPCODE='13' on a different date.

Calcite | Level 5

## Re: find observations exist based on conditions

When process EFAMID=101, SvcDate='01/16/2013' and OpCode=12, we see the dataset has
101 01/14/2013 13 28.0
101 01/16/2013 12 60.2
101 01/16/2014 13 10.1
Since we found one OpCode=13 has same EFAMID=101 and SvcDate='01/16/2013'. Thus AssociatedOp=1.
For EFAMID=102, SvcDate='02/10/2013', OpCode=12, we see
102 02/10/2013 12 9.2
102 02/11/2013 13 32.1
Since SvcDate='02/11/2013' and OpCode=13. Thus AssociatedOp=0.
The input dataset (have at the example) is quite large. The solution needs to be efficient.
Super User

## Re: find observations exist based on conditions

So which of the two SVDATES has the typo in yoru example?  Is the day of month wrong in the first one or is the year wrong in the last one?

101 01/14/2013 13 28.0
101 01/16/2013 12 60.2
101 01/16/2014 13 10.1

Calcite | Level 5

## Re: find observations exist based on conditions

Thanks point out. You can change either one to match 01/16/2013. I am sorry I didn't notice that.
Calcite | Level 5

## Re: find observations exist based on conditions

Since I am quite new to SAS programming, my question could be summarized to

`Data Want;set have;AssociatedOp=0;if OpCode=12 then AssociatedOp = hasAssociatedOp(EFAMID, SvcDate);run;`

How do I create a function 'hasAssociatedOp' which accepts two arguments, EFAMID and SvcDate? If there is one observation at dataset 'have' which matches the EFAMID, SvcDate and OpCode=13, it will return 1. Otherwise return 0.

Super User

## Re: find observations exist based on conditions

@sas112 wrote:

Since I am quite new to SAS programming, my question could be summarized to

`Data Want;set have;AssociatedOp=0;if OpCode=12 then AssociatedOp = hasAssociatedOp(EFAMID, SvcDate);run;`

How do I create a function 'hasAssociatedOp' which accepts two arguments, EFAMID and SvcDate? If there is one observation at dataset 'have' which matches the EFAMID, SvcDate and OpCode=13, it will return 1. Otherwise return 0.

If you literally wanted to make something like that then look at using HASH object.  But that is overly complicated and requires that you load data into the hash object.

You could adapt my original suggested data step to your clarified requirements.

``````data have;
input EFAMID \$ SvcDate :mmddyy10. OpCode :\$2. Pay;
format svcdate yymmdd10.;
datalines;
101 01/13/2013 19 30.7
101 01/16/2013 13 28.0
101 01/16/2013 12 60.2
101 01/16/2014 13 10.1
102 02/10/2013 12 9.2
102 02/11/2013 13 32.1
102 02/12/2013 14 17.5
102 02/13/2013 15 10.2
102 02/18/2013 14 12.3
;

proc sort data=have;
by efamid svcdate descending opcode;
run;
data want ;
any13=0;
do until(last.svcdate);
set have;
by efamid svcdate descending opcode;
if opcode='13' then any13=1;
AssociatedOp = opcode='12' and any13 ;
output;
end;
drop any13;
run;
``````

Results:

```                                Op             Associated
Obs    EFAMID       SvcDate    Code     Pay        Op

1      101      2013-01-13     19     30.7         0
2      101      2013-01-16     13     28.0         0
3      101      2013-01-16     12     60.2         1
4      101      2014-01-16     13     10.1         0
5      102      2013-02-10     12      9.2         0
6      102      2013-02-11     13     32.1         0
7      102      2013-02-12     14     17.5         0
8      102      2013-02-13     15     10.2         0
9      102      2013-02-18     14     12.3         0```

Or you could pull out the OPCODE='13' records and remerge those records with the original data by EFAMID and SVCDATE.  Then you could test whether you have both 12 and 13 on the same date.

``````data want ;
merge have
have(keep=efamid svcdate opcode
rename=(opcode=opcode13)
where=(opcode13='13')
)
;
by efamid svcdate;
AssociatedOp = (opcode='12' and opcode13='13');
drop opcode13;
run;``````

Calcite | Level 5

## Re: find observations exist based on conditions

The logics should work. But dataset 'have' may have around 10 million observations. If I merge, does it perform well?
Super User

## Re: find observations exist based on conditions

It should not be that bad.  Since you are reading the same disk file in the same order the blocks of data should already be cached. 10 Million is not a large number and a data step does not require that the whole dataset be loaded into memory.

Pyrite | Level 9

## Re: find observations exist based on conditions

Hi, does the below fulfil your purpose, for every obs where opcode =12, if the Svcdate and efamid are the same for opcode=13, the flag should turn 1

data have;
infile datalines;
input EFAMID \$ SvcDate :mmddyy10. OpCode :\$2. Pay;
format svcdate yymmdd10.;
datalines;
101 01/13/2013 19 30.7
101 01/16/2013 13 28.0
101 01/16/2013 12 60.2
101 01/16/2014 13 10.1
102 02/10/2013 12 9.2
102 02/11/2013 13 32.1
102 02/12/2013 14 17.5
102 02/13/2013 15 10.2
102 02/18/2013 14 12.3
;
run;

proc sort data=have out=have1;
by EFAMID opcode;
run;

data have2 (drop=efamid_12 SvcDate_12);
set have1;
by efamid opcode;
retain efamid_12 SvcDate_12;
format SvcDate_12 yymmdd10.;

if first.opcode then
do;
if opcode = '12' then
do;
efamid_12 = efamid;
SvcDate_12 = SvcDate;
end;
end;

if opcode = 13 then
do;
if efamid_12 = efamid and SvcDate_12 = SvcDate then
AssociatedOp = 1;
else AssociatedOp = 0;
end;
else
do;
AssociatedOp = 0;
end;
run;

Discussion stats
• 11 replies
• 572 views
• 0 likes
• 3 in conversation