My data set is
data have;
input EFAMID $ SvcDate :mmddyy10. OpCode :$2. Pay;
format svcdate yymmdd10.;
datalines;
101 01/13/2013 19 30.7
101 01/14/2013 13 28.0
101 01/16/2013 12 60.2
101 01/16/2014 13 10.1
102 02/10/2013 12 9.2
102 02/11/2013 13 32.1
102 02/12/2013 14 17.5
102 02/13/2013 15 10.2
102 02/18/2013 14 12.3
;
At data step, I get its EFAMID and SvcDate values when process OpCode=12. Then I like to check whether the same EFAMID and SvcDate have OpCode = 13 observations. If yes, a new column AssociatedOp = 1. Otherwise, AssociatedOp = 0. For example, for EFAMID=101, its OpCode=12 has SvcDate='01/16/2013'. Using EFAMID=101 and SvcDate='01/16/2013', we find a observation (101 '01/16/2014' 13 10.1) OpCode=13 at the same day.
Thus, AssociatedOp=1. The final data set would be
EFAMID SvcDate OpCode Pay AssociatedOp
101 01/13/2013 19 30.7 0
101 01/14/2013 13 28.0 0
101 01/16/2013 12 60.2 1
101 01/16/2014 13 10.1 0
102 02/10/2013 12 9.2 0
102 02/11/2013 13 32.1 0
102 02/12/2013 14 17.5 0
102 02/13/2013 15 10.2 0
102 02/18/2013 14 12.3 0
How do I do that at data step? Thanks in advance.
It should not be that bad. Since you are reading the same disk file in the same order the blocks of data should already be cached. 10 Million is not a large number and a data step does not require that the whole dataset be loaded into memory.
I don't get the algorithm.
It looks like your result is true only when OPCODE='12' and there was a previous observation for this EFAMID that had OPCODE='13'.
data want ;
any13=0;
do until(last.efamid);
set have;
by efamid svcdate ;
if opcode='13' then any13=1;
AssociatedOp = opcode='12' and any13 ;
output;
end;
run;
Op Associated Obs any13 EFAMID SvcDate Code Pay Op 1 0 101 2013-01-13 19 30.7 0 2 1 101 2013-01-14 13 28.0 0 3 1 101 2013-01-16 12 60.2 1 4 1 101 2014-01-16 13 10.1 0 5 0 102 2013-02-10 12 9.2 0 6 1 102 2013-02-11 13 32.1 0 7 1 102 2013-02-12 14 17.5 0 8 1 102 2013-02-13 15 10.2 0 9 1 102 2013-02-18 14 12.3 0
What about OPCODE='12' and OPCODE='13' on the same date? Should the sort order be:
by efamid svcdate descending opcode;
What if there were multiple OPCODE='12' observations and only one earlier OPCODE='13' observation? Are they all true?
What about if instead there was an OPCODE='13' observation on a future date? Would that mean the earlier OPCODE='12' record is also flagged with true?
I only care about the AssociatedOp column when OpCode=12. For each EFAMID, there is only one observation with OpCode=12. When OpCode=12, I need to use its EFAMID and SvcDate at that observation and then find whether there are OpCode=13. If observations with OpCode=13 has different EFAMID or SvcDate, the AssociatedOp should be zero.
Then your current example answer is wrong. The one OPCODE='12' you have flagged as true has an OPCODE='13' on a different date.
So which of the two SVDATES has the typo in yoru example? Is the day of month wrong in the first one or is the year wrong in the last one?
101 01/14/2013 13 28.0
101 01/16/2013 12 60.2
101 01/16/2014 13 10.1
Since I am quite new to SAS programming, my question could be summarized to
Data Want;
set have;
AssociatedOp=0;
if OpCode=12 then AssociatedOp = hasAssociatedOp(EFAMID, SvcDate);
run;
How do I create a function 'hasAssociatedOp' which accepts two arguments, EFAMID and SvcDate? If there is one observation at dataset 'have' which matches the EFAMID, SvcDate and OpCode=13, it will return 1. Otherwise return 0.
@sas112 wrote:
Since I am quite new to SAS programming, my question could be summarized to
Data Want;
set have;
AssociatedOp=0;
if OpCode=12 then AssociatedOp = hasAssociatedOp(EFAMID, SvcDate);
run;How do I create a function 'hasAssociatedOp' which accepts two arguments, EFAMID and SvcDate? If there is one observation at dataset 'have' which matches the EFAMID, SvcDate and OpCode=13, it will return 1. Otherwise return 0.
If you literally wanted to make something like that then look at using HASH object. But that is overly complicated and requires that you load data into the hash object.
You could adapt my original suggested data step to your clarified requirements.
data have;
input EFAMID $ SvcDate :mmddyy10. OpCode :$2. Pay;
format svcdate yymmdd10.;
datalines;
101 01/13/2013 19 30.7
101 01/16/2013 13 28.0
101 01/16/2013 12 60.2
101 01/16/2014 13 10.1
102 02/10/2013 12 9.2
102 02/11/2013 13 32.1
102 02/12/2013 14 17.5
102 02/13/2013 15 10.2
102 02/18/2013 14 12.3
;
proc sort data=have;
by efamid svcdate descending opcode;
run;
data want ;
any13=0;
do until(last.svcdate);
set have;
by efamid svcdate descending opcode;
if opcode='13' then any13=1;
AssociatedOp = opcode='12' and any13 ;
output;
end;
drop any13;
run;
Results:
Op Associated Obs EFAMID SvcDate Code Pay Op 1 101 2013-01-13 19 30.7 0 2 101 2013-01-16 13 28.0 0 3 101 2013-01-16 12 60.2 1 4 101 2014-01-16 13 10.1 0 5 102 2013-02-10 12 9.2 0 6 102 2013-02-11 13 32.1 0 7 102 2013-02-12 14 17.5 0 8 102 2013-02-13 15 10.2 0 9 102 2013-02-18 14 12.3 0
Or you could pull out the OPCODE='13' records and remerge those records with the original data by EFAMID and SVCDATE. Then you could test whether you have both 12 and 13 on the same date.
data want ;
merge have
have(keep=efamid svcdate opcode
rename=(opcode=opcode13)
where=(opcode13='13')
)
;
by efamid svcdate;
AssociatedOp = (opcode='12' and opcode13='13');
drop opcode13;
run;
It should not be that bad. Since you are reading the same disk file in the same order the blocks of data should already be cached. 10 Million is not a large number and a data step does not require that the whole dataset be loaded into memory.
Hi, does the below fulfil your purpose, for every obs where opcode =12, if the Svcdate and efamid are the same for opcode=13, the flag should turn 1
data have;
infile datalines;
input EFAMID $ SvcDate :mmddyy10. OpCode :$2. Pay;
format svcdate yymmdd10.;
datalines;
101 01/13/2013 19 30.7
101 01/16/2013 13 28.0
101 01/16/2013 12 60.2
101 01/16/2014 13 10.1
102 02/10/2013 12 9.2
102 02/11/2013 13 32.1
102 02/12/2013 14 17.5
102 02/13/2013 15 10.2
102 02/18/2013 14 12.3
;
run;
proc sort data=have out=have1;
by EFAMID opcode;
run;
data have2 (drop=efamid_12 SvcDate_12);
set have1;
by efamid opcode;
retain efamid_12 SvcDate_12;
format SvcDate_12 yymmdd10.;
if first.opcode then
do;
if opcode = '12' then
do;
efamid_12 = efamid;
SvcDate_12 = SvcDate;
end;
end;
if opcode = 13 then
do;
if efamid_12 = efamid and SvcDate_12 = SvcDate then
AssociatedOp = 1;
else AssociatedOp = 0;
end;
else
do;
AssociatedOp = 0;
end;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.