DATA TEST;
LENGTH POLICY $10.;
INFILE DATALINES DLM='|';
INPUT POLICY $ CODE ;
DATALINES;
AAA|1
AAA|2
AAA|3
AAA|4
AAA|5
AAB|5
AAB|6
AAC|5
;
RUN;
/* If we look at the code, the even code (2,4,6) are related to a cancellation.
so we have a code 1 and code 2 for the same policy, those lines should be deleted.
Same thing for code 3 and code 4, those lines should be deleted.
Again, same thing for code 5 and 6, those lines should be deleted. So at the end, the test table
should contains only the following lines: how do we do that either with sas code or proc sql, quit;
how do we do that?*/
/* AAA|5
AAC|5
*/
I understand your task as follows. Any time there is a code with an even value, there will have already been a code with a value exactly one less for the same policy. The corresponding records should not be kept. Keep only records with an odd code value, for which there is no code value exactly 1 greater for the same policy.
If the codes really are integers and the data are sorted by policy/code, then:
data want (drop=nxt_:);
set test (keep=policy);
by policy;
merge test
test (firstobs=2 keep=code rename=(code=nxt_code));
if mod(code,2)=1 and (nxt_code^=code+1 or last.policy=1);
run;
Using the firstobs=2 you can look ahead one record and compare the code-in-hand against the next code. If the code in hand is odd and either the next code is one greater, or the current record is the last for the current policy, then it is unmatched and should be output.
This sounds like you want to count the number of observations for each policy number, and if there are an even number of observations, then delete that group. If there are an odd number of observations, then keep the last observation. Does that sound correct? If so, you can use BY-group processing with the MOD function to get the results you want:
For example:
data test2 (drop=cnt);
set test;
by policy;
if first.policy then cnt=0;
cnt+1;
if last.policy then do;
if MOD(cnt,2)=0 then delete;
else output;
end;
run;
Did you run my code against the test data set you specified in your post? Did it give you the results you wanted? If so, what is different in the test data set vs the real data? Were there any errors in the log when when you ran the code against your real data?
Hum! you are close to a solution. One thing we must be advise of, usually, when we have a code 2 then we should have the code 1. When we have a code 4 we should have a code 3. When we have the code 6 we should have the code 5. Otherwise, we could have a code 1, or 5 alone.
I was trying the following approach... a select over partition with case statement but it does not seems to work. Do you already have use this approach.:
proc sql;
create table test2 as
(
select policy, code,
row_number() over ( partition by policy
order by case when (code = '1' and code= '2') then 0
when (code = '3' and code= '4') then 1
when (code = '5' and code= '6') then 2
else 3 end ) As rownumber
from test
);
quit;
So there after, you delete everything with rownumber between 0 and two.
Could it works
I understand your task as follows. Any time there is a code with an even value, there will have already been a code with a value exactly one less for the same policy. The corresponding records should not be kept. Keep only records with an odd code value, for which there is no code value exactly 1 greater for the same policy.
If the codes really are integers and the data are sorted by policy/code, then:
data want (drop=nxt_:);
set test (keep=policy);
by policy;
merge test
test (firstobs=2 keep=code rename=(code=nxt_code));
if mod(code,2)=1 and (nxt_code^=code+1 or last.policy=1);
run;
Using the firstobs=2 you can look ahead one record and compare the code-in-hand against the next code. If the code in hand is odd and either the next code is one greater, or the current record is the last for the current policy, then it is unmatched and should be output.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.