Hi,
I'm trying to remove all rows for any key that appears with both add_delete "A" and add_delete "D".
data have ;
infile datalines delimiter=',';
input key :$50. add_delete :$1. ;
datalines;
1234_1234567654321_P_L720,A
1234_1234567654321_P_L738,A
1234_1234567654321_P_L738,D
1234_1234567654321_P_L821,A
1234_1234567654321_P_L821,D
1234_1234567654321_P_R209,A
1234_1234567654321_P_R209,D
1234_7654321234567_P_L720,A
1234_7654321234567_P_L720,D
1234_7654321234567_P_L738,A
1234_7654321234567_P_L738,D
1234_7654321234567_P_L821,D
1234_7654321234567_P_R209,A
1234_7654321234567_P_R209,D
;
The output of this example should be
key | add_delete |
1234_1234567654321_P_L720 | A |
1234_7654321234567_P_L821 | D |
It might be as simple as counting the occurrences by key and if it's greater than 1, delete. I'm testing that now. Thanks in advance.
Like this?
data have ;
infile datalines dsd;
length key $50 add_delete $1;
input key add_delete;
datalines;
1234_1234567654321_P_L720,A
1234_1234567654321_P_L738,A
1234_1234567654321_P_L738,D
1234_1234567654321_P_L821,A
1234_1234567654321_P_L821,D
1234_1234567654321_P_R209,A
1234_1234567654321_P_R209,D
1234_7654321234567_P_L720,A
1234_7654321234567_P_L720,D
1234_7654321234567_P_L738,A
1234_7654321234567_P_L738,D
1234_7654321234567_P_L821,D
1234_7654321234567_P_R209,A
1234_7654321234567_P_R209,D
;
proc sql;
select
*
from have as a
where not exists (select * from have as b where a.key=b.key and a.add_delete ne b.add_delete);
quit;
If the data are sorted by KEY, then a DATA step with a BY statement will work, by keeping only those KEYs with a single observation:
data have ;
infile datalines delimiter=',';
input key :$50. add_delete $1. ;
datalines;
1234_1234567654321_P_L720,A
1234_1234567654321_P_L738,A
1234_1234567654321_P_L738,D
1234_1234567654321_P_L821,A
1234_1234567654321_P_L821,D
1234_1234567654321_P_R209,A
1234_1234567654321_P_R209,D
1234_7654321234567_P_L720,A
1234_7654321234567_P_L720,D
1234_7654321234567_P_L738,A
1234_7654321234567_P_L738,D
1234_7654321234567_P_L821,D
1234_7654321234567_P_R209,A
1234_7654321234567_P_R209,D
;
data want;
set have;
by key;
if first.key=1 and last.key=1;
run;
Alternatively, you can use a condition more analogous to @PGStats's suggestion:
data want;
merge have (where=(add_delete='A') in=ina)
have (where=(add_delete='D') in=ind);
by key;
if ina=0 or ind=0;  /* subsetting IF: IN= flags are not available to a WHERE statement */
run;
which just says to keep those KEYs in which either A never appears or D never appears.
For large datasets, this may be faster than the SQL solution because it only compares contiguous records for matching KEYs. But again, it requires the data to be sorted by KEY.
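Since both DATA step approaches depend on the sort order, here is a minimal sketch of the sorting step, assuming the dataset is named have as in the examples above. The sample data already happens to be in KEY order, so this only matters for data that is not.

```sas
/* Sort by KEY so that BY-group processing (first./last. flags, MERGE ... BY) is valid */
proc sort data=have;
  by key;
run;
```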
My first try was a similar DATA step, which didn't work:
data want;
set have;
by key;
if first.add_delete='A' and last.add_delete='D' then delete;
run;
Can you help me understand what's wrong with this setup?
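One likely explanation, offered as a sketch rather than a definitive diagnosis: FIRST.variable and LAST.variable are created only for variables named in the BY statement, and they hold 1 or 0 rather than data values. In the step above add_delete is not a BY variable, so first.add_delete and last.add_delete do not carry the flags the code expects, and even if add_delete were in the BY statement they would never equal 'A' or 'D'. The step below copies the flags into ordinary variables so they can be inspected. The names flags, first_key, last_key, first_ad and last_ad are made up for this illustration.

```sas
/* Assumes HAVE is sorted by key and add_delete, as the sample data is */
data flags;
  set have;
  by key add_delete;
  /* FIRST./LAST. variables are temporary 0/1 flags; copy them to keep them */
  first_key = first.key;
  last_key  = last.key;
  first_ad  = first.add_delete;
  last_ad   = last.add_delete;
run;

proc print data=flags;
run;
```

In the printed output, a key that should be kept is one where first_key and last_key are both 1 on the same row, which is the condition the working DATA step tests.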
data have ;
infile datalines dsd;
length key $50 add_delete $1;
input key add_delete;
datalines;
1234_1234567654321_P_L720,A
1234_1234567654321_P_L738,A
1234_1234567654321_P_L738,D
1234_1234567654321_P_L821,A
1234_1234567654321_P_L821,D
1234_1234567654321_P_R209,A
1234_1234567654321_P_R209,D
1234_7654321234567_P_L720,A
1234_7654321234567_P_L720,D
1234_7654321234567_P_L738,A
1234_7654321234567_P_L738,D
1234_7654321234567_P_L821,D
1234_7654321234567_P_R209,A
1234_7654321234567_P_R209,D
;
proc sql;
select
*
from have as a
group by key
having count(distinct add_delete)=1;
quit;