BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kb011235
Obsidian | Level 7

Hi, 

 

I'm trying to remove rows where key is the same and add_delete is both "A" and "D"

 

data have ;
infile datalines delimiter=',';
input key $50. add_delete $1. ;
datalines;
1234_1234567654321_P_L720,A
1234_1234567654321_P_L738,A
1234_1234567654321_P_L738,D
1234_1234567654321_P_L821,A
1234_1234567654321_P_L821,D
1234_1234567654321_P_R209,A
1234_1234567654321_P_R209,D
1234_7654321234567_P_L720,A
1234_7654321234567_P_L720,D
1234_7654321234567_P_L738,A
1234_7654321234567_P_L738,D
1234_7654321234567_P_L821,D
1234_7654321234567_P_R209,A
1234_7654321234567_P_R209,D
;

The output of this example should be 

key add_delete
1234_1234567654321_P_L720 A
1234_7654321234567_P_L821 D

 

It might be as simple as counting the occurrences by key and if it's greater than 1, delete. I'm testing that now. Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Like this?

 

data have ;
infile datalines dsd;
length key $50 add_delete $1;
input key add_delete;
datalines;
1234_1234567654321_P_L720,A
1234_1234567654321_P_L738,A
1234_1234567654321_P_L738,D
1234_1234567654321_P_L821,A
1234_1234567654321_P_L821,D
1234_1234567654321_P_R209,A
1234_1234567654321_P_R209,D
1234_7654321234567_P_L720,A
1234_7654321234567_P_L720,D
1234_7654321234567_P_L738,A
1234_7654321234567_P_L738,D
1234_7654321234567_P_L821,D
1234_7654321234567_P_R209,A
1234_7654321234567_P_R209,D
;

proc sql;
select 
    * 
from have as a
where not exists (select * from have as b where a.key=b.key and a.add_delete ne b.add_delete);
quit;

PGStats_0-1627586196973.png

 

PG

View solution in original post

7 REPLIES 7
Reeza
Super User
Recode A to 1 and D to -1.
Then sum it by the KEY. Anything that is 0 gets deleted.
Rydhm
Obsidian | Level 7
But _L720 also has both 'A' and 'D' for same Key.

1234_7654321234567_P_L720,A
1234_7654321234567_P_L720,D
PGStats
Opal | Level 21

Like this?

 

data have ;
infile datalines dsd;
length key $50 add_delete $1;
input key add_delete;
datalines;
1234_1234567654321_P_L720,A
1234_1234567654321_P_L738,A
1234_1234567654321_P_L738,D
1234_1234567654321_P_L821,A
1234_1234567654321_P_L821,D
1234_1234567654321_P_R209,A
1234_1234567654321_P_R209,D
1234_7654321234567_P_L720,A
1234_7654321234567_P_L720,D
1234_7654321234567_P_L738,A
1234_7654321234567_P_L738,D
1234_7654321234567_P_L821,D
1234_7654321234567_P_R209,A
1234_7654321234567_P_R209,D
;

proc sql;
select 
    * 
from have as a
where not exists (select * from have as b where a.key=b.key and a.add_delete ne b.add_delete);
quit;

PGStats_0-1627586196973.png

 

PG
mkeintz
PROC Star

If the

  1. The data are sorted by KEY, as they are in your sample
  2. You never have more than one D per key, or more then one A per key

then a DATA step with a BY statement will work, by keeping only those KEY's with a single observation:

 

data have ;
infile datalines delimiter=',';
input key :$50. add_delete $1. ;
datalines;
1234_1234567654321_P_L720,A
1234_1234567654321_P_L738,A
1234_1234567654321_P_L738,D
1234_1234567654321_P_L821,A
1234_1234567654321_P_L821,D
1234_1234567654321_P_R209,A
1234_1234567654321_P_R209,D
1234_7654321234567_P_L720,A
1234_7654321234567_P_L720,D
1234_7654321234567_P_L738,A
1234_7654321234567_P_L738,D
1234_7654321234567_P_L821,D
1234_7654321234567_P_R209,A
1234_7654321234567_P_R209,D
;
data want;
  set have;
  by key;
  if first.key=1 and last.key=1;
run;

 

 

Alternatively, using a condition more analogous to @PGStats's  suggestion.

 

data want;
  merge have (where=(add_delete='A') in=ina)
        have (where=(add_delete='D') in=ind);
  by key;
  where ina=0 or ind=0;
run;

which just says to keep those KEY's in which either A never appears or D never appears.

 

For large datasets, this may be faster than the SQL solution because it only compares contiguous records for matching KEYs.  But again, it requires the data to be sorted by KEY.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
kb011235
Obsidian | Level 7

My first try was to use a similar datastep which didn't work: 

data want;
  set have;
  by key;
  if first.add_delete='A' and last.add_delete='D' then delete;
run;

Can you help me understand what's wrong with this setup?

Cynthia_sas
SAS Super FREQ
Hi:
The problem with this logic is that the FIRST.byvar and LAST.byvar values are only ever 0 or 1 (numbers). And your BY statement in the code is BY KEY; so your program is creating FIRST.KEY and LAST.KEY. But you have coded FIRST.ADD_DELETE and LAST.ADD_DELETE, which should result in messages in the log something like: "NOTE: Invalid numeric data" -- because of the fact that FIRST. and LAST. variables are numeric variables but you are testing for values of 'A' and 'D'.
And if you intend to use FIRST. and LAST. variables in your DATA step program, ALL the BY variables must be listed in the BY statement.
Cynthia
Ksharp
Super User
data have ;
infile datalines dsd;
length key $50 add_delete $1;
input key add_delete;
datalines;
1234_1234567654321_P_L720,A
1234_1234567654321_P_L738,A
1234_1234567654321_P_L738,D
1234_1234567654321_P_L821,A
1234_1234567654321_P_L821,D
1234_1234567654321_P_R209,A
1234_1234567654321_P_R209,D
1234_7654321234567_P_L720,A
1234_7654321234567_P_L720,D
1234_7654321234567_P_L738,A
1234_7654321234567_P_L738,D
1234_7654321234567_P_L821,D
1234_7654321234567_P_R209,A
1234_7654321234567_P_R209,D
;

proc sql;
select 
    * 
from have as a
group by key
having count(distinct add_delete)=1;
quit;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

LIBNAME 101

Follow along as SAS technical trainer Dominique Weatherspoon expertly answers all your questions about SAS Libraries.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 900 views
  • 6 likes
  • 7 in conversation