Solved: Re: filtering data based on a condition in a dataset

buddha_d · Posted 04-29-2020 12:47 PM

Guys, I have a dataset i have and I want the output to look like this. I want the output to look like this

BBB B2546 C1254
BBB B4578 C12456
BBB B7995 C14576
BBB B1245 C1259

I want to build a logic where on my ID group if there is a match of account1 and account2 then I want that ID group to be deleted (eg: AAA A2345 A2345 & CCC D4568 D4568) and I want to have the ID group that has all different account1 as well as account2 . I tried the code below which didn't work. I appreciate any help on this in advance.

data check;
input id $ account1 $ account2 $ ;
cards;
AAA A1234 B1456
AAA A1234 B74156
AAA A2345 A2345
BBB B2546 C1254
BBB B4578 C12456
BBB B7995 C14576
BBB B1245 C1259
CCC D4568 F1254
CCC D4568 G1458
CCC D4568 D4568
;
RUN;


PROC SQL;
	CREATE TABLE NEW AS 
	SELECT *
	FROM CHECK 
	GROUP BY ID 
	HAVING ACCCOUNT1 NOT IN (SELECT DISTINCT ACCOUNT2 FROM CHECK ); 
QUIT;

Jagadishkatam · Posted 04-29-2020 01:06 PM

please try the below code


proc sql;
create table want as select * from check where id not in  ( select distinct id from check where account1=account2);
quit;

Thanks,
Jag

View solution in original post

PaigeMiller · Posted 04-29-2020 01:03 PM

Almost identical to the logic in this question

https://communities.sas.com/t5/SAS-Programming/How-to-label-rows-that-don-t-match/m-p/643983#M192280

and the solution ought to work for you as well.

--
Paige Miller

Jagadishkatam · Posted 04-29-2020 01:06 PM

please try the below code


proc sql;
create table want as select * from check where id not in  ( select distinct id from check where account1=account2);
quit;

Thanks,
Jag

novinosrin · Posted 04-29-2020 01:20 PM

Hi @Jagadishkatam Hope all's well and you are safe in among the world's Covid19 fight. I'm afraid your code will miss some intricacies should the data be slightly different like:

AAA A1234 B1456
AAA A1234 A2345
AAA A2345 B74156

However, I would have to agree with you that's not the case in the sample posted by @buddha_d . Nonetheless, this is a kind of situation I would often refer back to @ChrisNZ aka Mr. Perfect's teachings to never assume. On the other hand, I would also think genius @mkeintz would prefer a Hash check method solution as he deems a lot of Proc SQL is rather "ready meals" though fun and easy. 🙂 We have our banters every so often as it's been a privilege to learn from these marvelous people.

Therefore, something like the below would be needed to address in full


data check;
input id $ account1 $ account2 $ ;
cards;
AAA A1234 B1456 
AAA A1234 A2345
AAA A2345 B74156
BBB B2546 C1254
BBB B4578 C12456
BBB B7995 C14576
BBB B1245 C1259
CCC D4568 F1254
CCC D4568 G1458
CCC D4568 D4568
;
RUN;

proc sql;
create table want as
select distinct a.*
from check a left join check b
on a.account1=b.account2
group by a.id
having not n(b.account2);
quit;

proc print noobs;run;

id	account1	account2
BBB	B1245	C1259
BBB	B2546	C1254
BBB	B4578	C12456
BBB	B7995	C14576

ChrisNZ · Posted 04-29-2020 05:15 PM

@novinosrin lol love your nicknames! 🙂 Thank you! 👍

High-Performance SAS Coding - Third Edition

Kurt_Bremser · Posted 04-30-2020 02:41 AM

A data step solution:

data want;
merge
  check (
    in=to_delete
    where=(account1 = account2)
  )
  check
;
by id;
if not to_delete;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

s_lassen · Posted 04-30-2020 05:39 AM

I think a data step is the way to go:

data new;
  merge 
    check(where=(account1=account2) in=duplicate)
    check;
  by id;
  if not duplicate;
run;

You may get a message about more than one data set in the merge having repeat of BY values, but that is OK in this case.

SAS Training: Just a Click Away