Solved: How to delete a set of data based on a condition not being met over a ...

curiosity · Posted 07-07-2022 11:48 AM

Hi all,

I'm working on a dataset that has repeated data. Below is a sample of the Have data:

CustID	Count	CustTo
AAA	3	AAA
AAA	3	BBB
AAA	3	DDD
BBB	2	AAA
BBB	2	AAA
CCC	1	BBB
DDD	3	AAA
DDD	3	BBB
DDD	3	AAA
EEE	2	GGG
EEE	2	HHH
FFF	1	AAA
GGG	1	AAA
HHH	4	BBB
HHH	4	BBB
HHH	4	BBB
HHH	4	HHH

in sas format:

data have;
infile datalines truncover dsd dlm=',';
input (custID count custto) ($);
datalines;
AAA,3,AAA
AAA,3,BBB
AAA,3,DDD
BBB,2,AAA
BBB,2,AAA
CCC,1,BBB
DDD,3,AAA
DDD,3,BBB
DDD,3,AAA
EEE,2,GGG
EEE,2,HHH
FFF,1,AAA
GGG,1,AAA
HHH,4,BBB
HHH,4,BBB
HHH,4,BBB
HHH,4,HHH
;

I want to look at the data in groups, one CUSTID at a time.

For each CUSTID the same CUSTID should be present in the CustTo column at least once or else that entire group of CUSTID obs is to be deleted (or sent to another output dataset).

The Want table would look like this, where only AAA and HHH data remains and all other CUSTIDs were removed.:

CustID	Count	CustTo
AAA	3	AAA
AAA	3	BBB
AAA	3	DDD
HHH	4	BBB
HHH	4	BBB
HHH	4	BBB
HHH	4	HHH

Based on searches on this forum I tried this but it did not work in the full dataset:
proc sql;
create table want as
select * from have where custid in (select distinct custto from have);
quit;

Any help would be much appreciated!

Kurt_Bremser · Posted 07-07-2022 12:30 PM

Read the keys where the condition is met into a hash:

data want;
set have;
if _n_ = 1
then do;
  declare hash h (dataset:"have (where=(custid = custto))");
  h.definekey("custid");
  h.definedone();
end;
if h.check() = 0;
run;

or use a double DO loop:

data want;
flag = 0;
do until (last.custid);
  set have;
  by custid;
  if custid = custto then flag = 1;
end;
do until (last.custid);
  set have;
  by custid;
  if flag then output;
end;
drop flag;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 07-07-2022 12:30 PM

Read the keys where the condition is met into a hash:

data want;
set have;
if _n_ = 1
then do;
  declare hash h (dataset:"have (where=(custid = custto))");
  h.definekey("custid");
  h.definedone();
end;
if h.check() = 0;
run;

or use a double DO loop:

data want;
flag = 0;
do until (last.custid);
  set have;
  by custid;
  if custid = custto then flag = 1;
end;
do until (last.custid);
  set have;
  by custid;
  if flag then output;
end;
drop flag;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

curiosity · Posted 07-07-2022 01:30 PM

Thanks Kurt, that worked!

I have a related question (not sure if I should ask that in a new topic as it is similar but with one additional column).
If you wish I can move this question to a new topic.

data have;
  infile datalines truncover dsd dlm=',';
  input (custID count custfrom custto) ($);
  datalines;
AxA,3,aaa,aaa
AxA,3,aaa,bbb
AxA,3,aaa,ccc
B2B,2,bbb,ccc
B2B,2,bbb,ddd
CCC,1,ccc,aaa
DzD,3,ddd,aaa
DzD,3,ddd,hhh
DzD,3,ddd,ccc
EEE,2,eee,ccc
EtE,2,eee,aaa
FtF,1,fff,ggg
GGG,1,ggg,hhh
HHt,4,hhh,aaa
HHt,4,hhh,bbb
HHt,4,hhh,ccc
HHt,4,hhh,hhh
;

In this case I want to subset by orderID.
If custfrom does not equal to custto at least once per orderID then I would like to remove all of those orderid observations from the dataset.

want would look like this:

CustID	Count	Custfrom	custto
AxA	3	aaa	aaa
AxA	3	aaa	bbb
AxA	3	aaa	ccc
HHt	4	hhh	aaa
HHt	4	hhh	bbb
HHt	4	hhh	ccc
HHt	4	hhh	hhh

thank you!

Tom · Posted 07-07-2022 01:30 PM

Just add your condition into the subquery.

proc sql;
create table want as
  select a.* from have a
  where a.custid in 
      (select distinct b.custid 
       from have b
       where b.custid = b.custto
      )
;
quit;

How to delete a set of data based on a condition not being met over a group of multiple observations

Re: How to delete a set of data based on a condition not being met over a group of multiple observat

Re: How to delete a set of data based on a condition not being met over a group of multiple observat

Re: How to delete a set of data based on a condition not being met over a group of multiple observat

Re: How to delete a set of data based on a condition not being met over a group of multiple observat

How to delete a set of data based on a condition not being met over a group of multiple observations

Re: How to delete a set of data based on a condition not being met over a group of multiple observat

Re: How to delete a set of data based on a condition not being met over a group of multiple observat

Re: How to delete a set of data based on a condition not being met over a group of multiple observat

Re: How to delete a set of data based on a condition not being met over a group of multiple observat

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!