Solved: Re: Select rows

Ronein · Posted 10-25-2021 05:11 AM

Hello

My raw data data set includes 2 columns: Customer_ID and person_ID.

The information shows the owners of each customer.

For example: Customer 1 has 2 owners:123 and 456

I want to select the rows from data set have by the following criteria:

A customer that has 2 owners and one of the owners is owner (alone) of another customer.

So in this example the following rows will be selected by this criteria:

1 123 1 456 2 456

7 000 7 111 8 000

My question:

What is the way to select the rows by the criteria that I have mentioned ?

Data have;
input Customer_id person_id ;
cards;
1 123
1 456
2 456
3 767
4 654 
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;
Run;

Kurt_Bremser · Posted 10-26-2021 05:38 AM

This creates your want dataset:

data have;
input Customer_id $ person_id $;
cards;
1 123
1 456
2 456
3 767
4 654 
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;

data two one;
set have;
by customer_id;
if first.customer_id and last.customer_id
then output one;
else output two;
run;

proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;

data want;
set have;
if _n_ = 1
then do;
  declare hash two (dataset:"with_one");
  two.definekey("customer_id");
  two.definedone();
  declare hash one (dataset:"with_one");
  one.definekey("person_id");
  one.definedone();
end;
if two.check() = 0 or one.check() = 0;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

PeterClemmensen · Posted 10-25-2021 05:27 AM

Customer_ID = 2 has only 1 owner?

The DATA to DATA Step Macro
Blog: SASnrd

Ronein · Posted 10-25-2021 05:40 AM

Yes, customer_ID=2 has only 1 owner.

I am looking for situation when a customer_ID has 2 owners and one of the owners belongs also to another customer_ID and he is alone there

PeterClemmensen · Posted 10-25-2021 05:41 AM

So if there were 2 obs for person_id, we should not output in this case?

The DATA to DATA Step Macro
Blog: SASnrd

Ronein · Posted 10-25-2021 05:45 AM

The rows that need to be in output are
1 123
1 456
2 456
7 000
7 111
8 000

because these rows meet the criteria:
There is a customer with 2 owners and one of these ID's belong to another customer and this customer has only 1 owner

Kurt_Bremser · Posted 10-25-2021 06:47 AM

What if a customer has more than 2 owners?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ronein · Posted 10-25-2021 01:27 PM

Then he shouldn't be there .
I wrote the conditions:
1-one account with 2 owners
2- second account with 1 owner
3- The owner in second account is also an owner in first account
4- The owners don't belong to any other account ( only first account and second account are in the ownerships).
This situation is when a man+wife has a bank account and the man also has a bank account by himself...

Kurt_Bremser · Posted 10-26-2021 05:38 AM

This creates your want dataset:

data have;
input Customer_id $ person_id $;
cards;
1 123
1 456
2 456
3 767
4 654 
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;

data two one;
set have;
by customer_id;
if first.customer_id and last.customer_id
then output one;
else output two;
run;

proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;

data want;
set have;
if _n_ = 1
then do;
  declare hash two (dataset:"with_one");
  two.definekey("customer_id");
  two.definedone();
  declare hash one (dataset:"with_one");
  one.definekey("person_id");
  one.definedone();
end;
if two.check() = 0 or one.check() = 0;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ronein · Posted 10-26-2021 07:59 AM

Hello

The code you showed is perfect but when I applied it on the real data I get an error ,

NOTE: Compression was disabled for data set WORK.WANT because compression overhead would increase the size of the data set.
ERROR: Undeclared key symbol customer_id  for hash object at line 34 column 3.
ERROR: DATA STEP Component Object failure.  Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.HAVE.
WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0 observations and 2 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.

Kurt_Bremser · Posted 10-26-2021 08:05 AM

My code works with the data as posted. I can only develop and test with data I have.

Please post the complete log of all steps that I supplied.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ronein · Posted 10-26-2021 09:15 AM

May you please show another solution without Hash methology?

Kurt_Bremser · Posted 10-26-2021 09:24 AM

Join the one and two datasets with the with_one dataset, based on the respective relevant keys, then concatenate the results. After 1200+ posts here, this will be a simple task you can do on your own.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ronein · Posted 10-26-2021 12:53 PM

So this is the solution without using Hash, thank you for your advice

data have;
input Customer_id $ person_id $;
cards;
1 123
1 456
2 456
3 767
4 654 
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;

data two one;
set have;
by customer_id;
if first.customer_id and last.customer_id
then output one;
else output two;
run;

/***Person_ID's from two that exists i one****/
proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;


proc sql;
create table as Final_List_From_one
select b.*
from with_one as a
inner join one as b
on a.person_id=b.person_id
;
quit;


proc sql;
create table as Final_List_From_two
select b.*
from with_one as a
inner join two as b
on a.customer_id=b.customer_id
;
quit;

Data wanted;
set  Final_List_From_one  
      Final_List_From_two;
Run;

Ronein · Posted 10-26-2021 01:01 PM

I want yo ask also about your code please,

In your code you used cartesian product .

proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;

Why didn't you use inner join? As I see both ways are giving same results , which is more efficient when have 10 million rows?

proc sql;
create table with_one as
select  t1.customer_id, t1.person_id
from two t1
inner join  one t2
on t1.person_id = t2.person_id
order by customer_id;
quit;

Kurt_Bremser · Posted 10-26-2021 01:16 PM

I'm used to the WHERE method from our DB/2 people. The optimizer there is most probably more powerful than that of PROC SQL 😉

One day I will ask why they prefer it to inner joins.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Registration is open

SAS Training: Just a Click Away