BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ronein
Onyx | Level 15

Hello

My raw data data set includes 2 columns: Customer_ID and person_ID.

The information shows the owners  of each customer.

For example: Customer 1 has 2 owners:123 and 456

I want to select the rows from data set have by the following criteria:

A customer that has 2 owners and one of the owners is owner (alone) of another customer.

So in this example the following rows will be selected by this criteria:

1 123 1 456 2 456

7 000 7 111 8 000

My question:

What is the way to select the rows by the criteria that I have mentioned ?

 

Data have;
input Customer_id person_id ;
cards;
1 123
1 456
2 456
3 767
4 654 
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;
Run;
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

This creates your want dataset:

data have;
input Customer_id $ person_id $;
cards;
1 123
1 456
2 456
3 767
4 654 
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;

data two one;
set have;
by customer_id;
if first.customer_id and last.customer_id
then output one;
else output two;
run;

proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;

data want;
set have;
if _n_ = 1
then do;
  declare hash two (dataset:"with_one");
  two.definekey("customer_id");
  two.definedone();
  declare hash one (dataset:"with_one");
  one.definekey("person_id");
  one.definedone();
end;
if two.check() = 0 or one.check() = 0;
run;

View solution in original post

16 REPLIES 16
PeterClemmensen
Tourmaline | Level 20

Customer_ID = 2 has only 1 owner?

Ronein
Onyx | Level 15

Yes, customer_ID=2 has only 1 owner.

I am looking for situation when a customer_ID has 2 owners and one of the owners belongs also to another customer_ID and he is alone there

PeterClemmensen
Tourmaline | Level 20

So if there were 2 obs for person_id, we should not output in this case?

Ronein
Onyx | Level 15
The rows that need to be in output are
1 123
1 456
2 456
7 000
7 111
8 000

because these rows meet the criteria:
There is a customer with 2 owners and one of these ID's belong to another customer and this customer has only 1 owner
Ronein
Onyx | Level 15
Then he shouldn't be there .
I wrote the conditions:
1-one account with 2 owners
2- second account with 1 owner
3- The owner in second account is also an owner in first account
4- The owners don't belong to any other account ( only first account and second account are in the ownerships).
This situation is when a man+wife has a bank account and the man also has a bank account by himself...
Kurt_Bremser
Super User

This creates your want dataset:

data have;
input Customer_id $ person_id $;
cards;
1 123
1 456
2 456
3 767
4 654 
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;

data two one;
set have;
by customer_id;
if first.customer_id and last.customer_id
then output one;
else output two;
run;

proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;

data want;
set have;
if _n_ = 1
then do;
  declare hash two (dataset:"with_one");
  two.definekey("customer_id");
  two.definedone();
  declare hash one (dataset:"with_one");
  one.definekey("person_id");
  one.definedone();
end;
if two.check() = 0 or one.check() = 0;
run;
Ronein
Onyx | Level 15

Hello

The  code you showed is perfect but when I applied it on the real data I get an error ,

NOTE: Compression was disabled for data set WORK.WANT because compression overhead would increase the size of the data set.
ERROR: Undeclared key symbol customer_id  for hash object at line 34 column 3.
ERROR: DATA STEP Component Object failure.  Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.HAVE.
WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0 observations and 2 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
Kurt_Bremser
Super User

My code works with the data as posted. I can only develop and test with data I have.

Please post the complete log of all steps that I supplied.

Ronein
Onyx | Level 15

May you please show another solution without Hash methology?

Kurt_Bremser
Super User

Join the one and two datasets with the with_one dataset, based on the respective relevant keys, then concatenate the results. After 1200+ posts here, this will be a simple task you can do on your own.

Ronein
Onyx | Level 15

So this is the solution without using Hash, thank you for your advice

data have;
input Customer_id $ person_id $;
cards;
1 123
1 456
2 456
3 767
4 654 
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;

data two one;
set have;
by customer_id;
if first.customer_id and last.customer_id
then output one;
else output two;
run;

/***Person_ID's from two that exists i one****/
proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;


proc sql;
create table as Final_List_From_one
select b.*
from with_one as a
inner join one as b
on a.person_id=b.person_id
;
quit;


proc sql;
create table as Final_List_From_two
select b.*
from with_one as a
inner join two as b
on a.customer_id=b.customer_id
;
quit;

Data wanted;
set  Final_List_From_one  
      Final_List_From_two;
Run;
Ronein
Onyx | Level 15

I want yo ask also about your code please,

In your code you used cartesian product .

proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;

Why didn't you use inner join? As I see both ways are giving same results , which is more efficient when have 10 million rows?

proc sql;
create table with_one as
select  t1.customer_id, t1.person_id
from two t1
inner join  one t2
on t1.person_id = t2.person_id
order by customer_id;
quit;
Kurt_Bremser
Super User

I'm used to the WHERE method from our DB/2 people. The optimizer there is most probably more powerful than that of PROC SQL 😉

One day I will ask why they prefer it to inner joins.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 3195 views
  • 4 likes
  • 3 in conversation