BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ronein
Onyx | Level 15

Hello

My raw data data set includes 2 columns: Customer_ID and person_ID.

The information shows the owners  of each customer.

For example: Customer 1 has 2 owners:123 and 456

I want to select the rows from data set have by the following criteria:

A customer that has 2 owners and one of the owners is owner (alone) of another customer.

So in this example the following rows will be selected by this criteria:

1 123 1 456 2 456

7 000 7 111 8 000

My question:

What is the way to select the rows by the criteria that I have mentioned ?

 

Data have;
input Customer_id person_id ;
cards;
1 123
1 456
2 456
3 767
4 654 
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;
Run;
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

This creates your want dataset:

data have;
input Customer_id $ person_id $;
cards;
1 123
1 456
2 456
3 767
4 654 
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;

data two one;
set have;
by customer_id;
if first.customer_id and last.customer_id
then output one;
else output two;
run;

proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;

data want;
set have;
if _n_ = 1
then do;
  declare hash two (dataset:"with_one");
  two.definekey("customer_id");
  two.definedone();
  declare hash one (dataset:"with_one");
  one.definekey("person_id");
  one.definedone();
end;
if two.check() = 0 or one.check() = 0;
run;

View solution in original post

16 REPLIES 16
PeterClemmensen
Tourmaline | Level 20

Customer_ID = 2 has only 1 owner?

Ronein
Onyx | Level 15

Yes, customer_ID=2 has only 1 owner.

I am looking for situation when a customer_ID has 2 owners and one of the owners belongs also to another customer_ID and he is alone there

PeterClemmensen
Tourmaline | Level 20

So if there were 2 obs for person_id, we should not output in this case?

Ronein
Onyx | Level 15
The rows that need to be in output are
1 123
1 456
2 456
7 000
7 111
8 000

because these rows meet the criteria:
There is a customer with 2 owners and one of these ID's belong to another customer and this customer has only 1 owner
Ronein
Onyx | Level 15
Then he shouldn't be there .
I wrote the conditions:
1-one account with 2 owners
2- second account with 1 owner
3- The owner in second account is also an owner in first account
4- The owners don't belong to any other account ( only first account and second account are in the ownerships).
This situation is when a man+wife has a bank account and the man also has a bank account by himself...
Kurt_Bremser
Super User

This creates your want dataset:

data have;
input Customer_id $ person_id $;
cards;
1 123
1 456
2 456
3 767
4 654 
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;

data two one;
set have;
by customer_id;
if first.customer_id and last.customer_id
then output one;
else output two;
run;

proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;

data want;
set have;
if _n_ = 1
then do;
  declare hash two (dataset:"with_one");
  two.definekey("customer_id");
  two.definedone();
  declare hash one (dataset:"with_one");
  one.definekey("person_id");
  one.definedone();
end;
if two.check() = 0 or one.check() = 0;
run;
Ronein
Onyx | Level 15

Hello

The  code you showed is perfect but when I applied it on the real data I get an error ,

NOTE: Compression was disabled for data set WORK.WANT because compression overhead would increase the size of the data set.
ERROR: Undeclared key symbol customer_id  for hash object at line 34 column 3.
ERROR: DATA STEP Component Object failure.  Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.HAVE.
WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0 observations and 2 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
Kurt_Bremser
Super User

My code works with the data as posted. I can only develop and test with data I have.

Please post the complete log of all steps that I supplied.

Ronein
Onyx | Level 15

May you please show another solution without Hash methology?

Kurt_Bremser
Super User

Join the one and two datasets with the with_one dataset, based on the respective relevant keys, then concatenate the results. After 1200+ posts here, this will be a simple task you can do on your own.

Ronein
Onyx | Level 15

So this is the solution without using Hash, thank you for your advice

data have;
input Customer_id $ person_id $;
cards;
1 123
1 456
2 456
3 767
4 654 
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;

data two one;
set have;
by customer_id;
if first.customer_id and last.customer_id
then output one;
else output two;
run;

/***Person_ID's from two that exists i one****/
proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;


proc sql;
create table as Final_List_From_one
select b.*
from with_one as a
inner join one as b
on a.person_id=b.person_id
;
quit;


proc sql;
create table as Final_List_From_two
select b.*
from with_one as a
inner join two as b
on a.customer_id=b.customer_id
;
quit;

Data wanted;
set  Final_List_From_one  
      Final_List_From_two;
Run;
Ronein
Onyx | Level 15

I want yo ask also about your code please,

In your code you used cartesian product .

proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;

Why didn't you use inner join? As I see both ways are giving same results , which is more efficient when have 10 million rows?

proc sql;
create table with_one as
select  t1.customer_id, t1.person_id
from two t1
inner join  one t2
on t1.person_id = t2.person_id
order by customer_id;
quit;
Kurt_Bremser
Super User

I'm used to the WHERE method from our DB/2 people. The optimizer there is most probably more powerful than that of PROC SQL 😉

One day I will ask why they prefer it to inner joins.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 3099 views
  • 4 likes
  • 3 in conversation