Hello
My raw data data set includes 2 columns: Customer_ID and person_ID.
The information shows the owners of each customer.
For example: Customer 1 has 2 owners:123 and 456
I want to select the rows from data set have by the following criteria:
A customer that has 2 owners and one of the owners is owner (alone) of another customer.
So in this example the following rows will be selected by this criteria:
1 123
1 456
2 456
7 000
7 111
8 000
My question:
What is the way to select the rows by the criteria that I have mentioned ?
Data have;
input Customer_id person_id ;
cards;
1 123
1 456
2 456
3 767
4 654
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;
Run;
This creates your want dataset:
data have;
input Customer_id $ person_id $;
cards;
1 123
1 456
2 456
3 767
4 654
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;
data two one;
set have;
by customer_id;
if first.customer_id and last.customer_id
then output one;
else output two;
run;
proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;
data want;
set have;
if _n_ = 1
then do;
declare hash two (dataset:"with_one");
two.definekey("customer_id");
two.definedone();
declare hash one (dataset:"with_one");
one.definekey("person_id");
one.definedone();
end;
if two.check() = 0 or one.check() = 0;
run;
Customer_ID = 2 has only 1 owner?
Yes, customer_ID=2 has only 1 owner.
I am looking for situation when a customer_ID has 2 owners and one of the owners belongs also to another customer_ID and he is alone there
So if there were 2 obs for person_id, we should not output in this case?
What if a customer has more than 2 owners?
This creates your want dataset:
data have;
input Customer_id $ person_id $;
cards;
1 123
1 456
2 456
3 767
4 654
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;
data two one;
set have;
by customer_id;
if first.customer_id and last.customer_id
then output one;
else output two;
run;
proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;
data want;
set have;
if _n_ = 1
then do;
declare hash two (dataset:"with_one");
two.definekey("customer_id");
two.definedone();
declare hash one (dataset:"with_one");
one.definekey("person_id");
one.definedone();
end;
if two.check() = 0 or one.check() = 0;
run;
Hello
The code you showed is perfect but when I applied it on the real data I get an error ,
NOTE: Compression was disabled for data set WORK.WANT because compression overhead would increase the size of the data set. ERROR: Undeclared key symbol customer_id for hash object at line 34 column 3. ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase. NOTE: The SAS System stopped processing this step because of errors. NOTE: There were 1 observations read from the data set WORK.HAVE. WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 2 variables. WARNING: Data set WORK.WANT was not replaced because this step was stopped.
My code works with the data as posted. I can only develop and test with data I have.
Please post the complete log of all steps that I supplied.
May you please show another solution without Hash methology?
Join the one and two datasets with the with_one dataset, based on the respective relevant keys, then concatenate the results. After 1200+ posts here, this will be a simple task you can do on your own.
So this is the solution without using Hash, thank you for your advice
data have;
input Customer_id $ person_id $;
cards;
1 123
1 456
2 456
3 767
4 654
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;
data two one;
set have;
by customer_id;
if first.customer_id and last.customer_id
then output one;
else output two;
run;
/***Person_ID's from two that exists i one****/
proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;
proc sql;
create table as Final_List_From_one
select b.*
from with_one as a
inner join one as b
on a.person_id=b.person_id
;
quit;
proc sql;
create table as Final_List_From_two
select b.*
from with_one as a
inner join two as b
on a.customer_id=b.customer_id
;
quit;
Data wanted;
set Final_List_From_one
Final_List_From_two;
Run;
I want yo ask also about your code please,
In your code you used cartesian product .
proc sql; create table with_one as select distinct t1.customer_id, t1.person_id from two t1, one t2 where t1.person_id = t2.person_id order by customer_id; quit;
Why didn't you use inner join? As I see both ways are giving same results , which is more efficient when have 10 million rows?
proc sql; create table with_one as select t1.customer_id, t1.person_id from two t1 inner join one t2 on t1.person_id = t2.person_id order by customer_id; quit;
I'm used to the WHERE method from our DB/2 people. The optimizer there is most probably more powerful than that of PROC SQL 😉
One day I will ask why they prefer it to inner joins.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.