Hello
My raw data data set includes 2 columns: Customer_ID and person_ID.
The information shows the owners of each customer.
For example: Customer 1 has 2 owners:123 and 456
I want to select the rows from data set have by the following criteria:
A customer that has 2 owners and one of the owners is owner (alone) of another customer.
So in this example the following rows will be selected by this criteria:
1 123
1 456
2 456
7 000
7 111
8 000
My question:
What is the way to select the rows by the criteria that I have mentioned ?
Data have;
input Customer_id person_id ;
cards;
1 123
1 456
2 456
3 767
4 654
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;
Run;
This creates your want dataset:
data have;
input Customer_id $ person_id $;
cards;
1 123
1 456
2 456
3 767
4 654
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;
data two one;
set have;
by customer_id;
if first.customer_id and last.customer_id
then output one;
else output two;
run;
proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;
data want;
set have;
if _n_ = 1
then do;
declare hash two (dataset:"with_one");
two.definekey("customer_id");
two.definedone();
declare hash one (dataset:"with_one");
one.definekey("person_id");
one.definedone();
end;
if two.check() = 0 or one.check() = 0;
run;
Customer_ID = 2 has only 1 owner?
Yes, customer_ID=2 has only 1 owner.
I am looking for situation when a customer_ID has 2 owners and one of the owners belongs also to another customer_ID and he is alone there
So if there were 2 obs for person_id, we should not output in this case?
What if a customer has more than 2 owners?
This creates your want dataset:
data have;
input Customer_id $ person_id $;
cards;
1 123
1 456
2 456
3 767
4 654
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;
data two one;
set have;
by customer_id;
if first.customer_id and last.customer_id
then output one;
else output two;
run;
proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;
data want;
set have;
if _n_ = 1
then do;
declare hash two (dataset:"with_one");
two.definekey("customer_id");
two.definedone();
declare hash one (dataset:"with_one");
one.definekey("person_id");
one.definedone();
end;
if two.check() = 0 or one.check() = 0;
run;
Hello
The code you showed is perfect but when I applied it on the real data I get an error ,
NOTE: Compression was disabled for data set WORK.WANT because compression overhead would increase the size of the data set. ERROR: Undeclared key symbol customer_id for hash object at line 34 column 3. ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase. NOTE: The SAS System stopped processing this step because of errors. NOTE: There were 1 observations read from the data set WORK.HAVE. WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 2 variables. WARNING: Data set WORK.WANT was not replaced because this step was stopped.
My code works with the data as posted. I can only develop and test with data I have.
Please post the complete log of all steps that I supplied.
May you please show another solution without Hash methology?
Join the one and two datasets with the with_one dataset, based on the respective relevant keys, then concatenate the results. After 1200+ posts here, this will be a simple task you can do on your own.
So this is the solution without using Hash, thank you for your advice
data have;
input Customer_id $ person_id $;
cards;
1 123
1 456
2 456
3 767
4 654
5 999
5 888
6 231
6 763
6 124
7 000
7 111
8 000
9 555
;
data two one;
set have;
by customer_id;
if first.customer_id and last.customer_id
then output one;
else output two;
run;
/***Person_ID's from two that exists i one****/
proc sql;
create table with_one as
select distinct t1.customer_id, t1.person_id
from two t1, one t2
where t1.person_id = t2.person_id
order by customer_id;
quit;
proc sql;
create table as Final_List_From_one
select b.*
from with_one as a
inner join one as b
on a.person_id=b.person_id
;
quit;
proc sql;
create table as Final_List_From_two
select b.*
from with_one as a
inner join two as b
on a.customer_id=b.customer_id
;
quit;
Data wanted;
set Final_List_From_one
Final_List_From_two;
Run;
I want yo ask also about your code please,
In your code you used cartesian product .
proc sql; create table with_one as select distinct t1.customer_id, t1.person_id from two t1, one t2 where t1.person_id = t2.person_id order by customer_id; quit;
Why didn't you use inner join? As I see both ways are giving same results , which is more efficient when have 10 million rows?
proc sql; create table with_one as select t1.customer_id, t1.person_id from two t1 inner join one t2 on t1.person_id = t2.person_id order by customer_id; quit;
I'm used to the WHERE method from our DB/2 people. The optimizer there is most probably more powerful than that of PROC SQL 😉
One day I will ask why they prefer it to inner joins.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.