BookmarkSubscribeRSS Feed
umashankersaini
Quartz | Level 8

Hi Team,

Greetings for the day!

 

Please help me to write a sql to get the related party details.

 

For example suppose i have opened my account with mobile and email. 

My wife has used my mobile number while opening her account and i have used my email id for my daughter's account.

In this case my wife's account is related with my mobile and daughter's account is related with my email id and scenario could be daughter has used wife's email.

 

So if i am searching for my account, i should get wife's and daughter's account as well.

 

I have tried to prepare the sample data with result output:

Sample :

ID Phone Email
5534385355 385534616655 ABC@GMAIL.COM
5548558464 3588333353 ABC12.com
5564556334 6366636666 ABC2@gmail.com
554814331 6001066401 ABC10.com
5564556331 6366636666 ABC3@gmail.com
5548558456 68136551346 ABC7.com
5538083556 6066688561 ABC4D.COM
554816644 68141055686 ABC9.com
5563065535 6355555563486 ABC5@yahoo.com
55155355386 6460535366 ABC11.com
553438530 385534616655 ABC1@GMAIL.COM
554884666 6354641861 ABC13.com
5534385355 385534616655 ABC@GMAIL.COM
555336340 6836833860 ABC6.com
554363456 6066688561 ABC4D.COM
55464315555 6863083133 ABC8.com
556343006 6355555563486 ABC5@yahoo.com

 

Output :

ID Phone Email
5534385355 385534616655 ABC@GMAIL.COM
553438530 385534616655 ABC1@GMAIL.COM
5564556334 6366636666 ABC2@gmail.com
5564556331 6366636666 ABC3@gmail.com
5538083556 6066688561 ABC4D.COM
554363456 6066688561 ABC4D.COM
556343006 6355555563486 ABC5@yahoo.com
5563065535 6355555563486 ABC5@yahoo.com

 

Please help.

 

Regards,

Uma Shanker Saini

6 REPLIES 6
Patrick
Opal | Level 21

Is there a reason that it must be SQL and can't be a data step?

Is a single query just getting a single account as input and you then want all the related accounts OR is this actually about creating networks for all of your data?

How many rows do you have in your actual data?

 

I believe what you really will have to do is something along the line of How to find all connected components in a graph

Mazi
Pyrite | Level 9

@Patrick , I was thinking the same. A hashobject would be easier in my opinion. Otherwise with SQL I am thinking of joining the table with itself

Patrick
Opal | Level 21

@Mazi If I understand the problem then account A and C could have no common attributes but are still linked via account B that shares attributes with both A and C ...which a single SQL self-join couldn't detect. 

Mazi
Pyrite | Level 9

@Patrick, Agreed. I am also not entirely sure I understand the requirement/purpose here. My thinking was to have several self joins and finally transpose the columns to have them as required.

A_Kh
Lapis Lazuli | Level 10

Does this work?

proc sql;
	create table want as
		(select*, count(distinct id) as gr1
			from have
				group by phone
					having gr1 gt 1)
						union corr
					(select*, count(distinct id) as gr2
				from have
			group by email
		having gr2 gt 1)
	order by email, phone, id;
quit; 
mkeintz
PROC Star

I suspect the problem with an SQL solution is that you have to recursively find links until no more can be found.  I.e. with every newly linked ID, you have to review the remaining obs for a even newer link by phone or email.  Result: you would need to have a criterion within your sql code to determine that there are no more to be added.  I.e. you have to know when to stop joins.

 

Do you have SAS/OR?  If so, then PROC OPTNET can solve the problem of recursion.  It will find all the possible links and "connected components" in the entire data set, including the components connected to your desired id (say "5534385355" in your case).

 

To do so, you have to find all links via phone number, and all links via email, as in :

 

 

data have;
  input ID	Phone	Email :$30.;
datalines;
5534385355	385534616655	ABC@GMAIL.COM
5548558464	3588333353	ABC12.com
5564556334	6366636666	ABC2@gmail.com
554814331	     6001066401	ABC10.com
5564556331	6366636666	ABC3@gmail.com
5548558456	68136551346	ABC7.com
5538083556	6066688561	ABC4D.COM
554816644	     68141055686	ABC9.com
5563065535	6355555563486	ABC5@yahoo.com
55155355386	6460535366	ABC11.com
553438530	     385534616655	ABC1@GMAIL.COM
554884666	     6354641861	ABC13.com
5534385355	385534616655	ABC@GMAIL.COM
555336340	     6836833860	ABC6.com
554363456	     6066688561	ABC4D.COM
55464315555	6863083133	ABC8.com
556343006	     6355555563486	ABC5@yahoo.com
run;
  
proc freq data=have noprint;
  tables phone*id / out=phid (keep=phone id);
  tables email*id / out=emid (keep=email id);
run;

 

 

The datasets produced by proc freq will effectively list, in order, all the ID's for each phone (in PHID), and all the ID's for each email (EMID).  From that you can create all discovered pairs:

 

 

data pairs (keep=id1 id2 weight);
  set phid emid;
  retain weight 1;
  id1=lag(id);
  id2=id;
  if phone=lag(phone) and email=lag(email);
run;

 

 

Now in the above, you might have ID=1001 connected to ID=1002, and another obs with ID=1002 connected to ID=1003.  PROC OPTNET below will know that 1003 is also connected (for your purposes) to 1001. even though there is no explicit obs stating that linkage: 

 

 

proc optnet data_links=pairs  out_nodes=id_nodes (rename=(node=id));
  data_link_vars  from=id1 to=id2;
  concomp;
run;

 

 

The "concomp" statement tells optnet to create a connection id (concomp) that has the same value for every ID in a connected group.  Take a look at the dataset OUT_NODES it creates.  So take dataset out_nodes and get original data for the concomp that contains ID 5534385355:

 

data want;
  if 0 then set have;
  if _n_=1 then do;
    declare hash h ();
      h.definekey('id');
      h.definedata('id');
      h.definedone();
    do until (end_of_wantlist);
      merge id_nodes (where=(id=5534385355) in=wanted)
            id_nodes  end=end_of_wantlist;
      by concomp;
      if wanted then h.add();
    end;
  end;
  set have;
  if h.check()=0;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1073 views
  • 3 likes
  • 5 in conversation