BookmarkSubscribeRSS Feed
kilo_foxtrot
Calcite | Level 5

Take the example code below containing customers and their devices:

 

 

data test;
    length cust dev $5;
    input cust $ dev $;
    datalines;
    a   1
    a   2
    a   7
    b   2
    b   3
    c   3
    d   4
    e   5
    h   2
    x   4
    ;
run;

I'm trying to group ALL related individuals by customer and device into unique IDs. For example, Customer A is related to Customer B through Device 2, and Customer B is related to Customer C through Device 3. Because A is related to B and B is related to C, these should have the same UID. If you think of customer as the 'parent' and device as the 'child' in node structure, all linked children and parents should be in the same UID. Any breaks should be separate UIDs.

 

Below is the desired output:

 

Customer|Device|UID
    a       1   UID1
    a       2   UID1
    a       7   UID1
    b       2   UID1
    b       3   UID1
    c       3   UID1
    h       2   UID1
    d       4   UID2
    x	    4	UID2
    e       5   UID3

I have tried transposing by Customer and doing a 'cascade' retain on customer and device but the code gets pretty hairy. Also, I do not have access to PROC BOM which I've read may help my case. Does anyone have any ideas on how this might be achieved? 

 

 

18 REPLIES 18
Kurt_Bremser
Super User

You will need to keep two hash objects to keep track of already assigned uid's:

data have;
    length cust dev $5;
    input cust $ dev $;
    datalines;
    a   1
    a   2
    a   7
    b   2
    b   3
    c   3
    d   4
    e   5
    h   2
    x   4
    ;
run;

data want;
set have end=eof;
length uid $4;
retain counter 0;
if _n_ = 1
then do;
  declare hash devices();
  devices.definedata("uid",'dev');
  devices.definekey("dev");
  devices.definedone();
  declare hash customers();
  customers.definedata("uid",'cust');
  customers.definekey("cust");
  customers.definedone();
  call missing(uid);
end;
if customers.find()
then do;
  if devices.find()
  then do;
    counter + 1;
    uid = 'UID' !! put(counter,1.);
    rc = devices.add();
    rc = customers.add();
  end;
  else rc = customers.add();
end;
else do;
  if devices.find() then rc = devices.add();
end;
drop rc counter;
run;

proc sort data=want;
by uid cust;
run;
kilo_foxtrot
Calcite | Level 5

This is close but I'm seeing errors if I input additional rows to test. Take the following example where I've added Customer X Device 5:

data have;
    length cust dev $5;
    input cust $ dev $;
    datalines;
    a   1
    a   2
    a   7
    b   2
    b   3
    c   3
    d   4
    e   5
    h   2
    x   4
    x   5
    ;
run;

Your code provides the following results:

d	4	UID2
x	4	UID2
e	5	UID3
x	5	UID3

However, these should all have the same value of UID2 because Customer X is linked to Devices 4 and 5 which should also link Customer E and D. 

 

novinosrin
Tourmaline | Level 20

data have;
    length cust dev $5;
    input cust $ dev $;
    datalines;
    a   1
    a   2
    a   7
    b   2
    b   3
    c   3
    d   4
    e   5
    h   2
    x   4
    ;
run;

data _null_;
if _n_=1 then do;
  dcl hash H (multidata:'yes') ;
   h.definekey  ("dev") ;
   h.definedata ("cust","dev","uid") ;
   h.definedone () ;
end;
do until(last.cust);
set have end=l;
by cust;
if _n_=1 then  do;
c=_n_;
uid =cats( 'UID', put(c,8. -l));
rc=h.add();
end;
else if not f and h.find()=0 then f=1;
end;
do until(last.cust);
set have ;
by cust;                             
if _n_>1 then do;
if f then rc=h.add();
else do;
if first.cust then do;c+1; uid =cats( 'UID', put(c,8. -l));end;
rc=h.add();
end;
end;
end;
if l then h.output(dataset:'want');
run;


proc sort data=want;
by uid cust dev;
run;
kilo_foxtrot
Calcite | Level 5
Your answer is also very close but provides inaccuracies in the last four rows in the new test data in my reply above.
novinosrin
Tourmaline | Level 20

I realize my mistake, a bit busy now but will try later

kilo_foxtrot
Calcite | Level 5
Thank you -- I appreciate your efforts! I apologize I do not have an in-depth knowledge of hashes in SAS so I am unable to troubleshoot your first answer.
novinosrin
Tourmaline | Level 20

@kilo_foxtrot  Don't worry about technology. I am working on your thread by seeking help from my college mate(who is super smart and solved the same in python)  in handling networking problems. Once he gives me the understanding i need, I will try to code in SAS. No need to apologize. I love learning. 

novinosrin
Tourmaline | Level 20

Hi @kilo_foxtrot  First off, Sorry for the delay as I was consumed with a lot of my own stuff to do. Please see if the following works and let me know. I can further simplify and eliminate some redundancies. Right now, it is 10:10PM at Chicago and I am leaning on my pillows laying with my laptop in my lap. Feeling so tired,  I haven't made it a complete neat one. Yet, I wanted to keep up my word and here you go-

/*Your new updated dataset HAVE*/
data have;
    length cust dev $5;
    input cust $ dev $;
    datalines;
    a   1
    a   2
    a   7
    b   2
    b   3
    c   3
    d   4
    e   5
    h   2
    x   4
    x   5
    ;
run;
dm log 'clear';
/*Intial look up*/
data _null_;
if _n_=1 then do;
length uid __dev __cust $5;
if 0 then set have(rename=(dev=_dev cust=_cust));
  declare hash H (multidata:'yes') ;
   h.definekey  ("__cust") ;
   h.definedata ("__cust","__dev","uid") ;
   h.definedone () ;
   declare hash H1 (dataset:'have(rename=(dev=_dev cust=_cust))', multidata:'yes') ;
   h1.definekey  ("_dev") ;
   h1.definedata ("_cust","_dev") ;
   h1.definedone () ;
   declare hiter hi('h');
   declare hiter hh('h1');
   call missing(__cust,__dev);
end;
array t(999) $5;
n=0;
do until(last.cust);
	set have end=l;
	by cust;
	if first.cust then 
		do;
			if h.find(key:cust) = 0 then f1=1;
			do while(hi.next()=0);
			if dev=__dev	then do; f2=1;leave;end;
			end;
			if sum(f1, f2)=. then do;c+1;uid =cats( 'UID', put(c,8. -l)); end;
		end;
	if not f1 then do rc1=h1.find(key:dev) by 0 while(rc1=0);
		rc=h.add(key:_cust,data:_cust,data: _dev,data:uid);
		if _cust ne cust then do;
		n+1;
		t(n)=_cust;
		end;
		rc1=h1.find_next();
	end;
end;
if not f1 then do i=1 to dim(t);
	if not missing(t(i)) then 
	do while(hh.next()=0);
	if t(i)=_cust then rc=h.add(key:_cust,data:_cust,data: _dev,data:uid);
	end;
end;
if l then h.output(dataset:'w');
run;

/*Final Want	*/
proc sort data=w out=want nodupkey;
by uid __cust __dev;
run;


 

 

Please do let me know. And I will try to see to make some improvements when I feel fresh hopefully after a good sleep. Good night!

kilo_foxtrot
Calcite | Level 5

No need at all to apologize -- this is on your time! If I add Customer A Device 5, it should link every single record to be UID1. Your latest code doesn't quite get this right and it also adds two addition rows to the data:

/*Your new updated dataset HAVE -- new cust x dev 5*/
data have;
length cust dev $5;
input cust $ dev $;
datalines;
a 1
a 2
a 5
a 7
b 2
b 3
c 3
d 4
e 5
h 2
x 4
x 5
;
run;

 

__cust	__dev	uid
a	1	UID1
a	2	UID1
a	5	UID1
a	7	UID1
b	2	UID1
b	3	UID1
c	3	UID1
e	5	UID1
h	2	UID1
x	4	UID1
x	5	UID1
d	4	UID2
x	4	UID2
x	5	UID2

 

I can't thank you enough for your continued efforts on this -- it is getting very complex!

 

novinosrin
Tourmaline | Level 20

Hi @kilo_foxtrot  Thank you. I request your patience(which you are, just saying again Smiley Happy  ) while we do this interactively and at the sametime I work on my stuff too. Trust me, yours is more interesting than mine and how i wish I devoted more time to test. 

Nevetheless, I did a minor change and will wait for your feedback for further changes yet again

 

It's clear you are awesome in testing and that helps.

 

/*Your new updated dataset HAVE -- new cust x dev 5*/
data have;
length cust dev $5;
input cust $ dev $;
datalines;
a 1
a 2
a 5
a 7
b 2
b 3
c 3
d 4
e 5
h 2
x 4
x 5
;
run;





dm log 'clear';
/*Intial look up*/
data _null_;
if _n_=1 then do;
length uid __dev __cust $5;
if 0 then set have(rename=(dev=_dev cust=_cust));
  declare hash H (multidata:'yes') ;
   h.definekey  ("__cust") ;
   h.definedata ("__cust","__dev","uid") ;
   h.definedone () ;
   declare hash H1 (dataset:'have(rename=(dev=_dev cust=_cust))', multidata:'yes') ;
   h1.definekey  ("_dev") ;
   h1.definedata ("_cust","_dev") ;
   h1.definedone () ;
   declare hiter hi('h');
   declare hiter hh('h1');
   call missing(__cust,__dev);
end;
array t(999) $5;
n=0;
do until(last.cust);
	set have end=l;
	by cust;
	if first.cust then 
		do;
			if h.find(key:cust) = 0 then f1=1;
			else do while(hi.next()=0);
			if dev=__dev	then do; f2=1;leave;end;
			end;
			if sum(f1, f2)=. then do;c+1;uid =cats( 'UID', put(c,8. -l)); end;
		end;
	do rc1=h1.find(key:dev) by 0 while(rc1=0);
		rc=h.add(key:_cust,data:_cust,data: _dev,data:uid);
		if _cust ne cust then do;
		n+1;
		t(n)=_cust;
		end;
		rc1=h1.find_next();
	end;
end;
do i=1 to dim(t);
	if not missing(t(i)) then 
	do while(hh.next()=0);
	if t(i)=_cust then rc=h.add(key:_cust,data:_cust,data: _dev,data:uid);
	end;
end;
if l then h.output(dataset:'w');
run;

/*Final Want	*/
proc sort data=w out=want nodupkey;
by uid __cust __dev;
run;

 

 

kilo_foxtrot
Calcite | Level 5

I appreciate your devotion to solving this problem @novinosrin. In my testing I try to "break" your code to help debug by adding/removing all kinds of customers and devices while knowing what to expect ahead of time. Your latest answer works for the test data, but it does not work for the previous example (removing Customer A Device 5). 

 

I went back to the previous example (removing Customer A Device 5) that resulted in UID1 and UID2.  I then added a new customer (Y) that is linked to UID1 via Device 7 and also linked to UID2 via Device 5.  This should create a link among all records (all UID1) but again it creates extra records and assigns multiple UIDs.  N=13 records in ‘have’, N=18 records in ‘want’.

 

 

 

data have;
length cust dev $5;
input cust $ dev $;
datalines;
a 1
a 2
a 7
b 2
b 3
c 3
d 4
e 5
h 2
x 4
x 5
y 7
y 5
;
run;

Again, your help is much appreciated! I'm sure it is tough to make robust since the test data provided probably does not offer enough data to thoroughly validate. 

novinosrin
Tourmaline | Level 20

Hmm, i understand the bigger complexity now after staring at the example. Let me deal with simple piecemeal approach and respond. 

novinosrin
Tourmaline | Level 20

 Absolutely right sir @Kurt_Bremser .  The idea is in my mind, just the references/syntax causes my eyes more strain as the evening unfolds. Excellent observation!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 18 replies
  • 4434 views
  • 0 likes
  • 4 in conversation