BookmarkSubscribeRSS Feed
kawabongga
Calcite | Level 5

Hi community !

I'm stuck on a problem of association. My data looks like :

clientMasterclient_1Masterclient_2Masterclient_3
aa  
bab 
c  e
d b 
e  e

 

And I would like to add a column which make the relation between my observations. So it would be something like that :

 

clientMasterclient_1Masterclient_2Masterclient_3MasterClient
aa  a
bab a
c  ee
d b a
e  ee

 

 

First I began by taking the first column filled but there is a mistake.
The observation "d" would have an "e" in MasterClient instead of an "a" like the others of his group.

 

Do you have an idea for solving that problem?

 

An idea I'm exploring is to fill the blanks when it's possible. Like that I could be able to use my first method.

 

You can answer in french or english. Thank you !

4 REPLIES 4
PeterClemmensen
Tourmaline | Level 20

I don't understand the logic here.

 

Why is MasterClient='a' in the fourth observation?

kawabongga
Calcite | Level 5

It's an "a" because the first observation is related to the 2nd and the 2nd to the fourth. I want the same id for all of them.

Reeza
Super User

It's a Type II dimension with a very bad starting data structure. You need recursion or a tree search and need to reform your data to make this work. 

 

Is this how your data is stored originally? Type II dimensions are typically in a long format.

https://en.wikipedia.org/wiki/Slowly_changing_dimension

 

If you resturcture your data to have a start and end, this macro will create the 'groups' for you so that you can uniquely identify each person. 

 

https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30

 

 

 

 

Ksharp
Super User

First of all , you need to know the top Master .

Here I made a table to contain it.


data ancient;
 set have;
 if _start=_end;
run;

After that the following code could give you a start.

 


 


data temp;
infile cards expandtabs ;
input client $ Masterclient_1 $ Masterclient_2 $ Masterclient_3 $;
cards;
a	a . .	 	 
b	a	b	. 
c	 .	 .	e
d	 .	b	 .
e	 .	. 	e
;
run;
data have;
 set temp;
 array x{*} $ Masterclient: ;
 _end=client;
 do i=1 to dim(x);
   if not missing(x{i}) then do;_start=x{i};output; end;
 end;
 keep _start _end;
run;




data ancient;
 set have;
 if _start=_end;
run;



data want(keep=path);
if _n_ eq 1 then do;
length path _path  $ 400 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();

declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();

end;


set ancient;
count=1;n=1;_n=1;
path=catx('|',_start,_end);
   
pa.add();
do while(hi_path.next()=0);
 if n ne 1 then pa.remove(key:_n);_n=n;
 _path=path;  
 _start=scan(path,-1,'|');
 rc=ha.find();if rc ne 0 then output;
 do while(rc=0);
  if not findw(path,strip(_end),'|') then do;
   if length(path)+length(_end)+1 gt lengthc(path) then do;
    putlog 'ERROR: The length of path and _path are set too short';
    stop;
   end;
   
   count+1;n=count;
   path=catx('|',path,_end);
   pa.add();
   path=_path;
 end;
  rc=ha.find_next();
end;
end;
pa.clear();
run;
data want;
 set want;
 start=scan(path,1,'|');
 end=scan(path,-1,'|');
 count=countw(path,'|')-1;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1377 views
  • 0 likes
  • 4 in conversation