New Contributor
Posts: 2

# Association between observations

[ Edited ]

Hi community !

I'm stuck on a problem of association. My data looks like :

 client Masterclient_1 Masterclient_2 Masterclient_3 a a b a b c e d b e e

And I would like to add a column which make the relation between my observations. So it would be something like that :

 client Masterclient_1 Masterclient_2 Masterclient_3 MasterClient a a a b a b a c e e d b a e e e

First I began by taking the first column filled but there is a mistake.
The observation "d" would have an "e" in MasterClient instead of an "a" like the others of his group.

Do you have an idea for solving that problem?

An idea I'm exploring is to fill the blanks when it's possible. Like that I could be able to use my first method.

You can answer in french or english. Thank you !

PROC Star
Posts: 1,209

## Re: Association between observations

I don't understand the logic here.

Why is MasterClient='a' in the fourth observation?

New Contributor
Posts: 2

## Re: Association between observations

It's an "a" because the first observation is related to the 2nd and the 2nd to the fourth. I want the same id for all of them.

Super User
Posts: 23,224

## Re: Association between observations

It's a Type II dimension with a very bad starting data structure. You need recursion or a tree search and need to reform your data to make this work.

Is this how your data is stored originally? Type II dimensions are typically in a long format.

https://en.wikipedia.org/wiki/Slowly_changing_dimension

If you resturcture your data to have a start and end, this macro will create the 'groups' for you so that you can uniquely identify each person.

https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30

Super User
Posts: 10,681

## Re: Association between observations

First of all , you need to know the top Master .

Here I made a table to contain it.

data ancient;
set have;
if _start=_end;
run;

After that the following code could give you a start.

data temp;
infile cards expandtabs ;
input client \$ Masterclient_1 \$ Masterclient_2 \$ Masterclient_3 \$;
cards;
a	a . .
b	a	b	.
c	 .	 .	e
d	 .	b	 .
e	 .	. 	e
;
run;
data have;
set temp;
array x{*} \$ Masterclient: ;
_end=client;
do i=1 to dim(x);
if not missing(x{i}) then do;_start=x{i};output; end;
end;
keep _start _end;
run;

data ancient;
set have;
if _start=_end;
run;

data want(keep=path);
if _n_ eq 1 then do;
length path _path  \$ 400 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();

declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();

end;

set ancient;
count=1;n=1;_n=1;
path=catx('|',_start,_end);

do while(hi_path.next()=0);
if n ne 1 then pa.remove(key:_n);_n=n;
_path=path;
_start=scan(path,-1,'|');
rc=ha.find();if rc ne 0 then output;
do while(rc=0);
if not findw(path,strip(_end),'|') then do;
if length(path)+length(_end)+1 gt lengthc(path) then do;
putlog 'ERROR: The length of path and _path are set too short';
stop;
end;

count+1;n=count;
path=catx('|',path,_end);
path=_path;
end;
rc=ha.find_next();
end;
end;
pa.clear();
run;
data want;
set want;
start=scan(path,1,'|');
end=scan(path,-1,'|');
count=countw(path,'|')-1;
run;
Discussion stats
• 4 replies
• 153 views
• 0 likes
• 4 in conversation