Hi community !
I'm stuck on a problem of association. My data looks like :
client | Masterclient_1 | Masterclient_2 | Masterclient_3 |
a | a | ||
b | a | b | |
c | e | ||
d | b | ||
e | e |
And I would like to add a column which make the relation between my observations. So it would be something like that :
client | Masterclient_1 | Masterclient_2 | Masterclient_3 | MasterClient |
a | a | a | ||
b | a | b | a | |
c | e | e | ||
d | b | a | ||
e | e | e |
First I began by taking the first column filled but there is a mistake.
The observation "d" would have an "e" in MasterClient instead of an "a" like the others of his group.
Do you have an idea for solving that problem?
An idea I'm exploring is to fill the blanks when it's possible. Like that I could be able to use my first method.
You can answer in french or english. Thank you !
I don't understand the logic here.
Why is MasterClient='a' in the fourth observation?
It's an "a" because the first observation is related to the 2nd and the 2nd to the fourth. I want the same id for all of them.
It's a Type II dimension with a very bad starting data structure. You need recursion or a tree search and need to reform your data to make this work.
Is this how your data is stored originally? Type II dimensions are typically in a long format.
https://en.wikipedia.org/wiki/Slowly_changing_dimension
If you resturcture your data to have a start and end, this macro will create the 'groups' for you so that you can uniquely identify each person.
https://gist.github.com/statgeek/14e3aa2a9f718f551cd98134e9ceed30
First of all , you need to know the top Master .
Here I made a table to contain it.
data ancient;
set have;
if _start=_end;
run;
After that the following code could give you a start.
data temp;
infile cards expandtabs ;
input client $ Masterclient_1 $ Masterclient_2 $ Masterclient_3 $;
cards;
a a . .
b a b .
c . . e
d . b .
e . . e
;
run;
data have;
set temp;
array x{*} $ Masterclient: ;
_end=client;
do i=1 to dim(x);
if not missing(x{i}) then do;_start=x{i};output; end;
end;
keep _start _end;
run;
data ancient;
set have;
if _start=_end;
run;
data want(keep=path);
if _n_ eq 1 then do;
length path _path $ 400 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();
declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();
end;
set ancient;
count=1;n=1;_n=1;
path=catx('|',_start,_end);
pa.add();
do while(hi_path.next()=0);
if n ne 1 then pa.remove(key:_n);_n=n;
_path=path;
_start=scan(path,-1,'|');
rc=ha.find();if rc ne 0 then output;
do while(rc=0);
if not findw(path,strip(_end),'|') then do;
if length(path)+length(_end)+1 gt lengthc(path) then do;
putlog 'ERROR: The length of path and _path are set too short';
stop;
end;
count+1;n=count;
path=catx('|',path,_end);
pa.add();
path=_path;
end;
rc=ha.find_next();
end;
end;
pa.clear();
run;
data want;
set want;
start=scan(path,1,'|');
end=scan(path,-1,'|');
count=countw(path,'|')-1;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.