Hi, there.
Just curious how to achieve this in sas. Basically I want to tract each id's full life cycle from the origin to destination as below example shows:
have: | ||
ID | from | to |
1 | a | b |
1 | b | c |
1 | c | d |
2 | e | f |
2 | f | g |
want: | ||||
ID | from | to/from | to/from | to/from |
1 | a | b | c | d |
2 | e | f | f | g |
Any comment is highly appreciated.
why is f missing if id 2 traveled from e to f and then from f to g?
Sorry it was a typo. Thanks for pointing out!
Do you have sas/ets (sorry, I meant SAS/OR)? It includes proc optnet, which will directly solve your problem.
Run
proc setinit;
run;
and see if sas/ets sas/or is listed.
It looks like I have sas/or.
Never use it before, please guide me.
Thanks.
Yes look for OR as Mark suggested.
But just to clear the air a bit:
I would have thought the logical result to be
want: | ||||
ID | from | to/from | to/from | to/from |
1 | a | b | c | d |
2 | e | f | g |
instead of
want: | ||||
ID | from | to/from | to/from | to/from |
1 | a | b | c | d |
2 | e | f | f | g |
Can you correct my understanding?
Ok Thanks. Let's together wait if Mark has the time to demo or lead us with some guidance. I am excited about OR too. Thankfully I haven't lost much hair after having learned base sas fairly well. Otherwise your req is a simple double proc transpose
Having used proc optnet for other from/to linkage tasks, I thought this problem would be convenient to solve. But it's not:
So here's two step solution
EDITTED NOTE: To accommodate a single record ID, a node value has to be arbitrarily assigned prior to the triple.find() method. That statement ("if missing(node) then node=from;") is included below:
data have ;
input ID from :$1. to :$1.;
datalines;
1 b c
1 c d
1 a b
2 e f
2 f g
run;
data need (keep=id nodename node);
set have;
by id;
if _n_=1 then do;
length prior node next $1;
declare hash triples();
triples.definekey('id','node');
triples.definedata('id','node','prior','next');
triples.definedone();
end;
/*Populate triples hash */
if triples.find(key:id,key:from)^=0 then call missing(prior);
triples.replace(key:id,key:from,data:id,data:from,data:prior,data:to);
if triples.find(key:id,key:to)^=0 then call missing(next);
triples.replace(key:id,key:to,data:id,data:to,data:from,data:next);
if last.id;
triples.output(dataset:'triples');
if missing(node) then node=from; /*Added to accommodate single-link ID's*/
triples.find();
if not missing(prior) then do until (missing(prior));
triples.find(key:id,key:prior);
end;
nodename='Source'; /* Output nodes from source to sink */
output;
if not missing(next) then do seq=2 by 1 until(missing(next));
triples.find(key:id,key:next);
if missing(next) then nodename='Sink';
else nodename=cats('Node_',seq);
output;
end;
run;
proc transpose data=need out=want (drop=_name_) ;
by id;
var node;
id nodename;
run;
The only part of the original request unsatisfied here is that some internal values (such as NODE_3 for id group 2) are missing. That should be easy to fix, if really necessary.
Thanks a lot for the explanation. Hash is absolutely new to me and just test the code but give an error when I am adding new case: there only contains source and sink node as id3? Would you please show me how can i modify the code accordingly? Thanks!!!
data have ; input ID from :$1. to :$1.; datalines; 1 b c 1 c d 1 a b 2 e f 2 f g 3 x y run;
Please show the log containing the error message and related source code.
data have ; input ID from :$1. to :$1.; datalines; 1 b c 1 c d 1 a b 2 e f 2 f g 3 x y run; data need (keep=id nodename node); set have; by id; if _n_=1 then do; length prior node next $1; declare hash triples(); triples.definekey('id','node'); triples.definedata('id','node','prior','next'); triples.definedone(); end; /*Populate triples hash */ if triples.find(key:id,key:from)^=0 then call missing(prior); triples.replace(key:id,key:from,data:id,data:from,data:prior,data:to); if triples.find(key:id,key:to)^=0 then call missing(next); triples.replace(key:id,key:to,data:id,data:to,data:from,data:next); if last.id; triples.output(dataset:'triples'); triples.find(); if not missing(prior) then do until (missing(prior)); triples.find(key:id,key:prior); end; nodename='Source'; /* Output nodes from source to sink */ output; if not missing(next) then do seq=2 by 1 until(missing(next)); triples.find(key:id,key:next); if missing(next) then nodename='Sink'; else nodename=cats('Node_',seq); output; end; run; proc transpose data=need out=want (drop=_name_) ; by id; var node; id nodename; run;
basically I create a new case with id3, there is only two nodes x and y. The error in the log file shows:
data have;
input ID from $ to $ ;
rename from=_start to=_end;
cards;
1 a b
1 b c
1 c d
2 e f
2 f g
;
run;
proc sql;
create table x as
select * from have
where catx(' ',id,_start) not in
(select catx(' ',id,_end) from have);
quit;
data want(keep=id path);
if _n_ eq 1 then do;
length path _path $ 700 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();
declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();
end;
set x;
count=1;n=1;_n=1;
path=catx('|',_start,_end);
pa.add();
do while(hi_path.next()=0);
if n ne 1 then pa.remove(key:_n);_n=n;
_path=path;
_start=scan(path,-1,'|');
rc=ha.find(); if rc ne 0 then output;
do while(rc=0);
if not findw(path,strip(_end),'|') then do;
if length(path)+length(_end)+1 gt lengthc(path) then do;
putlog 'ERROR: The length of path and _path are set too short';
stop;
end;
count+1;n=count;
path=catx('|',path,_end);
pa.add();
path=_path;
end;
else output;
rc=ha.find_next();
end;
end;
pa.clear();
run;
Thanks a lot! I tested the code, it gives me same id with various different path. I will need to closely investigate my data source.
To OP ,Mark, Xia who may have a couple of mins to clarify plz
I had this question printed and showed that to a friend of mine who is majoring in data science.
He pointed out. Isn't it always the following L shape that's the flow
have: | ||
ID | from | to |
1 | a | b |
1 | b | c |
1 | c | d |
2 | e | f |
2 | f | g |
Can you provide some intuition?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.