Hi, there.
Just curious how to achieve this in sas. Basically I want to tract each id's full life cycle from the origin to destination as below example shows:
have: | ||
ID | from | to |
1 | a | b |
1 | b | c |
1 | c | d |
2 | e | f |
2 | f | g |
want: | ||||
ID | from | to/from | to/from | to/from |
1 | a | b | c | d |
2 | e | f | f | g |
Any comment is highly appreciated.
why is f missing if id 2 traveled from e to f and then from f to g?
Sorry it was a typo. Thanks for pointing out!
Do you have sas/ets (sorry, I meant SAS/OR)? It includes proc optnet, which will directly solve your problem.
Run
proc setinit;
run;
and see if sas/ets sas/or is listed.
It looks like I have sas/or.
Never use it before, please guide me.
Thanks.
Yes look for OR as Mark suggested.
But just to clear the air a bit:
I would have thought the logical result to be
want: | ||||
ID | from | to/from | to/from | to/from |
1 | a | b | c | d |
2 | e | f | g |
instead of
want: | ||||
ID | from | to/from | to/from | to/from |
1 | a | b | c | d |
2 | e | f | f | g |
Can you correct my understanding?
Ok Thanks. Let's together wait if Mark has the time to demo or lead us with some guidance. I am excited about OR too. Thankfully I haven't lost much hair after having learned base sas fairly well. Otherwise your req is a simple double proc transpose
Having used proc optnet for other from/to linkage tasks, I thought this problem would be convenient to solve. But it's not:
So here's two step solution
EDITTED NOTE: To accommodate a single record ID, a node value has to be arbitrarily assigned prior to the triple.find() method. That statement ("if missing(node) then node=from;") is included below:
data have ;
input ID from :$1. to :$1.;
datalines;
1 b c
1 c d
1 a b
2 e f
2 f g
run;
data need (keep=id nodename node);
set have;
by id;
if _n_=1 then do;
length prior node next $1;
declare hash triples();
triples.definekey('id','node');
triples.definedata('id','node','prior','next');
triples.definedone();
end;
/*Populate triples hash */
if triples.find(key:id,key:from)^=0 then call missing(prior);
triples.replace(key:id,key:from,data:id,data:from,data:prior,data:to);
if triples.find(key:id,key:to)^=0 then call missing(next);
triples.replace(key:id,key:to,data:id,data:to,data:from,data:next);
if last.id;
triples.output(dataset:'triples');
if missing(node) then node=from; /*Added to accommodate single-link ID's*/
triples.find();
if not missing(prior) then do until (missing(prior));
triples.find(key:id,key:prior);
end;
nodename='Source'; /* Output nodes from source to sink */
output;
if not missing(next) then do seq=2 by 1 until(missing(next));
triples.find(key:id,key:next);
if missing(next) then nodename='Sink';
else nodename=cats('Node_',seq);
output;
end;
run;
proc transpose data=need out=want (drop=_name_) ;
by id;
var node;
id nodename;
run;
The only part of the original request unsatisfied here is that some internal values (such as NODE_3 for id group 2) are missing. That should be easy to fix, if really necessary.
Thanks a lot for the explanation. Hash is absolutely new to me and just test the code but give an error when I am adding new case: there only contains source and sink node as id3? Would you please show me how can i modify the code accordingly? Thanks!!!
data have ; input ID from :$1. to :$1.; datalines; 1 b c 1 c d 1 a b 2 e f 2 f g 3 x y run;
Please show the log containing the error message and related source code.
data have ; input ID from :$1. to :$1.; datalines; 1 b c 1 c d 1 a b 2 e f 2 f g 3 x y run; data need (keep=id nodename node); set have; by id; if _n_=1 then do; length prior node next $1; declare hash triples(); triples.definekey('id','node'); triples.definedata('id','node','prior','next'); triples.definedone(); end; /*Populate triples hash */ if triples.find(key:id,key:from)^=0 then call missing(prior); triples.replace(key:id,key:from,data:id,data:from,data:prior,data:to); if triples.find(key:id,key:to)^=0 then call missing(next); triples.replace(key:id,key:to,data:id,data:to,data:from,data:next); if last.id; triples.output(dataset:'triples'); triples.find(); if not missing(prior) then do until (missing(prior)); triples.find(key:id,key:prior); end; nodename='Source'; /* Output nodes from source to sink */ output; if not missing(next) then do seq=2 by 1 until(missing(next)); triples.find(key:id,key:next); if missing(next) then nodename='Sink'; else nodename=cats('Node_',seq); output; end; run; proc transpose data=need out=want (drop=_name_) ; by id; var node; id nodename; run;
basically I create a new case with id3, there is only two nodes x and y. The error in the log file shows:
data have;
input ID from $ to $ ;
rename from=_start to=_end;
cards;
1 a b
1 b c
1 c d
2 e f
2 f g
;
run;
proc sql;
create table x as
select * from have
where catx(' ',id,_start) not in
(select catx(' ',id,_end) from have);
quit;
data want(keep=id path);
if _n_ eq 1 then do;
length path _path $ 700 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();
declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();
end;
set x;
count=1;n=1;_n=1;
path=catx('|',_start,_end);
pa.add();
do while(hi_path.next()=0);
if n ne 1 then pa.remove(key:_n);_n=n;
_path=path;
_start=scan(path,-1,'|');
rc=ha.find(); if rc ne 0 then output;
do while(rc=0);
if not findw(path,strip(_end),'|') then do;
if length(path)+length(_end)+1 gt lengthc(path) then do;
putlog 'ERROR: The length of path and _path are set too short';
stop;
end;
count+1;n=count;
path=catx('|',path,_end);
pa.add();
path=_path;
end;
else output;
rc=ha.find_next();
end;
end;
pa.clear();
run;
Thanks a lot! I tested the code, it gives me same id with various different path. I will need to closely investigate my data source.
To OP ,Mark, Xia who may have a couple of mins to clarify plz
I had this question printed and showed that to a friend of mine who is majoring in data science.
He pointed out. Isn't it always the following L shape that's the flow
have: | ||
ID | from | to |
1 | a | b |
1 | b | c |
1 | c | d |
2 | e | f |
2 | f | g |
Can you provide some intuition?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.