- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, there.
Just curious how to achieve this in sas. Basically I want to tract each id's full life cycle from the origin to destination as below example shows:
| have: | ||
| ID | from | to |
| 1 | a | b |
| 1 | b | c |
| 1 | c | d |
| 2 | e | f |
| 2 | f | g |
| want: | ||||
| ID | from | to/from | to/from | to/from |
| 1 | a | b | c | d |
| 2 | e | f | f | g |
Any comment is highly appreciated.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
why is f missing if id 2 traveled from e to f and then from f to g?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry it was a typo. Thanks for pointing out!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do you have sas/ets (sorry, I meant SAS/OR)? It includes proc optnet, which will directly solve your problem.
Run
proc setinit;
run;
and see if sas/ets sas/or is listed.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It looks like I have sas/or.
Never use it before, please guide me.
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes look for OR as Mark suggested.
But just to clear the air a bit:
I would have thought the logical result to be
| want: | ||||
| ID | from | to/from | to/from | to/from |
| 1 | a | b | c | d |
| 2 | e | f | g |
instead of
| want: | ||||
| ID | from | to/from | to/from | to/from |
| 1 | a | b | c | d |
| 2 | e | f | f | g |
Can you correct my understanding?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ok Thanks. Let's together wait if Mark has the time to demo or lead us with some guidance. I am excited about OR too. Thankfully I haven't lost much hair after having learned base sas fairly well. Otherwise your req is a simple double proc transpose
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Having used proc optnet for other from/to linkage tasks, I thought this problem would be convenient to solve. But it's not:
- proc optnet doesn't honor BY groups, so you'd have to run it multiple times, and then concatenate all the by-group results.
- Within each ID group you have to ask proc optnet to generate the longest path (a-d is longer than all the others). If would provide 2 such paths for ID group 1: a->b->c->d and d->c->b->a. You'd have to compare the path output file to other output files from optnet to determine which one honors the flow direction desired, and then arrange data to be ready for transpose from tall to wide. A pain you-know-where.
So here's two step solution
- In a data step
- make a hash call triple, which
- Includes ID as a key in the hash object, making honoring id groups easier
- Makes a triple for each node in the path, containing (in addition to ID) NODE, PRIOR, and NEXT
- At the end of each by group
- Find the source (the node with no prior)
- output each successive node until reaching the sink (the node with no next)
- Label the nodes as "source", "node_2", "node_3", ... to "sink".
- make a hash call triple, which
- After the data step run a proc transpose:
EDITTED NOTE: To accommodate a single record ID, a node value has to be arbitrarily assigned prior to the triple.find() method. That statement ("if missing(node) then node=from;") is included below:
data have ;
input ID from :$1. to :$1.;
datalines;
1 b c
1 c d
1 a b
2 e f
2 f g
run;
data need (keep=id nodename node);
set have;
by id;
if _n_=1 then do;
length prior node next $1;
declare hash triples();
triples.definekey('id','node');
triples.definedata('id','node','prior','next');
triples.definedone();
end;
/*Populate triples hash */
if triples.find(key:id,key:from)^=0 then call missing(prior);
triples.replace(key:id,key:from,data:id,data:from,data:prior,data:to);
if triples.find(key:id,key:to)^=0 then call missing(next);
triples.replace(key:id,key:to,data:id,data:to,data:from,data:next);
if last.id;
triples.output(dataset:'triples');
if missing(node) then node=from; /*Added to accommodate single-link ID's*/
triples.find();
if not missing(prior) then do until (missing(prior));
triples.find(key:id,key:prior);
end;
nodename='Source'; /* Output nodes from source to sink */
output;
if not missing(next) then do seq=2 by 1 until(missing(next));
triples.find(key:id,key:next);
if missing(next) then nodename='Sink';
else nodename=cats('Node_',seq);
output;
end;
run;
proc transpose data=need out=want (drop=_name_) ;
by id;
var node;
id nodename;
run;
The only part of the original request unsatisfied here is that some internal values (such as NODE_3 for id group 2) are missing. That should be easy to fix, if really necessary.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks a lot for the explanation. Hash is absolutely new to me and just test the code but give an error when I am adding new case: there only contains source and sink node as id3? Would you please show me how can i modify the code accordingly? Thanks!!!
data have ; input ID from :$1. to :$1.; datalines; 1 b c 1 c d 1 a b 2 e f 2 f g 3 x y run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please show the log containing the error message and related source code.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have ;
input ID from :$1. to :$1.;
datalines;
1 b c
1 c d
1 a b
2 e f
2 f g
3 x y
run;
data need (keep=id nodename node);
set have;
by id;
if _n_=1 then do;
length prior node next $1;
declare hash triples();
triples.definekey('id','node');
triples.definedata('id','node','prior','next');
triples.definedone();
end;
/*Populate triples hash */
if triples.find(key:id,key:from)^=0 then call missing(prior);
triples.replace(key:id,key:from,data:id,data:from,data:prior,data:to);
if triples.find(key:id,key:to)^=0 then call missing(next);
triples.replace(key:id,key:to,data:id,data:to,data:from,data:next);
if last.id;
triples.output(dataset:'triples');
triples.find();
if not missing(prior) then do until (missing(prior));
triples.find(key:id,key:prior);
end;
nodename='Source'; /* Output nodes from source to sink */
output;
if not missing(next) then do seq=2 by 1 until(missing(next));
triples.find(key:id,key:next);
if missing(next) then nodename='Sink';
else nodename=cats('Node_',seq);
output;
end;
run;
proc transpose data=need out=want (drop=_name_) ;
by id;
var node;
id nodename;
run;basically I create a new case with id3, there is only two nodes x and y. The error in the log file shows:
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input ID from $ to $ ;
rename from=_start to=_end;
cards;
1 a b
1 b c
1 c d
2 e f
2 f g
;
run;
proc sql;
create table x as
select * from have
where catx(' ',id,_start) not in
(select catx(' ',id,_end) from have);
quit;
data want(keep=id path);
if _n_ eq 1 then do;
length path _path $ 700 ;
if 0 then set have;
declare hash ha(hashexp:20,dataset:'have(where=(_start is not missing and _end is not missing))',multidata:'y');
ha.definekey('_start');
ha.definedata('_end');
ha.definedone();
declare hash pa(ordered:'y');
declare hiter hi_path('pa');
pa.definekey('n');
pa.definedata('n','path');
pa.definedone();
end;
set x;
count=1;n=1;_n=1;
path=catx('|',_start,_end);
pa.add();
do while(hi_path.next()=0);
if n ne 1 then pa.remove(key:_n);_n=n;
_path=path;
_start=scan(path,-1,'|');
rc=ha.find(); if rc ne 0 then output;
do while(rc=0);
if not findw(path,strip(_end),'|') then do;
if length(path)+length(_end)+1 gt lengthc(path) then do;
putlog 'ERROR: The length of path and _path are set too short';
stop;
end;
count+1;n=count;
path=catx('|',path,_end);
pa.add();
path=_path;
end;
else output;
rc=ha.find_next();
end;
end;
pa.clear();
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks a lot! I tested the code, it gives me same id with various different path. I will need to closely investigate my data source.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To OP ,Mark, Xia who may have a couple of mins to clarify plz
I had this question printed and showed that to a friend of mine who is majoring in data science.
He pointed out. Isn't it always the following L shape that's the flow
| have: | ||
| ID | from | to |
| 1 | a | b |
| 1 | b | c |
| 1 | c | d |
| 2 | e | f |
| 2 | f | g |
Can you provide some intuition?