Hi.

I'm trying to use a Sankey diagram to visualize and calculate the flow of patients between 5 different treatment centers. The data is structured as follows:

ID  TreatmentCenter  Contact_number  TransactionID
1   A                1               111
1   B                2               111
1   A                3               111
1   A                1               222
1   B                2               222
2   C                1               333
2   D                2               333
2   A                3               333

where ID is a unique identifier for each individual, TreatmentCenter is the place where the patient was treated, and Contact_number is the order in which the treatments were carried out. If a new treatment episode is started for the same ID, Contact_number is reset to 1 and TransactionID changes. TransactionID is used to group transactions.

For the Sankey diagram I use:

Event = TreatmentCenter
Sequence Order = Contact_number
Transaction Identifier = TransactionID

I want to use the diagram to see how patients flow between treatment centers and to calculate, for example, how many instances with contact number 2 treated in treatment center B were previously treated in treatment center A. Something like this:

Treatment Center A, Contact number 2
Contact number 1 to
  Treatment center A
  Treatment center B
  Treatment center C
  Treatment center D
  Treatment center E

Treatment Center B, Contact number 2
Contact number 1 to
  Treatment center A
  Treatment center B
  Treatment center C
  Treatment center D
  Treatment center E

The problem I run into is that I have too much data: around 20 000 000 rows, and the longest path is 2 800 contact numbers long. The most frequent paths are 1-5 contact numbers long. Visual Analytics will only show the top 300 most frequent paths, so data is missing. Is it possible to do this calculation in another way, or to get the Sankey diagram to work somehow? I don't have to see the longest paths, only have them included in the "summation".

Kind regards
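For what it's worth, the counts described in this post do not strictly need the diagram. Each count is a pairing of a contact with the previous contact in the same transaction, followed by a tally. Below is a minimal sketch of that idea in Python (not SAS), using only the sample rows from the post. The function name `count_transitions` and the tuple layout are assumptions made for illustration. At 20 000 000 rows the same logic would more likely be run as a self-join on TransactionID and Contact_number inside the database than in an in-memory dictionary.

```python
from collections import Counter

# Sample rows from the post: (ID, TreatmentCenter, Contact_number, TransactionID)
rows = [
    (1, "A", 1, 111),
    (1, "B", 2, 111),
    (1, "A", 3, 111),
    (1, "A", 1, 222),
    (1, "B", 2, 222),
    (2, "C", 1, 333),
    (2, "D", 2, 333),
    (2, "A", 3, 333),
]


def count_transitions(rows):
    """Count (contact_number, previous_center, current_center) triples.

    Hypothetical helper: a contact is paired with the contact that has
    the same TransactionID and Contact_number - 1.
    """
    # Look up the treatment center for every (transaction, contact number).
    center_at = {(tx, contact): center for _id, center, contact, tx in rows}

    counts = Counter()
    for _id, center, contact, tx in rows:
        previous_center = center_at.get((tx, contact - 1))
        if previous_center is not None:  # contact 1 has no predecessor
            counts[(contact, previous_center, center)] += 1
    return counts


transitions = count_transitions(rows)

# Instances with contact number 2 in center B that came from center A.
print(transitions[(2, "A", "B")])
```

Because every transition is counted regardless of how long its path is, the long paths end up in the totals without having to be drawn.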
Hi,

I have searched the forum for help with my problem, and I think I have an idea of how to solve it, but I need some more help. I would really appreciate it if someone out there could help me. I am new to SAS EG and have limited SAS programming skills. I do, however, have a good understanding of general code.

My data set is structured as follows:

Occurrence ID (unique)  Person ID  Date  Calculated column 1
x1                      y1         n1
x2                      y2         n2
x3                      y1         n3
x4                      y1         n4

I want to populate the calculated column according to how many days there are between two occurrences for the same person. For example, for person ID = y1, the days between occurrence x3 and x1 = n3 - n1. The calculated column is then set to 1, 2, 3 or 4 depending on how many days there are.

My idea is to transpose the data set as follows:

Person ID (unique)  x1  x2  x3  x4  Calculated column x1  Calculated column x2  ...
y1                  n1      n3  n4
y2                      n2

I can then do the calculation by comparing columns x1,...,xn for each person ID. After this step, I need to transpose the data set back in order to have a unique calculated column value for each Occurrence ID.

Is this a good approach, or do any of you have a better idea of how I might attack this problem?

Thank you!
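One alternative to transposing, sketched here in Python rather than SAS, is to sort the rows by person and date and carry the previous date forward within each person. The dates, the helper names `days_since_previous` and `bucket`, and the cut-offs of 7, 14 and 30 days are all hypothetical, since the post does not give them. The sketch also assumes "days between two occurrences" means the gap to the same person's previous occurrence.

```python
from datetime import date
from itertools import groupby

# Hypothetical sample data: (occurrence_id, person_id, date).
# The post only names the dates n1..n4; these values are made up.
occurrences = [
    ("x1", "y1", date(2020, 1, 1)),
    ("x2", "y2", date(2020, 1, 5)),
    ("x3", "y1", date(2020, 1, 11)),
    ("x4", "y1", date(2020, 1, 31)),
]


def days_since_previous(occurrences):
    """Map each occurrence ID to the days since that person's previous one."""
    result = {}
    # Sort by person, then date, so each person's rows are consecutive.
    ordered = sorted(occurrences, key=lambda row: (row[1], row[2]))
    for _person, group in groupby(ordered, key=lambda row: row[1]):
        previous_date = None
        for occurrence_id, _person_id, current_date in group:
            if previous_date is None:
                result[occurrence_id] = None  # first occurrence for the person
            else:
                result[occurrence_id] = (current_date - previous_date).days
            previous_date = current_date
    return result


def bucket(days):
    """Turn a day count into 1-4 using hypothetical cut-offs."""
    if days is None:
        return None
    if days <= 7:
        return 1
    if days <= 14:
        return 2
    if days <= 30:
        return 3
    return 4


gaps = days_since_previous(occurrences)
calculated = {occ_id: bucket(days) for occ_id, days in gaps.items()}
print(gaps)
print(calculated)
```

The result stays keyed by Occurrence ID throughout, so no transpose back is needed.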