A colleague is getting some hospitalized patients' length of stay (LOS) and costs from each hosptal in a strange format. Below is a fictional example:
There's the hospital's ID, followed by the three patients with the longest LOS, and some lines with various total costs for the patients at that hospital.
STM | PTID | LOS | Vstay | Zstay | |
204 | George | 8.81 | . | 3.89 | |
996 | Mary | 6.62 | 2 | 8.32 | |
920 | Bill | 1.12 | 3.16 | . | |
IndCost | $38,989.00 | K | |||
Pharm | $5,602 | D | |||
TotCost | $101,602.00 | K | |||
UMD | PTID | LOS | Vstay | Zstay | |
728 | Phil | 10.96 | 4.2 | 56.3 | |
299 | Sue | 7.77 | . | 6.11 | |
5 | John | 6.01 | 3.2 | . | |
IndCost | $11,999.00 | H | |||
TotCost | $300,101.44 | D | |||
MV | $555.98 | T | |||
Pharm | $408.62 | K | |||
VasoDil | $5,602 | D |
Much of the information is extraneous to the study. What I'd like to obtain is three lines for each hospital. The first line has the hospital ID, followed by the first PTID and his/her LOS. The second line has the hospital ID, followed by the second PTID with the amount for IndCost. The third line has the hospital ID, followed by the third PTID with the amount for IndCost.
STM | 204 | 8.81 |
STM | 996 | $38,989.00 |
STM | 920 | $101,602.00 |
UMD | 728 | 10.96 |
UMD | 299 | $11,999.00 |
UMD | 5 | $300,101.44 |
I can't figure out how to input data that is this weird, so I'm hoping one of you can help.
Thanks!!!!
Andrew
I can read your words, but I don't see which of the letters in the table is hospital ID, further explanation is needed. Similarly, I would like a better explanation of the logic of how anyone could start with your first table and wind up with the second table.
In addition, in your second table, you have some values which are dollars in the same column as LOS. This is always something you should avoid, these should be in different columns for SAS to work most efficiently.
I agree about not mixing data types, so please ignore the dollar signs in the second table.
The hospital ID is in the first position of the data chunk (STM, UMD, etc..)
Repeating: Similarly, I would like a better explanation of the logic of how anyone could start with your first table and wind up with the second table.
Also, the dollar signs have nothing to do with my objection about the layout of the 2nd table. LOS and Dollars do not belong in the same column.
I realize that it's hard to follow what's happening, so I've color-coded the data as well as show where the hospital id is located.
Is this how the data is always received? What does the data look like opened in a text editor? The same as what you have posted or different? If the data is genuinely being supplied in a new format then I would be going back to the provider to find out why.
@DocMartin As you can see, there are a number of people not sure what to do here.
One of my concerns is that two examples of this bizarre layout of data may not be enough to cover all possible examples. And so the code that we provide based upon these two examples may fail on other data in this bizarre layout. Never a good approach.
This is why I ask for "logic" rather than "examples". Logic is a set of rules that will allow us to write code that could work for ALL possible input data. Logic > examples.
I've spoken with my colleague about his absurd data formatting. He said he'll try to fix it. So, for now, this task is moot.
Thanks to all who took the time to respond.
You pasted the data into your question as a table. Does that mean it is coming to you in a spreadsheet? Is it a .xlsx file (current standard) or the older .xls file?
Or did the data come to you in a TEXT file? Perhaps it was a delimited text file, such as a .csv file, and your PC automatically open it with a spreadsheet program instead of text file? If is a text file then share some lines of the text file (with appropriate redactions).
Are there always three and only ever three patients per hospital?
Why does one hospital have MV and VASODIL and the other does not?
Can you explain why it make sense to only keep the first patients LOS. And why it makes sense to associate the second patient id with the INDCOST and the third patient with the PHARM costs.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.