I have 1 reference table and 2 different tables. The reference table holds the ID key and unique records.
The 2 different tables hold the same ID key column (for merging later), and other necessary columns. However, there are duplicate records with differing data per ID key per column. Examples below
Eventually, with these flags, I'd then merge all 3 tables and determine if a record has valid data for each column of interest and by hospital.
Reference table
ID key | Minutes | Hospital |
1 | 4 | A |
2 | 1 | B |
3 | 8 | A |
4 | 6 | C |
5 | 9 | F |
Table 2
ID key | Score |
1 | |
1 | |
1 | 3 |
2 | 2 |
3 | |
4 | 1 |
4 | |
5 | |
5 |
Table 3
ID key | Score 2 |
1 | Positive |
2 | N/A |
2 | Positive |
2 | N/A |
3 | Negative |
3 | |
3 | Negative |
4 | |
4 | |
5 | Non-conclusive |
5 | Non-conclusive |
In Tables 2 and 3, I want to create flags for
Table I want:
Want
ID key | Minutes | Hospital | Score | Score 2 |
1 | 4 | A | 1 | 1 |
2 | 1 | B | 1 | 1 |
3 | 8 | A | 0 | 1 |
4 | 6 | C | 1 | 0 |
5 | 9 | F | 0 | 1 |
Thoughts on how to do this? I've merged all 3 tables first but that creates other issues (eg if table 2 has 2 duplicate records and table 3 has 4 duplicate records, I'm creating 8 records). Each table actually has several numeric and categorical columns of interest.
For an additional challenge, some outcomes depend on two different columns across two different tables (eg if Score 1 is valid and if Score 2 is valid)...
Tips and thoughts appreciated, thank you!
My first thought is that you have not defined a clear set of rules of how table 2 and table 3 (and any other table) should be combined. Your shown Idkey seems not to sufficiently identify things that should join from your limited description. Are there perhaps other values floating around such as date or particular service involved that would make combining things more of a 1 to 1 match (or 1 to none) between table 2 and 3.
Or are you not quite describing the issue such as you want "if ANY of the Idkey value has a 'score' "? Which would mean summarize the data to get that before combining the results.
Note: Reference table information at this stage isn't adding to the complexity and could be ignored as there are many ways to 1-to-1 lookup later based on the Idkey values as needed.
For this particular dataset, the reference table contains incident-specific information for each ID key. So the ID key is unique and the matching ID key in all other tables are created from that ID key. The reason for duplicate records is if Score1 or Score2, for example, have multiple entries depending on who and when the record was documented.
In my mind, in creating flags, my final table will be Table 1 with columns containing the flags. Maybe my approach isn't the most appropriate approach...?
@novicenoice wrote:
For this particular dataset, the reference table contains incident-specific information for each ID key. So the ID key is unique and the matching ID key in all other tables are created from that ID key. The reason for duplicate records is if Score1 or Score2, for example, have multiple entries depending on who and when the record was documented.
In my mind, in creating flags, my final table will be Table 1 with columns containing the flags. Maybe my approach isn't the most appropriate approach...?
I am not sure quite where you are going with this but it sounds like your tables with the Scores should include one or more additional variables that indicate why there are duplicate Id Key values. That may be a "who" or a "when" or maybe both if the same "who" is associated with entering or assigning the scores. The additional identification variables should allow matching up the related score1, score2 (etc) that should be together for anything requiring both, or multiple, scores.
Sometimes the data is just ugly and requires more work than you expected as you dig into it.
Get the rules for how the "multiple" scores are to be combined for use though. As I said earlier, once the multiple cases are organized then add the reference data will be easy.
This is hard to follow. But I would suggest starting by selecting the records you want from table 2 (non-missing values for score) and de-duplicating it. Then would select the records you want from table 3 ( Score2 IN("Negative", "Positive", "Non-conclusive") ) and de-duplicating it.
Then you would have three files each with a unique value, and you can merge them together without worrying about duplicates. If you want the flags, you can make them at that point.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: