BookmarkSubscribeRSS Feed
novicenoice
Calcite | Level 5

I have 1 reference table and 2 different tables. The reference table holds the ID key and unique records.

 

The 2 different tables hold the same ID key column (for merging later), and other necessary columns. However, there are duplicate records with differing data per ID key per column. Examples below

 

Eventually, with these flags, I'd then merge all 3 tables and determine if a record has valid data for each column of interest and by hospital.

 

Reference table

ID keyMinutesHospital
14A
21B
38A
46C
59F

 

Table 2

ID keyScore
1 
1 
13
22
3 
41
4 
5 
5 

 

Table 3

ID keyScore 2
1Positive
2N/A
2Positive
2N/A
3Negative
3 
3Negative
4 
4 
5Non-conclusive
5Non-conclusive

 

In Tables 2 and 3, I want to create flags for

  • If the data element is numeric (Table 2), I want to create a flag (1, 0) where Score 1s are anything but missing.
  • If the data element is categorical (Table 3), I want to create a flag (1, 0) where Score 2s were recorded (ie "Negative", "Positive", "Non-conclusive"). It doesn't matter to me what the score is, just that there was a score

Table I want:

Want

ID keyMinutesHospitalScoreScore 2
14A11
21B11
38A01
46C10
59F01

 

Thoughts on how to do this? I've merged all 3 tables first but that creates other issues (eg if table 2 has 2 duplicate records and table 3 has 4 duplicate records, I'm creating 8 records). Each table actually has several numeric and categorical columns of interest.

 

For an additional challenge, some outcomes depend on two different columns across two different tables (eg if Score 1 is valid and if Score 2 is valid)...

 

Tips and thoughts appreciated, thank you!

 

4 REPLIES 4
ballardw
Super User

My first thought is that you have not defined a clear set of rules of how table 2 and table 3 (and any other table) should be combined. Your shown Idkey seems not to sufficiently identify things that should join from your limited description. Are there perhaps other values floating around such as date or particular service involved that would make combining things more of a 1 to 1 match (or 1 to none) between table 2 and 3.

 

Or are you not quite describing the issue such as you want "if ANY of the Idkey value has a 'score' "? Which would mean summarize the data to get that before combining the results.

 

Note: Reference table information at this stage isn't adding to the complexity and could be ignored as there are many ways to 1-to-1 lookup later based on the Idkey values as needed.

novicenoice
Calcite | Level 5

For this particular dataset, the reference table contains incident-specific information for each ID key. So the ID key is unique and the matching ID key in all other tables are created from that ID key. The reason for duplicate records is if Score1 or Score2, for example, have multiple entries depending on who and when the record was documented.

 

In my mind, in creating flags, my final table will be Table 1 with columns containing the flags. Maybe my approach isn't the most appropriate approach...?

ballardw
Super User

@novicenoice wrote:

For this particular dataset, the reference table contains incident-specific information for each ID key. So the ID key is unique and the matching ID key in all other tables are created from that ID key. The reason for duplicate records is if Score1 or Score2, for example, have multiple entries depending on who and when the record was documented.

 

In my mind, in creating flags, my final table will be Table 1 with columns containing the flags. Maybe my approach isn't the most appropriate approach...?


I am not sure quite where you are going with this but it sounds like your tables with the Scores should include one or more additional variables that indicate why there are duplicate Id Key values. That may be a "who" or a "when" or maybe both if the same "who"  is associated with entering or assigning the scores.  The additional identification variables should allow matching up the related score1, score2 (etc) that should be together for anything requiring both, or multiple, scores.

 

Sometimes the data is just ugly and requires more work than you expected as you dig into it.

Get the rules for how the "multiple" scores are to be combined for use though. As I said earlier, once the multiple cases are organized then add the reference data will be easy.

 

 

Quentin
Super User

This is hard to follow.  But I would suggest starting by selecting the records you want from table 2 (non-missing values for score) and de-duplicating it.  Then would select the records you want from table 3 ( Score2 IN("Negative", "Positive", "Non-conclusive") ) and de-duplicating it.  

 

Then you would have three files each with a unique value, and you can merge them together without worrying about duplicates.  If you want the flags, you can make them at that point.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 266 views
  • 0 likes
  • 3 in conversation