- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi everyone,
I'm getting missing values after combining these 3 dataset as seen on the attached photos. There are no missing values in the key variables. I really appreciate anyone's help!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Several possibilities that I can think of
- Data set A (content_) and data set C (reaction_types) do not have matching values of variable named TYPE. Could be spelling mismatches, or capitalization mismatches, or other types of mismatches.
- Variables Sentiment and Score are missing in data set C
Naturally, we're just guessing. We wouldn't have to guess (a much better approach, IMHO) if you showed us parts of data set A (content_) and data set C (reaction_types) that you think should match.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for replying! just realized that Data set A (content_) and data set C (reaction_types) do not have matching values in type as you mention. It's actually reaction_ and reaction_types that have matching values in type. Any tips on how to proceed at this point? here is the other part of the dataet
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Mariloud wrote:
Thanks for replying! just realized that Data set A (content_) and data set C (reaction_types) do not have matching values in type as you mention. It's actually reaction_ and reaction_types that have matching values in type. Any tips on how to proceed at this point? here is the other part of the dataet
I guess you need to show us what should match. (And I don't really grasp what this screen capture has to do with the problem)
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Variables you're joining on, content_id or type have records in one data set but not the other exclusively. Since you're showing Sentiment and Score that are missing that means that Type is likely not the same in one data set as the other. This could be due to variables not being exactly the same (eg. "001" is not the same as "0001" or "1") or just missing from one data set.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thanks for replying! Data set A (content_) and data set C (reaction_types) do not actually have matching, it's in reaction_ an reaction_types that have matching values in type. And I want to join them all 3.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When you use a construct such as
From dataset1 as a
left join
dataset2 as b
on a.variable=b.variable
you have requested ALL the records from A and data from B that passes the ON clause. So every record in A that does not have a match in B will have all the variable from B with missing values.
If you do not want that behavior you can either use a different type of Join or add a where clause to require output records to have something present in the B data set possibly something like
Where not missing(b.variable)
Depends exactly on what you want in the final output. When you start left joining multiple data sets the conditions may get complicated depending on your join conditions.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Maxim 3: Know Your Data.
Check the key values from the resulting observations with missing values and inspect the lookup datasets for the keys you think should match, then decide how to proceed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your reply! I am joining 3 dataset and just realized that Data set A (content_) and data set C (reaction_types) do not have matching, only reaction_ and reaction_types do in type.