10-05-2017 06:20 PM
I have 2 datasets that I would like to merge.
dataset1 has 137 observations and 7 variables.
dataset2 has 417 observations and 165 variables.
after sorting both dataset, I put:
merge dataset1 (in=a) dataset2 (in=b);
SAS did not give any errors and there was 137 observations and 171 variables in hpvmerge dataset.
however, when I opened this merged dataset, all values for the variables from the larger dataset (dataset2) are missing.
I am not sure what went wrong and would like to get some help from you guys.
Thanks in advance.
10-05-2017 06:44 PM
Check to see if study_id has the same values in the two data sets. I think you will find they do not.
Also, set b2 = b; see if b2 has any nonzero values. b is going to be automatically dropped.
10-05-2017 06:50 PM
With this bit in your code:
you only records that have a value for study_id in dataset1. If there are no matching values for study_id in dataset2 then none of the variables would get values from dataset2 but the variables will exist.
You should run proc freq or some other summary on the variable study_id in both sets. Then see if there should be matches. If they aren't matching it may be a matter of 1) no matches or presumptive matches
2) letter case: "Study a" does not match "Study A",
3) extra characters: "Study A" does not match "Study A" (the first has two spaces)
4) different characters: "Study_A" does not match "Study A"
5) the insidious leading blank: " Study A" does not match "Study A"
If the study id is numeric it there may be a formatting issue such that when you look at them they appear identical but there is an actual difference in the underlying value. 4.3 displayed with an F1 format will look like 4 but not match the actual value 4 in the other set.
10-05-2017 07:36 PM
Than you so much for the suggestion.
So I have ran proc freq on both datasets and there are definitely matching values.
Study id in dataset1 have both numbers only ID and numbers+letters ID, and it is a character variable with a length of 5.
but study id in dataset2 have only numbers ID, (numerical variable with length 8, but actually it ranges from only 1 digit to 4 digit ID), so I had to convert to a character variable with a length of $5 before merging.
If I understood your suggestions correctly, did this cause the missing values in the merged dataset??
10-05-2017 08:33 PM
Can you give some suggestions on how I should fix that?
I can't think of another way of doing this (e.g. convert to character via the put function and make it the same length as dataset1 study ID). I need to keep those IDs with numbers+letters too in dataset1 as I might be linking to a third dataset with this kind of ID format.
10-05-2017 08:48 PM
Sorry I did not make this clear before. The character variable I converted to in dataset2 have the same variable name as study_id in dataset1 (both are named study_id). The original numeric ID variable was named something else.
Should I make a new variable in dataset1 as well and just have the values equal to study_id and link this to dataset2?
10-05-2017 08:51 PM
If the merge is still not working the way you want, run proc freq on the BY variable for both data sets. Ensure that the values in each data set that should match do in fact match. If they don't transform them in whatever way is necessary to get the desired matches.
10-05-2017 09:07 PM
After comparing side by side, I am finally seeing what's wrong:
dataset1 study ID (already a character variable, just picked a few for example)
and dataset2 study ID (after converting to character)
Why is this happening and how should I fix??
Need further help from the community? Please ask a new question.