BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
michan22
Quartz | Level 8

Dear all,

 

I have 2 datasets that I would like to merge.

dataset1 has 137 observations and 7 variables.

dataset2 has 417 observations and 165 variables.

 

after sorting both dataset, I put:

 

data hpvmerge;

merge dataset1 (in=a) dataset2 (in=b);

by study_id;

if a=1;

run;

 

SAS did not give any errors and there was 137 observations and 171 variables in hpvmerge dataset.

however, when I opened this merged dataset, all values for the variables from the larger dataset (dataset2) are missing.

I am not sure what went wrong and would like to get some help from you guys.

 

Thanks in advance.

 

1 ACCEPTED SOLUTION

Accepted Solutions
WarrenKuhfeld
Rhodochrosite | Level 12

use the left function to left justify in a separate step before the merge.

 

 

x = left(x);

 

substitute your var name for x.

View solution in original post

12 REPLIES 12
WarrenKuhfeld
Rhodochrosite | Level 12

Check to see if study_id has the same values in the two data sets. I think you will find they do not.

Also, set b2 = b;  see if b2 has any nonzero values.  b is going to be automatically dropped.

ballardw
Super User

With this bit in your code:

if a=1;

you only records that have a value for study_id in dataset1. If there are no matching values for study_id in dataset2 then none of the variables would get values from dataset2 but the variables will exist.

 

You should run proc freq or some other summary on the variable study_id in both sets. Then see if there should be matches. If they aren't matching it may be a matter of 1) no matches or presumptive matches

2) letter case: "Study a" does not match "Study A",

3) extra characters: "Study  A" does not match "Study A" (the first has two spaces)

4) different characters: "Study_A" does not match "Study A"

5) the insidious leading blank: " Study A" does not match "Study A"

If the study id is numeric it there may be a formatting issue such that when you look at them they appear identical but there is an actual difference in the underlying value. 4.3 displayed with an F1 format will look like 4 but not match the actual value 4 in the other set.

michan22
Quartz | Level 8

Than you so much for the suggestion.

So I have ran proc freq on both datasets and there are definitely matching values.

Study id in dataset1 have both numbers only ID and numbers+letters ID, and it is a character variable with a length of 5.

but study id in dataset2 have only numbers ID, (numerical variable with length 8, but actually it ranges from only 1 digit to 4 digit ID), so I had to convert to a character variable with a length of $5 before merging. 

If I understood your suggestions correctly, did this cause the missing values in the merged dataset??

WarrenKuhfeld
Rhodochrosite | Level 12

Right.  If the values do not match, you will not get any observations from dataset2.

michan22
Quartz | Level 8

Thank you.

Can you give some suggestions on how I should fix that?

I can't think of another way of doing this (e.g. convert to character via the put function and make it the same length as dataset1 study ID). I need to keep those IDs with numbers+letters too in dataset1 as I might be linking to a third dataset with this kind of ID format.

 

WarrenKuhfeld
Rhodochrosite | Level 12

You can keep the original variables, but you must make a variable in each data set that has the same names and values and then merge on it.

michan22
Quartz | Level 8

Sorry I did not make this clear before. The character variable I converted to in dataset2 have the same variable name as study_id in dataset1 (both are named study_id). The original numeric ID variable was named something else.

Should I make a new variable in dataset1 as well and just have the values equal to study_id and link this to dataset2?

WarrenKuhfeld
Rhodochrosite | Level 12

If the merge is still not working the way you want, run proc freq on the BY variable for both data sets.  Ensure that the values in each data set that should match do in fact match.  If they don't transform them in whatever way is necessary to get the desired matches.

michan22
Quartz | Level 8

After comparing side by side, I am finally seeing what's wrong:

dataset1 study ID (already a character variable, just picked a few for example)

1

10

112

 

 

and dataset2 study ID (after converting to character)

    1

  10

112

 

Why is this happening and how should I fix??

WarrenKuhfeld
Rhodochrosite | Level 12

use the left function to left justify in a separate step before the merge.

 

 

x = left(x);

 

substitute your var name for x.

michan22
Quartz | Level 8

It worked!! Thank you so much I have been trying to figure this out for the whole day!

WarrenKuhfeld
Rhodochrosite | Level 12

Glad I could help!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 6282 views
  • 5 likes
  • 3 in conversation