Hi
I have 2 data files that I am trying to merge by id.
Variables from one are not in the other and it is overwriting or populating the variables.
Example:
Data X;
Merge Y Z;
by ID;
Run;
Variables a b and c are only in file Y
d e and f are only in file Z.
when I merge them, variables a b and c repeats the values that were in file Y as if they were in file Z.
Why aren't they missing ? I thought they would be missing.
Is there a command that I can use to make sure they remain missing if NOT IN file Z ?
Thank you
If you want to merge then setup your data so that it can be merged.
data want;
merge a(rename=(age_a=age)) b(rename=(age_b=age)) ;
by id age;
run;
If you want more precise help then post a workable example of data that demonstrates your problem.
Thank you. I changed to merge by id and age and it worked.
Now I have another issue:
I need to be able to merge the data when the age is the same.
I am not sure if I can explain but I will try.
When I merge by 2 variables if prevents the overwrite.
But if I have data for the same age for example I have measurements and diagnoses with age 10.
They are in 2 different lines when they should be in one line only.
so ID A B C D E F AGE
1 10 5 5 . . . 14
ID
1 . . . 1 1 1 14
How do I tell sas to make this entry only one entry ?
This happens in several instances in the data.
Thank you so much
This makes no sense.
data Y;
input ID A B C;
cards;
1 12 13 14
;
data Z;
input ID D E F;
cards;
1 22 23 24
1 32 33 34
2 1 2 3
;
data X;
merge Y Z;
by ID;
run;
ID | A | B | C | D | E | F |
---|---|---|---|---|---|---|
1 | 12 | 13 | 14 | 22 | 23 | 24 |
1 | 12 | 13 | 14 | 32 | 33 | 34 |
2 | . | . | . | 1 | 2 | 3 |
No value for ABC when ID=2, since that ID is not on table Y.
In the example you gave (thank you) it is ID 1 that does not have ABC not ID2 so
so 12 13 14 on the second line of ID1 is the overwrite. because I am pulling 32 33 34 from another data file, and when I do so
ID 1 (line 2) gets 12 13 14 when it needed to be missing (like ID2).
It looks like you want
data X;
merge Y Z;
by ID;
if ^first.ID then call missing(A,B,C);
run;
ID | A | B | C | D | E | F |
---|---|---|---|---|---|---|
1 | 12 | 13 | 14 | 22 | 23 | 24 |
1 | . | . | . | 32 | 33 | 34 |
2 | . | . | . | 1 | 2 | 3 |
The original join works as any join is supposed to though: If the key is present the data for that key is used.
Clearing the data when the key is present is NOT a standard join.
Show us, in a data step:
Help us help you.
Y has height Body Weight variables
Z has diagnoses (URI ASTHMA etc) and drugs variables
X needs to have all variables from Y and Z.
It is also longitudinal so one ID may have 5 entries in File Y and 10 in File Z.
another ID might have 2 entries in File Y and 5 entries in file Z. All different ages.
The variable that they both have in common is AGE and I found a way to prevent the ages from overwriting. So if I have one entry from Z and 10 from Y for ID 1- I have 11 entries with different ages for ID 1 but then all of the diagnosis that were 1 in 1 ENTRY are repeated as 1 in all other 10 entries even though it should have been 1 only in ONE ENTRY and 0 in all the other 10 (from Y). I thought they would be missing and then I could tell sas to do if dx =. then dx=0. for example.
But with all diagnoses = 1 I don't have a way to give a command to transform it to 0. It will transform others (that must remain 1 to 0) - it will overwrite my correct data in Z. Every ID is different: different ages and different amount of entries and diagnoses.
See @mkeintz's reply.
Provide BEFORE and AFTER sample data.
@Mscarboncopy wrote:
Y has height Body Weight variables
Z has diagnoses (URI ASTHMA etc) and drugs variables
X needs to have all variables from Y and Z.
It is also longitudinal so one ID may have 5 entries in File Y and 10 in File Z.
another ID might have 2 entries in File Y and 5 entries in file Z. All different ages.
The variable that they both have in common is AGE and I found a way to prevent the ages from overwriting. So if I have one entry from Z and 10 from Y for ID 1- I have 11 entries with different ages for ID 1 but then all of the diagnosis that were 1 in 1 ENTRY are repeated as 1 in all other 10 entries even though it should have been 1 only in ONE ENTRY and 0 in all the other 10 (from Y). I thought they would be missing and then I could tell sas to do if dx =. then dx=0. for example.
But with all diagnoses = 1 I don't have a way to give a command to transform it to 0. It will transform others (that must remain 1 to 0) - it will overwrite my correct data in Z. Every ID is different: different ages and different amount of entries and diagnoses.
The purpler portion is what you expect, but your expectations are incorrect. If you want this, you need to be able to join by more than ID, ie ID + Event or another variable to uniquely identify each event. If you cannot, a merge links each record by the identified variable and repeats.
So show us a better example of what you want and we can maybe offer some suggestions to get there, but a MERGE/JOIN as shown will not get you there because your logic is more complicated.
yes.
Thank you. I changed the merge by id and age and it worked.
it is not overwriting.
Now I have another issue:
I need to be able to merge the data when the age is the same.
I am not sure if I can explain but I will try.
When I merge by 2 variables if prevents the overwrite and this was possible because I was talking about it with you here 🙂
But if I have data for the same age for example say I have measurements (from the one file) and diagnoses (from the other) with same age 10.
They are in 2 different lines when they should be in one line only.
so ID A B C D E F AGE
1 10 5 5 . . . 14
1 . . . 1 1 1 14
How do I tell sas to make this entry only one entry for those cases only?
This happens in several instances in the data.
Thank you so much
Data want;
Merge Y Z;
by id visit_type;
Run;
This gives me a data set with 13.000 observations. Multiple observations for each ID.
Different variables, different ages. Only a couple of ages match from one file to the other.
I need to make sure the ones that do match in age are only in one line instead of 2.
Variables in Y are not the same as in Z aside from id visit_type and age.
but age var has two different names (depending on the file) so it would not overwrite.
one is age_yr the other is visit_age.
I am trying to do a comparison to see which ones match but then what ?
So this way I can merge them but prevent the overwriting of the other variables... so what is not in Y has to be missing and what is not in Z has to be missing.
I don't have a data step for that, it is what I am trying to figure out.
I hope that helps.
Thanks
Mari
@Mscarboncopy wrote:
Data want;
Merge Y Z;
by id visit_type;
Run;
This gives me a data set with 13.000 observations. Multiple observations for each ID.
Different variables, different ages. Only a couple of ages match from one file to the other.
I need to make sure the ones that do match in age are only in one line instead of 2.
Variables in Y are not the same as in Z aside from id visit_type and age.
but age var has two different names (depending on the file) so it would not overwrite.
one is age_yr the other is visit_age.
I am trying to do a comparison to see which ones match but then what ?
So this way I can merge them but prevent the overwriting of the other variables... so what is not in Y has to be missing and what is not in Z has to be missing.
I don't have a data step for that, it is what I am trying to figure out.
I hope that helps.
Thanks
Mari
I do not see any usable example data, against which we could test code.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.