BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mscarboncopy
Pyrite | Level 9

Hi

 

I have 2 data files that I am trying to merge by id.

Variables  from one are not in the other and it is overwriting or populating the variables.

Example:

 

Data X;

Merge Y Z;

by ID;

Run;

 

Variables a b and c are only in file Y

d e and f are only in file Z.

 

when I merge them, variables a b and c repeats the values that were in file Y as if they were in file Z. 

Why aren't they missing ? I thought they would be missing.

Is there a command that I can use to make sure they remain missing if NOT IN file Z ?

Thank you

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If you want to merge then setup your data so that it can be merged.

data want;
  merge a(rename=(age_a=age)) b(rename=(age_b=age)) ;
  by id age;
run;

If you want more precise help then post a workable example of data that demonstrates your problem.

View solution in original post

17 REPLIES 17
Reeza
Super User
You may need to change your BY logic, if you aren't joining by ID. If the ID is the same, the same information is copied over, it is not left blank.
Mscarboncopy
Pyrite | Level 9

Thank you. I changed to merge by id and age and it worked. 

Now I have another issue:

I need to be able to merge the data when the age is the same.

I am not sure if I can explain but I will try.

When I merge by 2 variables if prevents the overwrite.

But if I have data for the same age for example I have measurements and diagnoses with age 10. 

They are in 2 different lines when they should be in one line only.

so ID   A B C     D E F    AGE

        1      10 5 5    .   .   .     14

    ID

         1    .    .    .    1  1  1      14

 

How do I tell sas to make this entry only one entry ?

This happens in several instances in the data.

Thank you so much  

Reeza
Super User
Post this as a new question and post your sample data as legible as possible. As shown I wouldn't be able to tell you anything.

It's likely a two step process, first consolidate your data and then merge.

Instructions on providing sample data is here, it can be fake data but needs to be legible and representative of your problem.

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
ChrisNZ
Tourmaline | Level 20

This makes no sense.

data Y; 
  input ID A B C;
cards;
1 12 13 14
;
data Z; 
  input ID D E F;
cards;
1 22 23 24
1 32 33 34
2 1 2 3
;
data X;
  merge Y Z;
  by ID;
run;

 

 

ID A B C D E F
1 12 13 14 22 23 24
1 12 13 14 32 33 34
2 . . . 1 2 3

 

No value for ABC when ID=2, since that ID is not on table Y.

 

Mscarboncopy
Pyrite | Level 9

In the example you gave (thank you) it is ID 1 that does not have ABC not ID2 so 

so 12 13 14 on the second line of ID1 is the overwrite. because I am pulling 32 33 34 from another data file, and when I do so

ID 1 (line 2) gets 12 13 14 when it needed to be missing (like ID2).

 

ChrisNZ
Tourmaline | Level 20

It looks like you want 

data X;
  merge Y Z;
  by ID;
  if ^first.ID then call missing(A,B,C);
run;
ID A B C D E F
1 12 13 14 22 23 24
1 . . . 32 33 34
2 . . . 1 2 3

 

The original join works as any join is supposed to though: If the key is present the data for that key is used.

Clearing the data when the key is present is NOT a standard join.

mkeintz
PROC Star

Show us, in a data step:

  1. What Y and Z look like
  2. What X should look like

Help us help you.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Mscarboncopy
Pyrite | Level 9

Y has height Body Weight variables 

Z has diagnoses (URI ASTHMA etc) and drugs variables 

X needs to have all variables from Y and Z.

It is also longitudinal so one ID may have 5 entries in File Y and 10 in File Z.

another ID might have 2 entries in File Y and 5 entries in file Z. All different ages.

The variable that they both have in common is AGE and I found a way to prevent the ages from overwriting. So if I have one entry from Z and 10 from Y for ID 1- I have 11 entries with different ages  for ID 1 but then all of the diagnosis that were 1 in 1 ENTRY are repeated as 1 in all other 10 entries even though it should have been 1 only in ONE ENTRY and 0 in all the other 10 (from Y). I thought they would be missing and then I could tell sas to do if dx =. then dx=0. for example. 

But with all diagnoses = 1 I don't have a way to give a command to transform it to 0. It will transform others (that must remain 1 to 0) - it will overwrite my correct data in Z. Every ID is different:  different ages and different amount of entries and diagnoses. 

 

ChrisNZ
Tourmaline | Level 20

See @mkeintz's reply.

Provide BEFORE and AFTER sample data.

Reeza
Super User

@Mscarboncopy wrote:

Y has height Body Weight variables 

Z has diagnoses (URI ASTHMA etc) and drugs variables 

X needs to have all variables from Y and Z.

It is also longitudinal so one ID may have 5 entries in File Y and 10 in File Z.

another ID might have 2 entries in File Y and 5 entries in file Z. All different ages.

The variable that they both have in common is AGE and I found a way to prevent the ages from overwriting. So if I have one entry from Z and 10 from Y for ID 1- I have 11 entries with different ages  for ID 1 but then all of the diagnosis that were 1 in 1 ENTRY are repeated as 1 in all other 10 entries even though it should have been 1 only in ONE ENTRY and 0 in all the other 10 (from Y). I thought they would be missing and then I could tell sas to do if dx =. then dx=0. for example. 

But with all diagnoses = 1 I don't have a way to give a command to transform it to 0. It will transform others (that must remain 1 to 0) - it will overwrite my correct data in Z. Every ID is different:  different ages and different amount of entries and diagnoses. 

 


The purpler portion is what you expect, but your expectations are incorrect. If you want this, you need to be able to join by more than ID, ie ID + Event or another variable to uniquely identify each event. If you cannot, a merge links each record by the identified variable and repeats.  

 

So show us a better example of what you want and we can maybe offer some suggestions to get there, but a MERGE/JOIN as shown will not get you there because your logic is more complicated. 

Mscarboncopy
Pyrite | Level 9

yes. 

Thank you. I changed the merge by id and age and it worked. 

it is not overwriting.

Now I have another issue:

I need to be able to merge the data when the age is the same.

I am not sure if I can explain but I will try.

When I merge by 2 variables if prevents the overwrite and this was possible because I was talking about it with you here 🙂 

But if I have data for the same age for example say I have measurements (from the one file) and diagnoses (from the other)  with same age 10. 

They are in 2 different lines when they should be in one line only.

so   ID        A B C     D E F    AGE

        1        10 5 5    .   .   .     14

         1        .    .  .    1  1  1  14

 

How do I tell sas to make this entry only one entry for those cases only?

This happens in several instances in the data.

Thank you so much  

Kurt_Bremser
Super User

PROVIDE.EXAMPLE.DATA.IN.A.DATA.STEP

and your code.

@ChrisNZ has given you a very beautiful example how to do it, and the macro linked to by @Reeza even automates it. Help us to help you.

Mscarboncopy
Pyrite | Level 9

Data want;

Merge Y Z;

by id visit_type;

Run;

 

This gives me a data set with 13.000 observations. Multiple observations for each ID.

Different variables, different ages. Only a couple of ages match from one file to the other.

I need to make sure the ones that do match in age are only in one line instead of 2.

Variables in Y are not the same as in Z aside from id visit_type and age. 

but  age var has two different names (depending on the file) so it would not overwrite.

one is age_yr the other is visit_age. 

I am trying to do a comparison to see which ones match but then what ?

So this way I can merge them but prevent the overwriting of the other variables... so what is not in Y  has to be missing and what is not in Z has to be missing. 

I don't have a data step for that, it is what I am trying to figure out.

I hope that helps.

Thanks

Mari

 

 

Kurt_Bremser
Super User

@Mscarboncopy wrote:

Data want;

Merge Y Z;

by id visit_type;

Run;

 

This gives me a data set with 13.000 observations. Multiple observations for each ID.

Different variables, different ages. Only a couple of ages match from one file to the other.

I need to make sure the ones that do match in age are only in one line instead of 2.

Variables in Y are not the same as in Z aside from id visit_type and age. 

but  age var has two different names (depending on the file) so it would not overwrite.

one is age_yr the other is visit_age. 

I am trying to do a comparison to see which ones match but then what ?

So this way I can merge them but prevent the overwriting of the other variables... so what is not in Y  has to be missing and what is not in Z has to be missing. 

I don't have a data step for that, it is what I am trying to figure out.

I hope that helps.

Thanks

Mari

 

 


I do not see any usable example data, against which we could test code.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 10124 views
  • 3 likes
  • 6 in conversation