Solved: Missing values after merging

Epi_Stats · Posted 04-09-2024 08:14 AM

Hi,

I have a large dataset with >1,000,000 observations for ~10,000 patients.

Each patient has a unique ID.

data have;
input ID ICD10 $;
datalines; 
11111 E0800
11111 E0801
11111 E0803
22222 J45909
22222 J45908
33333 G4001
33333 G4002
33333 G4003
44444 E8883
44444 E8882
;
run;

I need to add an age variable to the main dataset with >1,000,000 observations.

I have a separate age dataset which has the age variable and corresponding patient ID.

data age;
input ID age;
datalines;
1 57
2 64
3 67
4 81
;
Run;

Before merging the two datasets, I have set the length and format for the ID variable on which I am merging,

data have;
length ID $ 5;
set have;
format ID $8.;
informat ID $10.; 
run;

data age;
length ID $ 5;
set age;
format ID $8.;
informat ID $10.;
run;

/*merge have and age datasets*/
Proc sort data=have; by ID; run;
Proc sort data=age; by ID; run;

data merged;
merge have (in=a) age(in=b);
by ID;
if a and b;
run;

However, I am encountering issues where age does not appear in the merged dataset for all observations (even though the ID does have an age value available in the age dataset), for example:

Data merged;
Input ID ICD10 $ age;
Datalines; 
11111 E0800 .
11111 E0801 57
11111 E0803 57
22222 J45909 64
22222 J45908 .
33333 G4001 67
33333 G4002 .
33333 G4003 .
44444 E8883 .
44444 E8882 81
;
run;

When I first discovered the issue, I also set the ID informat the same in both datasets (above code), in an attempt to see if this would resolve the issue (but it has not).

Any help would be greatly appreciated.

Thank you in advance,

Epi_Stats · Posted 04-09-2024 08:46 AM

Thanks Paige for replying.

Yes, I know, I can't share my dataset here, and the issue I was having was not replicated in the example data I provided... hence why I shared an example merged dataset of what I was getting.

However, I think I have now discovered the problem, in the have dataset, there was a variable called "age". When I dropped this, the merge worked seamlessly, as per normal!...

View solution in original post

PaigeMiller · Posted 04-09-2024 08:41 AM

I do not get the output you get. I get output with no missing values.

data have;
input ID ICD10 $;
datalines; 
11111 E0800
11111 E0801
11111 E0803
22222 J45909
22222 J45908
33333 G4001
33333 G4002
33333 G4003
44444 E8883
44444 E8882
;

data age;
input ID age;
datalines;
11111 57
22222 64
33333 67
44444 81
;
data have;
length ID $ 5;
set have;
format ID $8.;
informat ID $10.; 
run;

data age;
length ID $ 5;
set age;
format ID $8.;
informat ID $10.;
run;

/*merge have and age datasets*/
Proc sort data=have; by ID; run;
Proc sort data=age; by ID; run;
data merged;
merge have (in=a) age(in=b);
by ID;
if a and b;
run;

proc print data=merged;
run;

I'm not sure why you would think an informat would make a difference. Informats only matter when you are reading in data. They have no impact if the data comes from a SET statement. In fact, if you remove the second DATA HAVE; step and second DATA AGE; step from the program, it still works fine.

--
Paige Miller

Epi_Stats · Posted 04-09-2024 08:46 AM

Thanks Paige for replying.

Yes, I know, I can't share my dataset here, and the issue I was having was not replicated in the example data I provided... hence why I shared an example merged dataset of what I was getting.

However, I think I have now discovered the problem, in the have dataset, there was a variable called "age". When I dropped this, the merge worked seamlessly, as per normal!...

yabwon · Posted 04-09-2024 08:55 AM

1) could you run something like this example to test if that ID are really the same?

data test;
  ID='11111';
  output;
  iD='1l111';
  output;
run;

data test2;
  set test;
  the_same=put(ID, $hex64.);
  put _all_;
run;

2) is your code "just merge" or you have there some other data transformations too ?

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Missing values after merging

Re: Missing values after merging

Re: Missing values after merging

Re: Missing values after merging

Re: Missing values after merging

Catch up on SAS Innovate 2026

Missing values after merging

Re: Missing values after merging

Re: Missing values after merging

Re: Missing values after merging

Re: Missing values after merging

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away