BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Epi_Stats
Obsidian | Level 7

Hi,

 

I have a large dataset with >1,000,000 observations for ~10,000 patients.

 

Each patient has a unique ID.

 

 

data have;
input ID ICD10 $;
datalines; 
11111 E0800
11111 E0801
11111 E0803
22222 J45909
22222 J45908
33333 G4001
33333 G4002
33333 G4003
44444 E8883
44444 E8882
;
run;

 

 

I need to add an age variable to the main dataset with >1,000,000 observations.

 

I have a separate age dataset which has the age variable and corresponding patient ID.

 

 

data age;
input ID age;
datalines;
1 57
2 64
3 67
4 81
;
Run;

 

 

Before merging the two datasets, I have set the length and format for the ID variable on which I am merging,

 

 

data have;
length ID $ 5;
set have;
format ID $8.;
informat ID $10.; 
run;

data age;
length ID $ 5;
set age;
format ID $8.;
informat ID $10.;
run;

/*merge have and age datasets*/
Proc sort data=have; by ID; run;
Proc sort data=age; by ID; run;

data merged;
merge have (in=a) age(in=b);
by ID;
if a and b;
run;

 

 

However, I am encountering issues where age does not appear in the merged dataset for all observations (even though the ID does have an age value available in the age dataset), for example:

 

 

Data merged;
Input ID ICD10 $ age;
Datalines; 
11111 E0800 .
11111 E0801 57
11111 E0803 57
22222 J45909 64
22222 J45908 .
33333 G4001 67
33333 G4002 .
33333 G4003 .
44444 E8883 .
44444 E8882 81
;
run;

 

 

 

When I first discovered the issue, I also set the ID informat the same in both datasets (above code), in an attempt to see if this would resolve the issue (but it has not).

 

Any help would be greatly appreciated.

 

Thank you in advance,

 

1 ACCEPTED SOLUTION

Accepted Solutions
Epi_Stats
Obsidian | Level 7

Thanks Paige for replying.

 

Yes, I know, I can't share my dataset here, and the issue I was having was not replicated in the example data I provided...  hence why I shared an example merged dataset of what I was getting.

 

However, I think I have now discovered the problem, in the have dataset, there was a variable called "age". When I dropped this, the merge worked seamlessly, as per normal!... 

View solution in original post

3 REPLIES 3
PaigeMiller
Diamond | Level 26

I do not get the output you get. I get output with no missing values.

 

data have;
input ID ICD10 $;
datalines; 
11111 E0800
11111 E0801
11111 E0803
22222 J45909
22222 J45908
33333 G4001
33333 G4002
33333 G4003
44444 E8883
44444 E8882
;

data age;
input ID age;
datalines;
11111 57
22222 64
33333 67
44444 81
;
data have;
length ID $ 5;
set have;
format ID $8.;
informat ID $10.; 
run;

data age;
length ID $ 5;
set age;
format ID $8.;
informat ID $10.;
run;

/*merge have and age datasets*/
Proc sort data=have; by ID; run;
Proc sort data=age; by ID; run;
data merged;
merge have (in=a) age(in=b);
by ID;
if a and b;
run;

proc print data=merged;
run;

 

 

I'm not sure why you would think an informat would make a difference. Informats only matter when you are reading in data. They have no impact if the data comes from a SET statement. In fact, if you remove the second DATA HAVE; step and second DATA AGE; step from the program, it still works fine.

--
Paige Miller
Epi_Stats
Obsidian | Level 7

Thanks Paige for replying.

 

Yes, I know, I can't share my dataset here, and the issue I was having was not replicated in the example data I provided...  hence why I shared an example merged dataset of what I was getting.

 

However, I think I have now discovered the problem, in the have dataset, there was a variable called "age". When I dropped this, the merge worked seamlessly, as per normal!... 

yabwon
Onyx | Level 15

1) could you run something like this example to test if that ID are really the same?

data test;
  ID='11111';
  output;
  iD='1l111';
  output;
run;

data test2;
  set test;
  the_same=put(ID, $hex64.);
  put _all_;
run;

 

2) is your code "just merge" or you have there some other data transformations too ?

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 210 views
  • 0 likes
  • 3 in conversation