<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Missing values after merging in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Missing-values-after-merging/m-p/923585#M363600</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a large dataset with &amp;gt;1,000,000 observations for ~10,000 patients.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Each patient has a unique ID.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input ID ICD10 $;
datalines; 
11111 E0800
11111 E0801
11111 E0803
22222 J45909
22222 J45908
33333 G4001
33333 G4002
33333 G4003
44444 E8883
44444 E8882
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I need to add an age variable to the main dataset with &amp;gt;1,000,000 observations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a separate age dataset which has the age variable and corresponding patient ID.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data age;
input ID age;
datalines;
11111 57
22222 64
33333 67
44444 81
;
Run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Before merging the two datasets, I set the length and format of the ID variable on which I am merging:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
length ID $ 5;
set have;
format ID $8.;
informat ID $10.; 
run;

data age;
length ID $ 5;
set age;
format ID $8.;
informat ID $10.;
run;

/*merge have and age datasets*/
Proc sort data=have; by ID; run;
Proc sort data=age; by ID; run;

data merged;
merge have (in=a) age(in=b);
by ID;
if a and b;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, I am encountering issues where age does not appear in the merged dataset for all observations (even though the ID does have an age value available in the age dataset), for example:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Data merged;
Input ID ICD10 $ age;
Datalines; 
11111 E0800 .
11111 E0801 57
11111 E0803 57
22222 J45909 64
22222 J45908 .
33333 G4001 67
33333 G4002 .
33333 G4003 .
44444 E8883 .
44444 E8882 81
;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When I first discovered the issue, I also set the ID informat the same in both datasets (above code), in an attempt to see if this would resolve the issue (but it has not).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any help would be greatly appreciated.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you in advance,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 09 Apr 2024 12:14:49 GMT</pubDate>
    <dc:creator>Epi_Stats</dc:creator>
    <dc:date>2024-04-09T12:14:49Z</dc:date>
    <item>
      <title>Missing values after merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Missing-values-after-merging/m-p/923585#M363600</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a large dataset with &amp;gt;1,000,000 observations for ~10,000 patients.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Each patient has a unique ID.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input ID ICD10 $;
datalines; 
11111 E0800
11111 E0801
11111 E0803
22222 J45909
22222 J45908
33333 G4001
33333 G4002
33333 G4003
44444 E8883
44444 E8882
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I need to add an age variable to the main dataset with &amp;gt;1,000,000 observations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a separate age dataset which has the age variable and corresponding patient ID.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data age;
input ID age;
datalines;
11111 57
22222 64
33333 67
44444 81
;
Run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Before merging the two datasets, I set the length and format of the ID variable on which I am merging:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
length ID $ 5;
set have;
format ID $8.;
informat ID $10.; 
run;

data age;
length ID $ 5;
set age;
format ID $8.;
informat ID $10.;
run;

/*merge have and age datasets*/
Proc sort data=have; by ID; run;
Proc sort data=age; by ID; run;

data merged;
merge have (in=a) age(in=b);
by ID;
if a and b;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, I am encountering issues where age does not appear in the merged dataset for all observations (even though the ID does have an age value available in the age dataset), for example:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Data merged;
Input ID ICD10 $ age;
Datalines; 
11111 E0800 .
11111 E0801 57
11111 E0803 57
22222 J45909 64
22222 J45908 .
33333 G4001 67
33333 G4002 .
33333 G4003 .
44444 E8883 .
44444 E8882 81
;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When I first discovered the issue, I also set the ID informat the same in both datasets (above code), in an attempt to see if this would resolve the issue (but it has not).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any help would be greatly appreciated.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you in advance,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 09 Apr 2024 12:14:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Missing-values-after-merging/m-p/923585#M363600</guid>
      <dc:creator>Epi_Stats</dc:creator>
      <dc:date>2024-04-09T12:14:49Z</dc:date>
    </item>
    <item>
      <title>Re: Missing values after merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Missing-values-after-merging/m-p/923593#M363601</link>
      <description>&lt;P&gt;I do not get the output you get. I get output with no missing values.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
/* read ID as character so the LENGTH ID $ 5 steps below do not hit a type conflict */
input ID $ ICD10 $;
datalines; 
11111 E0800
11111 E0801
11111 E0803
22222 J45909
22222 J45908
33333 G4001
33333 G4002
33333 G4003
44444 E8883
44444 E8882
;

data age;
input ID $ age;
datalines;
11111 57
22222 64
33333 67
44444 81
;
data have;
length ID $ 5;
set have;
format ID $8.;
informat ID $10.; 
run;

data age;
length ID $ 5;
set age;
format ID $8.;
informat ID $10.;
run;

/*merge have and age datasets*/
Proc sort data=have; by ID; run;
Proc sort data=age; by ID; run;
data merged;
merge have (in=a) age(in=b);
by ID;
if a and b;
run;

proc print data=merged;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm not sure why you would think an informat would make a difference. Informats only matter when raw data is being read, for example by an INPUT statement. They have no effect when the data comes from a SET statement. In fact, if you remove the second DATA HAVE; step and the second DATA AGE; step from the program, it still works fine.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Apr 2024 12:44:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Missing-values-after-merging/m-p/923593#M363601</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2024-04-09T12:44:16Z</dc:date>
    </item>
    <item>
      <title>Re: Missing values after merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Missing-values-after-merging/m-p/923595#M363603</link>
      <description>&lt;P&gt;Thanks Paige for replying.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Yes, I know. I can't share my dataset here, and the example data I provided did not reproduce the issue, which is why I shared an example of the merged dataset I was getting.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, I think I have now found the problem: the have dataset already contained a variable called "age". When I dropped it, the merge worked as expected.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Apr 2024 12:46:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Missing-values-after-merging/m-p/923595#M363603</guid>
      <dc:creator>Epi_Stats</dc:creator>
      <dc:date>2024-04-09T12:46:06Z</dc:date>
    </item>
    <item>
      <title>Re: Missing values after merging</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Missing-values-after-merging/m-p/923599#M363605</link>
      <description>&lt;P&gt;1) could you run something like this example to test if that ID are really the same?&lt;/P&gt;
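&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
  ID='11111';  /* five digit ones */
  output;
  ID='1l111';  /* second character is a lowercase letter l, not the digit 1 */
  output;
run;

data test2;
  set test;
  /* the hex representation exposes characters that look alike on screen */
  the_same=put(ID, $hex64.);
  put _all_;
run;&lt;/CODE&gt;&lt;/PRE&gt;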
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
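&lt;P&gt;The same idea can be pointed at two datasets instead of a toy one. This is only a sketch: have_chk and age_chk are made-up datasets with a character ID, not your real data, and the planted lookalike value is there just to show what a mismatch would look like. It lists every ID that has no exact match in the lookup dataset, together with its hex codes:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* hypothetical sample data: the second row has a lowercase l in place of a digit 1 */
data have_chk;
  length ID $ 5;
  input ID $ ICD10 $;
datalines;
11111 E0800
1l111 E0801
22222 J45909
;
run;

/* hypothetical lookup data with clean IDs */
data age_chk;
  length ID $ 5;
  input ID $ age;
datalines;
11111 57
22222 64
;
run;

/* IDs in HAVE_CHK with no exact match in AGE_CHK, shown with their hex codes */
proc sql;
  select distinct h.ID, put(h.ID, $hex10.) as ID_hex
  from have_chk as h
  where h.ID not in (select ID from age_chk);
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With this sample data only the lookalike ID is listed. An empty result on the real data would suggest the IDs themselves are not the cause.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;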
&lt;P&gt;2) Is your code "just a merge", or do you have other data transformations in there too?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
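&lt;P&gt;One more thing worth ruling out under point 2 is a variable, other than the BY variable, that exists in both datasets. Below is a minimal sketch of that situation. The datasets have_demo and age_demo are made up for illustration (they are assumptions, not your data): have_demo already carries its own empty age variable, and in a one-to-many merge that variable overwrites the looked-up value on every row after the first one in each BY group. Dropping it with the DROP= dataset option is one way around it.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* made-up data: HAVE_DEMO already has its own (empty) AGE variable */
data have_demo;
  input ID ICD10 $ age;
datalines;
11111 E0800 .
11111 E0801 .
22222 J45909 .
22222 J45908 .
;
run;

/* made-up lookup data: one row per ID */
data age_demo;
  input ID age;
datalines;
11111 57
22222 64
;
run;

/* overlapping AGE: only the first row of each BY group keeps the AGE_DEMO value */
data merged_bad;
  merge have_demo (in=a) age_demo (in=b);
  by ID;
  if a and b;
run;

/* dropping AGE from HAVE_DEMO lets the AGE_DEMO value carry across the BY group */
data merged_ok;
  merge have_demo (in=a drop=age) age_demo (in=b);
  by ID;
  if a and b;
run;

proc print data=merged_bad; run;
proc print data=merged_ok; run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;In merged_bad only the first row of each ID has an age and the remaining rows are missing. In merged_ok every row has one.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;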
&lt;P&gt;Bart&lt;/P&gt;</description>
      <pubDate>Tue, 09 Apr 2024 12:55:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Missing-values-after-merging/m-p/923599#M363605</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2024-04-09T12:55:33Z</dc:date>
    </item>
  </channel>
</rss>

