I am merging two data sets (#1 & 2) by age. I am getting output (Data set #3) witch did not merge for age <20.
Does anyone knows what's going on? Thanks.
SAS Codes used:
data dataset#3;
merge dataset#1
dataset#2;
by age;
run;
data sets #1 & 2 were soerted before merging.
Data Set #1
age var1
<20 755
20-24 115401
25-29 445878
30-34 830072
35-39 1033127
40-44 1317303
45-49 1541838
50-54 2083025
55-59 1343919
60+ 81271
Data Set #2
age var2
<20 5577
20-24 173245
25-29 522938
30-34 875553
35-39 1091014
40-44 1377601
45-49 1566436
50-54 2177721
55-59 1084776
60+ 247865
Data Set #3 (Output)
age var1 var2
<20 . 5577
<20 755 .
20-24 115401 173245
25-29 445878 522938
30-34 830072 875553
35-39 1033127 1091014
40-44 1317303 1377601
45-49 1541838 1566436
50-54 2083025 2177721
55-59 1343919 1084776
60+ 81271 247865
One possibility is that you have an extra space in one of the datasets for the variable age where age is equal to <20. If the following describes your data, the additional datasteps would be needed:
Data dataset1;
input age $ var1;
cards;
<20 755
20-24 115401
25-29 445878
30-34 830072
35-39 1033127
40-44 1317303
45-49 1541838
50-54 2083025
55-59 1343919
60+ 81271
;
Data dataset2;
input age $ var2;
if _n_ eq 1 then age=" <20";
cards;
<20 5577
20-24 173245
25-29 522938
30-34 875553
35-39 1091014
40-44 1377601
45-49 1566436
50-54 2177721
55-59 1084776
60+ 247865
;
data dataset1;
set dataset1;
age=strip(age);
run;
data dataset2;
set dataset2;
age=strip(age);
run;
proc sort data=dataset1;
by age;
run;
proc sort data=dataset2;
by age;
run;
data dataset3;
merge dataset1 dataset2;
by age;
run;
On potential source of this behavior is if the datasets with the counts shown come from PROC MEANS or SUMMARY output with the age as a CLASS variable with a grouping format as implied by your example, the actual value kept in the datasets could be a different numeric value. Suppose the lowest age in one set before PROC MEANS is 13 and the in the other set the lowest is 15 the actual values in the sets would be 13 and 15 and when merged 13 does not equal 15 so would be on a different line. The other categories may not show this behavior if they all had the same minimum values per age group.
If not, you need to let us know how you are building sets 1 and 2.
Data data1;
input age $ var1;
cards;
<20 755
20-24 115401
25-29 445878
30-34 830072
35-39 1033127
40-44 1317303
45-49 1541838
50-54 2083025
55-59 1343919
60+ 81271
;
Data data2;
input age $ var2;
cards;
<20 5577
20-24 173245
25-29 522938
30-34 875553
35-39 1091014
40-44 1377601
45-49 1566436
50-54 2177721
55-59 1084776
60+ 247865
;
run;
proc sql;
create table data3 as
select data1.age,var1,var2
from data1,data2
where strip(data1.age)=strip(data2.age);
quit;
proc print;run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.