Maybe the title is too vague, but here's the actual circumstance. Before merging, I have file number 1 with the size of N1 and file number 2 N2.
For the most part, N2 is a subset of N1. Some observations in N2 may not be in N1 as seen later.
After merging, the size becomes N, N > N1.
When I run a crosstab of index variables, it indicates that all observations in N come from file number 1. How can that be possible when N > N1?
If dataset 2 is a subset of dataset 1 in terms of the by variable, but has multiple observations for one or more ID's present in dataset 1, then those ID's will be multiplied.
Example:
dataset A
ID var1
1 x
2 y
3 z
dataset B
ID var2
1 a
1 b
2 c
result:
ID var1 var2
1 x a
1 x b
2 y c
3 z
Like this:
data merge;
merge file1 (in=x) file2 (in=y);
by ID;
index1=x;
index2=y;
run;
Then:
proc freq data=merge;
table index1*index2;
run;
You did not set a condition in your merge like
if x and y;
So you get all the records from both tables.
Example
dataset file1
ID
1
2
3
4
dataset file2
ID
3
4
5
6
result
ID
1
2
3
4
5
6
But we can always filter out the observations after merging.
Yet I think the problem here is that after merging, all observations were indicated to be from file 1; that is, in the crosstab, there were no cells where index1=0. That's OK if all observations in file 2 were subset of file 1, but then we expect that the sample size did not increase after merging, which however did increase (N > N1).
If dataset 2 is a subset of dataset 1 in terms of the by variable, but has multiple observations for one or more ID's present in dataset 1, then those ID's will be multiplied.
Example:
dataset A
ID var1
1 x
2 y
3 z
dataset B
ID var2
1 a
1 b
2 c
result:
ID var1 var2
1 x a
1 x b
2 y c
3 z
Ah ha, that's the right hit. Thanks.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.