SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

After merging, why does one file consist of all observations and increase in size?

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 75
Accepted Solution

After merging, why does one file consist of all observations and increase in size?

Maybe the title is too vague, but here's the actual circumstance. Before merging, I have file number 1 with the size of N1 and file number 2 N2.

For the most part, N2 is a subset of N1. Some observations in N2 may not be in N1 as seen later.

After merging, the size becomes N, N > N1.

When I run a crosstab of index variables, it indicates that all observations in N come from file number 1. How can that be possible when N > N1?


Accepted Solutions
Solution
‎04-21-2015 03:12 AM
Super User
Posts: 7,762

Re: After merging, why does one file consist of all observations and increase in size?

Posted in reply to NonSleeper

If dataset 2 is a subset of dataset 1 in terms of the by variable, but has multiple observations for one or more ID's present in dataset 1, then those ID's will be multiplied.

Example:

dataset A

ID var1

1 x

2 y

3 z

dataset B

ID var2

1 a

1 b

2 c

result:

ID var1 var2

1 x a

1 x b

2 y c

3 z

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Super User
Posts: 7,762

Re: After merging, why does one file consist of all observations and increase in size?

Posted in reply to NonSleeper

How did you do the merge? (Code)

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 75

Re: After merging, why does one file consist of all observations and increase in size?

Posted in reply to KurtBremser

Like this:

data merge;

merge file1 (in=x) file2 (in=y);

by ID;

index1=x;

index2=y;

run;

Then:

proc freq data=merge;

table index1*index2;

run;

Super User
Posts: 7,762

Re: After merging, why does one file consist of all observations and increase in size?

Posted in reply to NonSleeper

You did not set a condition in your merge like

if x and y;

So you get all the records from both tables.

Example

dataset file1

ID

1

2

3

4

dataset file2

ID

3

4

5

6

result

ID

1

2

3

4

5

6

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 75

Re: After merging, why does one file consist of all observations and increase in size?

Posted in reply to KurtBremser

But we can always filter out the observations after merging.

Yet I think the problem here is that after merging, all observations were indicated to be from file 1; that is, in the crosstab, there were no cells where index1=0. That's OK if all observations in file 2 were subset of file 1, but then we expect that the sample size did not increase after merging, which however did increase (N > N1).

Solution
‎04-21-2015 03:12 AM
Super User
Posts: 7,762

Re: After merging, why does one file consist of all observations and increase in size?

Posted in reply to NonSleeper

If dataset 2 is a subset of dataset 1 in terms of the by variable, but has multiple observations for one or more ID's present in dataset 1, then those ID's will be multiplied.

Example:

dataset A

ID var1

1 x

2 y

3 z

dataset B

ID var2

1 a

1 b

2 c

result:

ID var1 var2

1 x a

1 x b

2 y c

3 z

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 75

Re: After merging, why does one file consist of all observations and increase in size?

Posted in reply to KurtBremser

Ah ha, that's the right hit. Thanks.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 474 views
  • 0 likes
  • 2 in conversation