- Re: Merge and keep same observations

Posted 06-02-2017 08:52 AM
(10741 views)

I have merged two datasets. The first has 55859 observations and the other has 57658. Added together that is 113517, but the merged dataset has only 89765 observations. So there must be some observations that match on fyear and cusip.

My question is: does the difference of 23752 observations (113517 - 89765) correspond to the observations that are the same in the two datasets? How can I get the observations that are the same in the two datasets by deleting those which do not match each other?

data mydata.E1;

merge mydata.sdc mydata.compusip;

by fyear cusip;

run;

NOTE: MERGE statement has more than one data set with repeats of BY values.

NOTE: There were 55859 observations read from the data set MYDATA.SDC.

NOTE: There were 57658 observations read from the data set MYDATA.COMPUSIP.

NOTE: The data set MYDATA.E1 has 89765 observations and 120 variables.

NOTE: DATA statement used (Total process time):

real time 0.14 seconds

cpu time 0.07 seconds

1 ACCEPTED SOLUTION


@Jahanzaib wrote:

@ballardw: your code produces the same number of observations.

merge mydata.sdc (in=In1)

mydata.compusip (in=In2);

;

by fyear cusip;

InSDC=In1;

InCompusip=In2;

run;

NOTE: MERGE statement has more than one data set with repeats of BY values.

NOTE: There were 55859 observations read from the data set MYDATA.SDC.

NOTE: There were 57658 observations read from the data set MYDATA.COMPUSIP.

NOTE: The data set MYDATA.MERGED2 has 89765 observations and 122 variables.

If you reread my post you will see that the code adds two variables to let you know which data set contributed to each record. You can do as you will with that information: for example, send records that appear only in set 1 to one output data set, those only in set 2 to a different one, and those in both to yet a third, or select some other desired combination. The purpose was to show how to get information about contributing data sets, which is extensible to more sets.
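One possible way to act on those flags is sketched below. It assumes both inputs are already sorted by fyear and cusip (the merge in the log ran, so they appear to be). The output names both, sdc_only, and compusip_only are hypothetical WORK data sets chosen for illustration; they are not from the original posts.

```sas
/* Sketch: route each merged row by which input contributed to it.   */
/* both, sdc_only, and compusip_only are hypothetical output names.  */
data both sdc_only compusip_only;
  merge mydata.sdc      (in=In1)
        mydata.compusip (in=In2);
  by fyear cusip;
  if In1 and In2 then output both;          /* BY values found in both inputs */
  else if In1    then output sdc_only;      /* found only in mydata.sdc       */
  else                output compusip_only; /* found only in mydata.compusip  */
run;
```

Because the log reports repeats of BY values in more than one data set, the row count of both reflects merged rows, not the number of distinct fyear/cusip pairs.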

8 REPLIES


You sound surprised that you are actually getting matches when doing a merge. Isn't that what you should expect?

If no matches were expected, perhaps you were looking to append the data.

The exact arithmetic of the observation counts can be uncertain: you can get unpredictable results if one or both data sets have duplicates of the BY values.

If you only want matching rows, in the data step use the IN= data set option on each input (in=a, in=b) together with a subsetting IF: if a and b;

In SQL, do an inner join.
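A minimal sketch of both suggestions, applied to the data sets from the question. The output names work.matched and work.matched_sql and the table aliases s and c are assumptions made for illustration.

```sas
/* DATA step: keep only rows whose BY values occur in both inputs.  */
/* Inputs must be sorted by fyear and cusip.                        */
data work.matched;
  merge mydata.sdc      (in=a)
        mydata.compusip (in=b);
  by fyear cusip;
  if a and b;   /* subsetting IF: drop rows missing from either input */
run;

/* PROC SQL: inner join on the same keys (no prior sort needed).    */
/* SELECT * will log warnings for columns that exist in both tables */
/* and keep the first occurrence (the one from mydata.sdc).         */
proc sql;
  create table work.matched_sql as
  select *
  from mydata.sdc as s
       inner join
       mydata.compusip as c
    on s.fyear = c.fyear and s.cusip = c.cusip;
quit;
```

The two results are not guaranteed to agree here. When both inputs repeat a BY value, the DATA step pairs rows in order within the BY group, while the SQL inner join returns every combination of matching rows. For variables common to both inputs, the DATA step keeps the value from the last data set listed, whereas the SQL above keeps the first.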


No, I'm not surprised; I expected matches. But I want to keep only the matched observations, not the others.


You can use the IN= data set option to tell you which data set(s) contribute to the current record. Since the IN= variables are temporary, you need to assign them to other variables to keep the values.

This code adds two variables that will have a value of 1 if the data set contributed. The rows where both InSDC and InCompusip = 1 are matches (and values for other common variables come from compusip).

data mydata.E1;

merge mydata.sdc (in=In1) mydata.compusip (in=In2);

by fyear cusip;

InSDC=In1;

InCompusip=In2;

run;


@ballardw: your code produces the same number of observations.

merge mydata.sdc (in=In1)

mydata.compusip (in=In2);

;

by fyear cusip;

InSDC=In1;

InCompusip=In2;

run;

NOTE: MERGE statement has more than one data set with repeats of BY values.

NOTE: There were 55859 observations read from the data set MYDATA.SDC.

NOTE: There were 57658 observations read from the data set MYDATA.COMPUSIP.

NOTE: The data set MYDATA.MERGED2 has 89765 observations and 122 variables.


Hi:

I suggest you revisit the lesson in Programming 1 on how merges work. As an example, here is a simple merge with much smaller data sets. You can work out with pencil and paper which rows are matches and which are non-matches. Using the IN= option allows you to control the output of matches and/or non-matches, as shown below.

cynthia
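For readers without the original attachment, a small merge of this kind could look like the sketch below. The data sets a and b, their variables, and the values are invented for illustration, and id is deliberately unique within each input.

```sas
/* Two tiny, already-sorted inputs with a unique key (hypothetical data) */
data a;
  input id x;
  datalines;
1 10
2 20
3 30
;

data b;
  input id y;
  datalines;
2 200
3 300
4 400
;

/* Merge and copy the temporary IN= flags into permanent variables */
data merged;
  merge a(in=ina) b(in=inb);
  by id;
  from_a = ina;   /* 1 if data set a contributed to this row */
  from_b = inb;   /* 1 if data set b contributed to this row */
run;

proc print data=merged;
run;
```

Here 3 + 3 = 6 input rows produce 4 merged rows (ids 1 to 4), and the difference of 2 equals the number of matching ids (2 and 3, the rows where from_a and from_b are both 1). That arithmetic holds only because id is unique in each input; with repeated BY values, as in the log from the question, it no longer applies.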


Thank you, that was clear.


You are merging two datasets that BOTH have multiple observations per BY group. Are you sure you want to do that? Is there not another variable you can add to your BY statement to make the merge either 1 to 1 or at least 1 to N?

What SAS will do when merging N to N is match the observations in the BY group in the order that they appear. So the first observation from table A is matched to the first observation from table B, etc. If one of the two datasets contributes fewer observations than the other, then the values from its last observation are retained for the rest of the BY group. This includes the setting of the variable specified in the IN= dataset option.

If for some strange reason you do want to continue merging the data in this way and merely want to eliminate the extra records for each BY group, so that the output will only contain observations with data from both inputs, then you will need to reset the IN= variables so that they reflect whether a new observation has been read from that source. So if in a BY group there are 5 observations from A and only 3 from B, you could get SAS to output only the first 3 observations for that BY group.

```
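data want ;
  /* MERGE (not SET) so rows from a and b are paired within each BY group */
  merge a(in=in1) b(in=in2);
  by id ;
  /* keep the row only if both inputs supplied a new observation */
  if in1 and in2 then output;
  /* reset the flags so they are true again only when a new row is read */
  call missing(in1,in2);
run;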
```
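To see the effect of the reset, the 5-versus-3 case described above can be tried on toy data. This is a sketch under assumptions: the data sets a and b, the variables seq_a and seq_b, and the output names no_reset and with_reset are all invented for illustration.

```sas
/* One BY group (id=1): 5 rows in a, 3 rows in b (hypothetical data) */
data a;
  input id seq_a;
  datalines;
1 1
1 2
1 3
1 4
1 5
;

data b;
  input id seq_b;
  datalines;
1 1
1 2
1 3
;

/* Without the reset: inb stays 1 for the rest of the BY group, */
/* so rows 4 and 5 pass the test with seq_b retained from row 3 */
data no_reset;
  merge a(in=ina) b(in=inb);
  by id;
  if ina and inb then output;
run;

/* With the reset: a flag is 1 again only when its data set     */
/* actually supplies a new observation on the next iteration    */
data with_reset;
  merge a(in=ina) b(in=inb);
  by id;
  if ina and inb then output;
  call missing(ina, inb);
run;
```

Following the behavior described above, no_reset should end up with 5 observations and with_reset with only the first 3.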
