BookmarkSubscribeRSS Feed
Neal3321
Fluorite | Level 6

i all,

 

I use the following code to combine seven datasets. The variables of company and investment_date are same in each dataset. Besides, each dataset has its unique variables. 

 

 

data y2002;
merge y2002v1 y2002v2 y2002v3 y2002v4 y2002v5 y2002v6 y2002v7;
by company Investment_Date;
run;

 

In my understanding, the above code should be enough to merge dataset. However, SAS still give me a note of repeats of BY values

 

 

NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 10974 observations read from the data set WORK.Y2002V1.
NOTE: There were 10974 observations read from the data set WORK.Y2002V2.
NOTE: There were 10974 observations read from the data set WORK.Y2002V3.
NOTE: There were 10974 observations read from the data set WORK.Y2002V4.
NOTE: There were 10974 observations read from the data set WORK.Y2002V5.
NOTE: There were 10974 observations read from the data set WORK.Y2002V6.
NOTE: There were 10974 observations read from the data set WORK.Y2002V7.
NOTE: The data set WORK.Y2002 has 10974 observations and 98 variables.

 

 

To help you understand the dataset, the followings are the examples of datasets.

data dataset1;

  input company $  investment_date variable1 variable2;

  cards;

aaa           20000101			1			2

aaa           20000102			21			22

aaa           20000103			12			33

bbb           20000101			2			32

bbb           20000104			23			44

bbb           20000108			33			22

ccc           20000404			44			12

;

data dataset2;

  input company $  investment_date variable3 variable4;

  cards;

aaa           20000101			11			22

aaa           20000102			221			232

aaa           20000103			122			453

bbb           20000101			23			342

bbb           20000104			423			424

bbb           20000108			333			42

ccc           20000404			544			152

;

Please advice an appropriate merge statement and/or SQL join. Thanks

 

 

 
2 REPLIES 2
ballardw
Super User

Then you have duplicate company investment_date combinations in two or more of your data sets and Merge likely did not do what you want.

 

Try code similar to this for each of your data sets:

Proc freq data= y2002v1 order=freq;
   tables company*investment_date / missing list;
run;

The top of each output table will have the most frequent responses, i.e. the duplicates.

 

Then clean the data needed so that there are only one record per company investment_date per set.

 

SQL will create multiple records for each of the matches: 2 records in 2 data sets with the same company investment_date = 4 output records total. If you have multiple data sets with multiple matches you could end up with a lot of very questionable data.

Tom
Super User Tom
Super User

Neither of the example datasets you posted have multiple observations for any of the BY groups.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 1300 views
  • 0 likes
  • 3 in conversation