BookmarkSubscribeRSS Feed
Neal3321
Fluorite | Level 6

i all,

 

I use the following code to combine seven datasets. The variables of company and investment_date are same in each dataset. Besides, each dataset has its unique variables. 

 

 

data y2002;
merge y2002v1 y2002v2 y2002v3 y2002v4 y2002v5 y2002v6 y2002v7;
by company Investment_Date;
run;

 

In my understanding, the above code should be enough to merge dataset. However, SAS still give me a note of repeats of BY values

 

 

NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 10974 observations read from the data set WORK.Y2002V1.
NOTE: There were 10974 observations read from the data set WORK.Y2002V2.
NOTE: There were 10974 observations read from the data set WORK.Y2002V3.
NOTE: There were 10974 observations read from the data set WORK.Y2002V4.
NOTE: There were 10974 observations read from the data set WORK.Y2002V5.
NOTE: There were 10974 observations read from the data set WORK.Y2002V6.
NOTE: There were 10974 observations read from the data set WORK.Y2002V7.
NOTE: The data set WORK.Y2002 has 10974 observations and 98 variables.

 

 

To help you understand the dataset, the followings are the examples of datasets.

data dataset1;

  input company $  investment_date variable1 variable2;

  cards;

aaa           20000101			1			2

aaa           20000102			21			22

aaa           20000103			12			33

bbb           20000101			2			32

bbb           20000104			23			44

bbb           20000108			33			22

ccc           20000404			44			12

;

data dataset2;

  input company $  investment_date variable3 variable4;

  cards;

aaa           20000101			11			22

aaa           20000102			221			232

aaa           20000103			122			453

bbb           20000101			23			342

bbb           20000104			423			424

bbb           20000108			333			42

ccc           20000404			544			152

;

Please advice an appropriate merge statement and/or SQL join. Thanks

 

 

 
2 REPLIES 2
ballardw
Super User

Then you have duplicate company investment_date combinations in two or more of your data sets and Merge likely did not do what you want.

 

Try code similar to this for each of your data sets:

Proc freq data= y2002v1 order=freq;
   tables company*investment_date / missing list;
run;

The top of each output table will have the most frequent responses, i.e. the duplicates.

 

Then clean the data needed so that there are only one record per company investment_date per set.

 

SQL will create multiple records for each of the matches: 2 records in 2 data sets with the same company investment_date = 4 output records total. If you have multiple data sets with multiple matches you could end up with a lot of very questionable data.

Tom
Super User Tom
Super User

Neither of the example datasets you posted have multiple observations for any of the BY groups.

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 1403 views
  • 0 likes
  • 3 in conversation