i all,
I use the following code to combine seven datasets. The variables of company and investment_date are same in each dataset. Besides, each dataset has its unique variables.
data y2002;
merge y2002v1 y2002v2 y2002v3 y2002v4 y2002v5 y2002v6 y2002v7;
by company Investment_Date;
run;
In my understanding, the above code should be enough to merge dataset. However, SAS still give me a note of repeats of BY values
NOTE: MERGE statement has more than one data set with repeats of BY values. NOTE: There were 10974 observations read from the data set WORK.Y2002V1. NOTE: There were 10974 observations read from the data set WORK.Y2002V2. NOTE: There were 10974 observations read from the data set WORK.Y2002V3. NOTE: There were 10974 observations read from the data set WORK.Y2002V4. NOTE: There were 10974 observations read from the data set WORK.Y2002V5. NOTE: There were 10974 observations read from the data set WORK.Y2002V6. NOTE: There were 10974 observations read from the data set WORK.Y2002V7. NOTE: The data set WORK.Y2002 has 10974 observations and 98 variables.
To help you understand the dataset, the followings are the examples of datasets.
data dataset1;
input company $ investment_date variable1 variable2;
cards;
aaa 20000101 1 2
aaa 20000102 21 22
aaa 20000103 12 33
bbb 20000101 2 32
bbb 20000104 23 44
bbb 20000108 33 22
ccc 20000404 44 12
;
data dataset2;
input company $ investment_date variable3 variable4;
cards;
aaa 20000101 11 22
aaa 20000102 221 232
aaa 20000103 122 453
bbb 20000101 23 342
bbb 20000104 423 424
bbb 20000108 333 42
ccc 20000404 544 152
;
Please advice an appropriate merge statement and/or SQL join. Thanks
Then you have duplicate company investment_date combinations in two or more of your data sets and Merge likely did not do what you want.
Try code similar to this for each of your data sets:
Proc freq data= y2002v1 order=freq; tables company*investment_date / missing list; run;
The top of each output table will have the most frequent responses, i.e. the duplicates.
Then clean the data needed so that there are only one record per company investment_date per set.
SQL will create multiple records for each of the matches: 2 records in 2 data sets with the same company investment_date = 4 output records total. If you have multiple data sets with multiple matches you could end up with a lot of very questionable data.
Neither of the example datasets you posted have multiple observations for any of the BY groups.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.