i all,
I use the following code to combine seven datasets. The variables of company and investment_date are same in each dataset. Besides, each dataset has its unique variables.
data y2002;
merge y2002v1 y2002v2 y2002v3 y2002v4 y2002v5 y2002v6 y2002v7;
by company Investment_Date;
run;
In my understanding, the above code should be enough to merge dataset. However, SAS still give me a note of repeats of BY values
NOTE: MERGE statement has more than one data set with repeats of BY values. NOTE: There were 10974 observations read from the data set WORK.Y2002V1. NOTE: There were 10974 observations read from the data set WORK.Y2002V2. NOTE: There were 10974 observations read from the data set WORK.Y2002V3. NOTE: There were 10974 observations read from the data set WORK.Y2002V4. NOTE: There were 10974 observations read from the data set WORK.Y2002V5. NOTE: There were 10974 observations read from the data set WORK.Y2002V6. NOTE: There were 10974 observations read from the data set WORK.Y2002V7. NOTE: The data set WORK.Y2002 has 10974 observations and 98 variables.
To help you understand the dataset, the followings are the examples of datasets.
data dataset1;
input company $ investment_date variable1 variable2;
cards;
aaa 20000101 1 2
aaa 20000102 21 22
aaa 20000103 12 33
bbb 20000101 2 32
bbb 20000104 23 44
bbb 20000108 33 22
ccc 20000404 44 12
;
data dataset2;
input company $ investment_date variable3 variable4;
cards;
aaa 20000101 11 22
aaa 20000102 221 232
aaa 20000103 122 453
bbb 20000101 23 342
bbb 20000104 423 424
bbb 20000108 333 42
ccc 20000404 544 152
;Please advice an appropriate merge statement and/or SQL join. Thanks
Then you have duplicate company investment_date combinations in two or more of your data sets and Merge likely did not do what you want.
Try code similar to this for each of your data sets:
Proc freq data= y2002v1 order=freq; tables company*investment_date / missing list; run;
The top of each output table will have the most frequent responses, i.e. the duplicates.
Then clean the data needed so that there are only one record per company investment_date per set.
SQL will create multiple records for each of the matches: 2 records in 2 data sets with the same company investment_date = 4 output records total. If you have multiple data sets with multiple matches you could end up with a lot of very questionable data.
Neither of the example datasets you posted have multiple observations for any of the BY groups.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.