I have two datasets, call them dataset1 and dataset2. The first is a list of identifiers with character variable name "symbol", and there are duplicates in it. I use another variable "id" to order these symbols. The second is a panel data set with symbol and "year". For example,
data dataset1; input id symbol $; format id 1. symbol $2.; datalines; 1 A 2 B 3 B 4 C 5 D 6 D 7 D 8 E ; run;
and
data dataset2; length symbol $2 date 8; informat symbol $2. date yymmdd.; format symbol $2. date yymmddn8.; input symbol $ date; datalines; A 20120101 A 20120102 A 20120103 B 20120101 B 20120102 C 20120102 C 20120103 C 20120104 D 20120101 D 20120102 D 20120103 D 20120104 E 20120102 E 20120103 ; run;
My purpose is to merge the two datasets by "symbol", and each "id" in dataset1 should be merged with all corresponding symbol-date observations in dataset2. The merged data set should contain 24 observations. However, if I use
data want; merge dataset1 (in = a) dataset2 (in = b); by symbol; if a and b; run;
I got only 14 observations. The reason is that, duplicate symbols in dataset1 are not treated as different IDs. How can I achieve the result I want?
SQL does this fairly easily:
proc sql;
create table want as
select dataset1.*, dataset2.date
from dataset1, dataset2
where dataset1.symbol = dataset2.symbol;
quit;
SQL does this fairly easily:
proc sql;
create table want as
select dataset1.*, dataset2.date
from dataset1, dataset2
where dataset1.symbol = dataset2.symbol;
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.