I have two datasets, one with 3000 records and other with 30000 records. I need to join the datasets so that the variables in the datasets are side by side. However, I do not have a primary key to join on.
The result should have the number of observations of the larger dataset.
Thanks,
So how are you repeating the data? Or past 3000 the observations are missing?
Some sample data would be great.
You can merge side by side directly with a MERGE and no BY statement. Make sure you have no variables with the same names, or that you understand how they'll be handled.
data test;
merge sashelp.cars(obs=30 keep=make model mpg_city) sashelp.class;
run;
sasmaverick,
An example of your data would help. Potentially you could have 30,000 rows with all the columns from the 3000 row table. In the potential table you could have 27,000 rows with missing values in the columns from the 3000 row table. If you create a mockup of what you have and what you want to have, it will be easier to help you.
How about this as an approach:
data want;
set large_file;
random_obs = ceil(_nobs_ * ranuni(12345));
set small_file point=random_obs nobs=_nobs_;
drop random_obs;
run;
For each record in the larger file, it randomly selects one of the records in the smaller file and outputs. Naturally, each record in the smaller file can appear multiple times in the final result. And since the selection is random, it is possible that one of the records in the smaller file might not be randomly selected and thus might not appear in the final result.
Thanks all for your replies.
@jwillis Following is the sample of datasets I have:
Larger dataset:
Name Site City Amount
Arthur Madrid-JMC Madrid $40000
Simon San Diego-Main San Diego $20000
......30,000 rows
Smaller Dataset:
Person_Name Site_Name City Amount
Arthur Madrid-JMC Madrid $40000
James Paris-ANC Paris $15000
.....3000 rows
Expected Output: (Need to create a new variable called Match_Type--based on matching Names, but don't want to join based on Name)
Name Person_Name Site Site_Name City Amount Match_Type
Arthur Arthur Madrid-JMC Madrid-JMC Madrid $40000 Match
Simon San Diego-Main San Diego $20000 No Match
James Paris-ANC Paris $15000 No Match
Note: Simon and James are only present in their respective datasets.
So what do you want to do on the 3,001 observation, when your smaller data set has run out of observations?
proc sort data = large;
by name;
run;
proc sort data = small;
by person_name;
run;
data merged;
retain name person_name;
merge large(in= l) small(rename = (person_name=name) in = s);
by name;
length status$ 8;
if s then do;
person_name = name;
if l then status = "match";
else status = "no match";
end;
else do;
person_name = "";
status = "no match";
end;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.