Join Datasets Without Key

sasmaverick · Posted 07-08-2015 02:57 PM

I have two datasets, one with 3000 records and other with 30000 records. I need to join the datasets so that the variables in the datasets are side by side. However, I do not have a primary key to join on.

The result should have the number of observations of the larger dataset.

Thanks,

Reeza · Posted 07-08-2015 03:05 PM

So how are you repeating the data? Or past 3000 the observations are missing?

Some sample data would be great.

You can merge side by side directly with a MERGE and no BY statement. Make sure you have no variables with the same names, or that you understand how they'll be handled.

data test;

merge sashelp.cars(obs=30 keep=make model mpg_city) sashelp.class;

run;

jwillis · Posted 07-08-2015 03:30 PM

sasmaverick,

An example of your data would help. Potentially you could have 30,000 rows with all the columns from the 3000 row table. In the potential table you could have 27,000 rows with missing values in the columns from the 3000 row table. If you create a mockup of what you have and what you want to have, it will be easier to help you.

Astounding · Posted 07-08-2015 04:26 PM

How about this as an approach:

data want;

set large_file;

random_obs = ceil(_nobs_ * ranuni(12345));

set small_file point=random_obs nobs=_nobs_;

drop random_obs;

run;

For each record in the larger file, it randomly selects one of the records in the smaller file and outputs. Naturally, each record in the smaller file can appear multiple times in the final result. And since the selection is random, it is possible that one of the records in the smaller file might not be randomly selected and thus might not appear in the final result.

sasmaverick · Posted 07-08-2015 05:24 PM

Thanks all for your replies.

@jwillis Following is the sample of datasets I have:

Larger dataset:

Name Site City Amount

Arthur Madrid-JMC Madrid $40000

Simon San Diego-Main San Diego $20000

......30,000 rows

Smaller Dataset:

Person_Name Site_Name City Amount

Arthur Madrid-JMC Madrid $40000

James Paris-ANC Paris $15000

.....3000 rows

Expected Output: (Need to create a new variable called Match_Type--based on matching Names, but don't want to join based on Name)

Name Person_Name Site Site_Name City Amount Match_Type

Arthur Arthur Madrid-JMC Madrid-JMC Madrid $40000 Match

Simon San Diego-Main San Diego $20000 No Match

James Paris-ANC Paris $15000 No Match

Note: Simon and James are only present in their respective datasets.

Astounding · Posted 07-08-2015 11:22 PM

So what do you want to do on the 3,001 observation, when your smaller data set has run out of observations?

rajan_sharma · Posted 12-27-2017 12:22 PM

proc sort data = large;
by name;
run;

proc sort data = small;
by person_name;
run;

data merged;
retain name person_name;
merge large(in= l) small(rename = (person_name=name) in = s);
by name;
length status$ 8;
if s then do;
person_name = name;
if l then status = "match";
else status = "no match";
end;
else do;
person_name = "";
status = "no match";
end;
run;

Join Datasets Without Key

Re: Join Datasets Without Key

Re: Join Datasets Without Key

Re: Join Datasets Without Key

Re: Join Datasets Without Key

Re: Join Datasets Without Key

Re: Join Datasets Without Key

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!