Help using Base SAS procedures

Join Datasets Without Key

Reply
Contributor
Posts: 64

Join Datasets Without Key

I have two datasets, one with 3000 records and other with 30000 records. I need to join the datasets so that the variables in the datasets are side by side. However, I do not have a primary key to join on.

The result should have the number of observations of the larger dataset.

Thanks,

Super User
Posts: 17,868

Re: Join Datasets Without Key

So how are you repeating the data? Or past 3000 the observations are missing?

Some sample data would be great.

You can merge side by side directly with a MERGE and no BY statement. Make sure you have no variables with the same names, or that you understand how they'll be handled.

data test;

merge sashelp.cars(obs=30 keep=make model mpg_city) sashelp.class;

run;

Regular Contributor
Posts: 217

Re: Join Datasets Without Key

sasmaverick,

An example of your data would help.  Potentially you could have 30,000 rows with all the columns from the 3000 row table.  In the potential table you could have 27,000 rows with missing values in the columns from the 3000 row table.  If you create a mockup of what you have and what you want to have, it will be easier to help you.

Super User
Posts: 5,085

Re: Join Datasets Without Key

How about this as an approach:

data want;

   set large_file;

   random_obs = ceil(_nobs_ * ranuni(12345));

   set small_file point=random_obs nobs=_nobs_;

   drop random_obs;

run;

For each record in the larger file, it randomly selects one of the records in the smaller file and outputs.  Naturally, each record in the smaller file can appear multiple times in the final result.  And since the selection is random, it is possible that one of the records in the smaller file might not be randomly selected and thus might not appear in the final result.

Contributor
Posts: 64

Re: Join Datasets Without Key

Thanks all for your replies.

@jwillis Following is the sample of datasets I have:

Larger dataset:

Name      Site                     City                Amount

Arthur     Madrid-JMC          Madrid          $40000

Simon     San Diego-Main     San Diego     $20000

......30,000 rows

Smaller Dataset:

Person_Name     Site_Name     City          Amount

Arthur               Madrid-JMC     Madrid        $40000

James               Paris-ANC     Paris          $15000

.....3000 rows

Expected Output: (Need to create a new variable called Match_Type--based on matching Names, but don't want to join based on Name)

Name      Person_Name    Site                     Site_Name      City                  Amount           Match_Type

Arthur        Arthur          Madrid-JMC          Madrid-JMC     Madrid                  $40000          Match

Simon                           San Diego-Main                           San Diego            $20000          No Match

                James                                       Paris-ANC       Paris                   $15000         No Match

Note: Simon and James are only present in their respective datasets.

Super User
Posts: 5,085

Re: Join Datasets Without Key

So what do you want to do on the 3,001 observation, when your smaller data set has run out of observations?

Ask a Question
Discussion stats
  • 5 replies
  • 427 views
  • 0 likes
  • 4 in conversation