BookmarkSubscribeRSS Feed
sasmaverick
Obsidian | Level 7

I have two datasets, one with 3000 records and other with 30000 records. I need to join the datasets so that the variables in the datasets are side by side. However, I do not have a primary key to join on.

The result should have the number of observations of the larger dataset.

Thanks,

6 REPLIES 6
Reeza
Super User

So how are you repeating the data? Or past 3000 the observations are missing?

Some sample data would be great.

You can merge side by side directly with a MERGE and no BY statement. Make sure you have no variables with the same names, or that you understand how they'll be handled.

data test;

merge sashelp.cars(obs=30 keep=make model mpg_city) sashelp.class;

run;

jwillis
Quartz | Level 8

sasmaverick,

An example of your data would help.  Potentially you could have 30,000 rows with all the columns from the 3000 row table.  In the potential table you could have 27,000 rows with missing values in the columns from the 3000 row table.  If you create a mockup of what you have and what you want to have, it will be easier to help you.

Astounding
PROC Star

How about this as an approach:

data want;

   set large_file;

   random_obs = ceil(_nobs_ * ranuni(12345));

   set small_file point=random_obs nobs=_nobs_;

   drop random_obs;

run;

For each record in the larger file, it randomly selects one of the records in the smaller file and outputs.  Naturally, each record in the smaller file can appear multiple times in the final result.  And since the selection is random, it is possible that one of the records in the smaller file might not be randomly selected and thus might not appear in the final result.

sasmaverick
Obsidian | Level 7

Thanks all for your replies.

@jwillis Following is the sample of datasets I have:

Larger dataset:

Name      Site                     City                Amount

Arthur     Madrid-JMC          Madrid          $40000

Simon     San Diego-Main     San Diego     $20000

......30,000 rows

Smaller Dataset:

Person_Name     Site_Name     City          Amount

Arthur               Madrid-JMC     Madrid        $40000

James               Paris-ANC     Paris          $15000

.....3000 rows

Expected Output: (Need to create a new variable called Match_Type--based on matching Names, but don't want to join based on Name)

Name      Person_Name    Site                     Site_Name      City                  Amount           Match_Type

Arthur        Arthur          Madrid-JMC          Madrid-JMC     Madrid                  $40000          Match

Simon                           San Diego-Main                           San Diego            $20000          No Match

                James                                       Paris-ANC       Paris                   $15000         No Match

Note: Simon and James are only present in their respective datasets.

Astounding
PROC Star

So what do you want to do on the 3,001 observation, when your smaller data set has run out of observations?

rajan_sharma
Calcite | Level 5

proc sort data = large;
by name;
run;

 

proc sort data = small;
by person_name;
run;

 

data merged;
retain name person_name;
merge large(in= l) small(rename = (person_name=name) in = s);
by name;
length status$ 8;
if s then do;
person_name = name;
if l then status = "match";
else status = "no match";
end;
else do;
person_name = "";
status = "no match";
end;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 7876 views
  • 0 likes
  • 5 in conversation