BookmarkSubscribeRSS Feed
jeremy4
Quartz | Level 8

I have one dataset (dataset 1) with 10,000 observations and another dataset (dataset 2) with 150,000 observations. What code could be used so that two output datasets are created when comparing account_id between two datasets?

 

1. In one output dataset (account_match) to be created, there is a list of all of the account_id (and also select all of the corresponding variables associated with that account_id in the output dataset) from dataset 1 that are also in dataset 2 (i.e. where there the account_id in dataset 1 is also in dataset 2)?

 

2. The second output dataset to be created, account_no_match, would output all of the account_id in dataset 1 that are notseen in dataset 2 (and also select all of the corresponding variables associated with that account_id in the output dataset) .

 

As a result, number of accounts in account_match + number of accounts in account_no_match = 10,000 observations (i.e. dataset 1)

 

Code:

data account_match account_no_match;

   [insert code here]

run;

4 REPLIES 4
Reeza
Super User
data in1 in2 matched;

merge t1(in=in_t1) t2(in=in_t2);
by account_id;
if in_t1 and not in_t2 then output in1; else if in_t2 and not in_t1 then output in2; else output matched; run;

Use a MERGE and IN data set option to control the output. 

 

IN1 - all records only in T1 dataset

IN2 - all records only in the T2 dataset

MATCHED - records in both T1 and T2 datasets

 

EDIT: fixed per @PGStats comment

https://documentation.sas.com/?docsetId=ledsoptsref&docsetTarget=n1p1o2dsuc465nn198ovwdrj9mvy.htm&do...


@jeremy4 wrote:

I have one dataset (dataset 1) with 10,000 observations and another dataset (dataset 2) with 150,000 observations. What code could be used so that two output datasets are created when comparing account_id between two datasets?

 

1. In one output dataset (account_match) to be created, there is a list of all of the account_id (and also select all of the corresponding variables associated with that account_id in the output dataset) from dataset 1 that are also in dataset 2 (i.e. where there the account_id in dataset 1 is also in dataset 2)?

 

2. The second output dataset to be created, account_no_match, would output all of the account_id in dataset 1 that are notseen in dataset 2 (and also select all of the corresponding variables associated with that account_id in the output dataset) .

 

As a result, number of accounts in account_match + number of accounts in account_no_match = 10,000 observations (i.e. dataset 1)

 

Code:

data account_match account_no_match;

   [insert code here]

run;


 

KachiM
Rhodochrosite | Level 12

@jeremy4 

 

There are several other ways to solve your problem. It is known as "Table Lookup" problem. The simplest way for a beginner is what @Reeza has given. 

 

What is the data type of 'account_id'? Is it a Number or Character. If it is Number, use of Array might be the fastest way for your problem.

Please let us know.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 954 views
  • 0 likes
  • 4 in conversation