BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jeremy4
Quartz | Level 8

Hi,

 

I have one dataset (dataset 1) with 10,000 observations and another dataset (dataset 2) with 150,000 observations. What code could be used so that two output datasets are created when comparing account_id between two datasets?

 

1. In one output dataset (account_match) to be created, there is a list of all of the account_id (and also select all of the corresponding variables associated with that account_id in the output dataset) from dataset 1 that are also in dataset 2 (i.e. where there the account_id in dataset 1 is also in dataset 2)?

 

2. The second output dataset to be created, account_no_match, would output all of the account_id in dataset 1 that are not seen in dataset 2 (and also select all of the corresponding variables associated with that account_id in the output dataset) .

 

As a result, number of accounts in account_match + number of accounts in account_no_match = 10,000 observations (i.e. dataset 1)

 

Code:

data account_match account_no_match;

   [insert code here]

run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
maguiremq
SAS Super FREQ
data account_1;
input account_id $;
datalines;
1
2
3
4
5
6
7
8
;

data account_2;
input account_id $;
datalines;
14
5
1
2
99
98
93
;

proc sort data=account_1;
by account_id;
run;

proc sort data=account_2;
by account_id;
run;

data account_match account_no_match;
merge account_1 (in=a)
	  account_2 (in=b);
by account_id;
if a and b then output account_match;
else if a and not b then output account_no_match;
run;

Not sure if this is what you're trying to get at.

View solution in original post

6 REPLIES 6
maguiremq
SAS Super FREQ
data account_1;
input account_id $;
datalines;
1
2
3
4
5
6
7
8
;

data account_2;
input account_id $;
datalines;
14
5
1
2
99
98
93
;

proc sort data=account_1;
by account_id;
run;

proc sort data=account_2;
by account_id;
run;

data account_match account_no_match;
merge account_1 (in=a)
	  account_2 (in=b);
by account_id;
if a and b then output account_match;
else if a and not b then output account_no_match;
run;

Not sure if this is what you're trying to get at.

jeremy4
Quartz | Level 8

If I only wanted variables from dataset "account_1" to be included in both output datasets (account_match and account_no_match), how could I change your code?

 

data account_match account_no_match;
merge account_1 (in=a)
	  account_2 (in=b);
by account_id;
if a and b then output account_match;
else if a and not b then output account_no_match;
run;

 

maguiremq
SAS Super FREQ

You want the same results for both data sets? That's how I'm reading your question. If so:

 

data account_match account_no_match;
merge account_1 (in=a)
	  account_2 (in=b);
by account_id;
if a and b then output account_match account_no_match;
run;

I don't think this is what you're trying to convey though. You should post a snapshot of your data.

Patrick
Opal | Level 21

@jeremy4 wrote:

If I only wanted variables from dataset "account_1" to be included in both output datasets (account_match and account_no_match), how could I change your code?

 

data account_match account_no_match;
merge account_1 (in=a)
	  account_2 (in=b);
by account_id;
if a and b then output account_match;
else if a and not b then output account_no_match;
run;

 


 

Only read account_id from table account_2.

....
account_2 (in=b keep=account_id);
 

 

Patrick
Opal | Level 21

@jeremy4 

Here two other options how to do this.

data account_1;
  input account_id $;
  some_other_var=_n_;
  datalines;
1
2
3
4
5
6
7
8
;

data account_2;
  input account_id $;
  datalines;
14
5
1
2
99
98
93
;

/* options 1 */
data match no_match;
  if _n_=1 then 
    do;
      dcl hash h1(dataset:'account_2(keep=account_id)');
      h1.defineKey('account_id');
      h1.defineDone();
    end;
  set account_1;
  if h1.check()=0 then output match;
  else output no_match;
run;

/* option 2 */
proc sql;
  create table match_noMatch as
    select 
      t1.*,
      case when missing(t2.account_id) then '0' else '1' end as match_flg
    from 
      account_1 as t1
      left join
      (select distinct account_id from account_2) as t2
      on t1.account_id=t2.account_id
    ;
quit;

      

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 755 views
  • 0 likes
  • 3 in conversation