Solved: Re: Code to compare how many account_id in one dataset are in another ...

jeremy4 · Posted 09-25-2019 05:37 AM

Hi,

I have one dataset (dataset 1) with 10,000 observations and another dataset (dataset 2) with 150,000 observations. What code could be used so that two output datasets are created when comparing account_id between two datasets?

1. In one output dataset (account_match) to be created, there is a list of all of the account_id (and also select all of the corresponding variables associated with that account_id in the output dataset) from dataset 1 that are also in dataset 2 (i.e. where there the account_id in dataset 1 is also in dataset 2)?

2. The second output dataset to be created, account_no_match, would output all of the account_id in dataset 1 that are not seen in dataset 2 (and also select all of the corresponding variables associated with that account_id in the output dataset) .

As a result, number of accounts in account_match + number of accounts in account_no_match = 10,000 observations (i.e. dataset 1)

Code:

data account_match account_no_match;

[insert code here]

run;

maguiremq · Posted 09-25-2019 09:56 AM

data account_1;
input account_id $;
datalines;
1
2
3
4
5
6
7
8
;

data account_2;
input account_id $;
datalines;
14
5
1
2
99
98
93
;

proc sort data=account_1;
by account_id;
run;

proc sort data=account_2;
by account_id;
run;

data account_match account_no_match;
merge account_1 (in=a)
	  account_2 (in=b);
by account_id;
if a and b then output account_match;
else if a and not b then output account_no_match;
run;

Not sure if this is what you're trying to get at.

View solution in original post

maguiremq · Posted 09-25-2019 09:56 AM

data account_1;
input account_id $;
datalines;
1
2
3
4
5
6
7
8
;

data account_2;
input account_id $;
datalines;
14
5
1
2
99
98
93
;

proc sort data=account_1;
by account_id;
run;

proc sort data=account_2;
by account_id;
run;

data account_match account_no_match;
merge account_1 (in=a)
	  account_2 (in=b);
by account_id;
if a and b then output account_match;
else if a and not b then output account_no_match;
run;

Not sure if this is what you're trying to get at.

jeremy4 · Posted 09-25-2019 10:09 AM

Thanks a lot!

jeremy4 · Posted 09-25-2019 11:11 AM

If I only wanted variables from dataset "account_1" to be included in both output datasets (account_match and account_no_match), how could I change your code?

data account_match account_no_match;
merge account_1 (in=a)
	  account_2 (in=b);
by account_id;
if a and b then output account_match;
else if a and not b then output account_no_match;
run;

maguiremq · Posted 09-25-2019 11:19 AM

You want the same results for both data sets? That's how I'm reading your question. If so:

data account_match account_no_match;
merge account_1 (in=a)
	  account_2 (in=b);
by account_id;
if a and b then output account_match account_no_match;
run;

I don't think this is what you're trying to convey though. You should post a snapshot of your data.

Patrick · Posted 09-25-2019 04:41 PM

@jeremy4 wrote:

If I only wanted variables from dataset "account_1" to be included in both output datasets (account_match and account_no_match), how could I change your code?
data account_match account_no_match;
merge account_1 (in=a)
	  account_2 (in=b);
by account_id;
if a and b then output account_match;
else if a and not b then output account_no_match;
run;

Only read account_id from table account_2.

....
account_2 (in=b keep=account_id);

Patrick · Posted 09-25-2019 10:19 AM

@jeremy4

Here two other options how to do this.

data account_1;
  input account_id $;
  some_other_var=_n_;
  datalines;
1
2
3
4
5
6
7
8
;

data account_2;
  input account_id $;
  datalines;
14
5
1
2
99
98
93
;

/* options 1 */
data match no_match;
  if _n_=1 then 
    do;
      dcl hash h1(dataset:'account_2(keep=account_id)');
      h1.defineKey('account_id');
      h1.defineDone();
    end;
  set account_1;
  if h1.check()=0 then output match;
  else output no_match;
run;

/* option 2 */
proc sql;
  create table match_noMatch as
    select 
      t1.*,
      case when missing(t2.account_id) then '0' else '1' end as match_flg
    from 
      account_1 as t1
      left join
      (select distinct account_id from account_2) as t2
      on t1.account_id=t2.account_id
    ;
quit;

Code to compare how many account_id in one dataset are in another dataset

Re: Code to compare how many account_id in one dataset are in another dataset

Re: Code to compare how many account_id in one dataset are in another dataset

Re: Code to compare how many account_id in one dataset are in another dataset

Re: Code to compare how many account_id in one dataset are in another dataset

Re: Code to compare how many account_id in one dataset are in another dataset

Re: Code to compare how many account_id in one dataset are in another dataset

Re: Code to compare how many account_id in one dataset are in another dataset

Registration is open

Registration is open

SAS Training: Just a Click Away