Solved: creating flag if id exists in different dataset

sasprogramming · Posted 05-26-2021 03:37 AM

If the ID from data set 1 exists in dataset 2, I would like to create a new variable called flag in dataset 2 and it =1 and =0 otherwise. This is shown below. I tried using merge, but couldn't find a good way to do this. any help would be great.

Data 1:

ID	X1	X2
54	9	3
55	3	2
56	2	0
57	4	4
58	4	1
59	6	8
60	1	7

Data 2:

ID	Y1	Y2	Y3
54	55	79	55
55	83	61	55
56	74	77	60
57	64	88	56
60	87	78	83
61	84	87	77
62	78	68	50
63	89	59	52
64	74	77	60
65	55	79	55

Desired data:

ID	Y1	Y2	Y3	FLAG
54	55	79	55	1
55	83	61	55	1
56	74	77	60	1
57	64	88	56	1
60	87	78	83	1
61	84	87	77	0
62	78	68	50	0
63	89	59	52	0
64	74	77	60	0
65	55	79	55	0

s_lassen · Posted 05-26-2021 04:36 AM

The easy way is to use SQL:

proc sql;
  create table want as 
  select *,id in(select id from data1) as flag
  from data2;
quit;

View solution in original post

s_lassen · Posted 05-26-2021 04:36 AM

The easy way is to use SQL:

proc sql;
  create table want as 
  select *,id in(select id from data1) as flag
  from data2;
quit;

Kurt_Bremser · Posted 05-26-2021 04:48 AM

And if performance is of importance, use a data step hash:

data want;
set data2;
if _n_ = 1
then do;
  declare hash d1 (dataset:"data1");
  d1.definekey("id");
  d1.definedone();
end;
flag = (d1.check() = 0);
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SteveDenham · Posted 05-26-2021 08:30 AM

One more way, but definitely slower:

data want;
merge one(in=a) two(in=b);
by id;
if a and b then flag=1;
    else flag=0;
if y1=. or y2=. or y3=. then delete;
keep id y1 y2 y3 flag;
run;

How you handle the missings after the merge is dependent on your ultimate objective. Here I deleted any record missing at least one of y1-y3. I could understand making this different if the criteria for keeping a record was something other than at least one of the variables in dataset 2 is missing.

SteveDenham

Ksharp · Posted 05-26-2021 08:36 AM

proc sql;
create table want as
select *,exists(select * from data1 where id=a.id) as flag
from data2 as a;
quit;

creating flag if id exists in different dataset

Re: creating flag if id exists in different dataset

Re: creating flag if id exists in different dataset

Re: creating flag if id exists in different dataset

Re: creating flag if id exists in different dataset

Re: creating flag if id exists in different dataset

ID	Y1	Y2	Y3
54	55	79	55
55	83	61	55
56	74	77	60
57	64	88	56
60	87	78	83
61	84	87	77
62	78	68	50
63	89	59	52
64	74	77	60
65	55	79	55

ID	Y1	Y2	Y3
54	55	79	55
55	83	61	55
56	74	77	60
57	64	88	56
60	87	78	83
61	84	87	77
62	78	68	50
63	89	59	52
64	74	77	60
65	55	79	55

creating flag if id exists in different dataset

Re: creating flag if id exists in different dataset

Re: creating flag if id exists in different dataset

Re: creating flag if id exists in different dataset

Re: creating flag if id exists in different dataset

Re: creating flag if id exists in different dataset

Registration is open

SAS Training: Just a Click Away

ID	Y1	Y2	Y3
54	55	79	55
55	83	61	55
56	74	77	60
57	64	88	56
60	87	78	83
61	84	87	77
62	78	68	50
63	89	59	52
64	74	77	60
65	55	79	55