Hi All,
i want a macro to remove have2 dataset id's from have1 dataset. i need to pass the have2 id's.
data have1;
input id;
cards;
101
102
103
104
105
106
107
108
run;
data have2;
input id;
cards;
101
104
105
;
run;
data want
id
102
103
106
107
108
data want;
set have;
where id not in(how can i pass ids from have2 here);
run;
Thanks in Advance.
You really should take a look at Hash Table. If your table is big.
data have1; input id; cards; 101 102 103 104 105 106 107 108 ; run; data have2; input id; cards; 101 104 105 ; run; data want; if _n_ eq 1 then do; if 0 then set have2; declare hash h(dataset:'have2'); h.definekey('id'); h.definedone(); end; set have1; if h.check() ne 0; run;
Xia Keshan
Here is a solution for you, I added some extra output to show you how the merge works:
data have1;
input id;
cards;
101
102
103
104
105
106
107
108
run;
data have2;
input id;
cards;
101
104
105
;
run;
data want two three;
merge have1(in=a)
have2(in=b);
by id;
if a and not b then output want;
if b and not a then output two;
if a and b then output three;
run;
/*data want;*/
/*set have;*/
/*where id not in(how can i pass ids from have2 here);*/
/*run;*/
Hi Mark,
Thanks for reply. I also thought merge concept but i dropped because of the heavy datasets.
I have around 20Gb dataset. Merging is taking too much time.
Is there any alternative way i can pass the observations.
I have read somewhere about Call Execute. Can anyone suggest about this.
Thanks.
Ths way is quicker - no joining:
proc format;
value IDList
101, 104, 105 = 'N'
other = 'Y'
;
run;
data want;
set have;
where put(id, IDList.) = 'Y';
run;
hi,
data _null_;
set have2 end=last;
if _n_=1 then call execute('data have; set have;');
call execute('if id="'||strip(id)||'" then delete;');
if last then call execute('run;');
run;
This will generate code with if's for each row in have 2.
Depends on the size of have2. If you can fit the Id's in a list to use in a "not in ()" condition, that's the fastest way because you only do one pass through the big dataset.
Then comes RW9's method, which will have more if's to execute and probably take slightly more time.
Third would be the use of a hash object, but if you run out of memory, sorting and merging is the best way to go.
BTW, your have dataset should already be sorted by ID at creation time if that is the main criteria to access it.
Yes, you are right. I was just being lazy, just modify the generation to generate one in():
data _null_;
set have2 end=last;
if _n_=1 then call execute('data have; set have; if id in ("||strip(id)||'"');
else call execute(',"'||strip(id)||'"');
if last then call execute(') then delete; run;');
run;
If you have got time to play around a little. I am wondering if this is an option:
data have1;
input id;
cards;
101
102
103
104
105
106
107
108
;
run;
Data _NULL_;
Length id 8.;
Declare Hash H (Dataset:'have1',Multidata:'y');
H.Definekey('id');
H.Definedone();
Call Missing (id);
Do i=101,104,105; * enter id's of have2 here;
rc=H.Find(Key:i);
rc=H.Remove();
end;
rc=H.Output(Dataset:'Want');
Run;
.. but RW9's solution seems to be better ..
proc sql;
create table want as
select * from have1
where id not in (select id from have2);
quit;
You really should take a look at Hash Table. If your table is big.
data have1; input id; cards; 101 102 103 104 105 106 107 108 ; run; data have2; input id; cards; 101 104 105 ; run; data want; if _n_ eq 1 then do; if 0 then set have2; declare hash h(dataset:'have2'); h.definekey('id'); h.definedone(); end; set have1; if h.check() ne 0; run;
Xia Keshan
Thanks All for the suggestions.
Could you please share some good references on Hash Table.
I recently asked the same question, here are some resources to review:
http://support.sas.com/resources/papers/proceedings10/151-2010.pdf
http://www2.sas.com/proceedings/forum2007/271-2007.pdf
http://www.nesug.org/proceedings/nesug10/bb/bb01.pdf
Look here:
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.