BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kumarK
Quartz | Level 8

Hi All,

i want a macro to remove have2 dataset id's from have1 dataset. i need to pass the have2 id's.

data have1;

input id;

cards;

101

102

103

104

105

106

107

108

run;

data have2;

input id;

cards;

101

104

105

;

run;

data want

id

102

103

106

107

108

data want;

set have;

where id not in(how can i pass ids from have2 here);

run;

Thanks in Advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

You really should take a look at Hash Table. If your table is big.

data have1;
input id;
cards;
101
102
103
104
105
106
107
108
;
run;
 
data have2;
input id;
cards;
101
104
105
;
run;
data want;
if _n_ eq 1 then do;
 if 0 then set have2;
 declare hash h(dataset:'have2');
 h.definekey('id');
 h.definedone();
end;
set have1;
if h.check() ne 0;
run;


Xia Keshan

View solution in original post

12 REPLIES 12
Steelers_In_DC
Barite | Level 11

Here is a solution for you, I added some extra output to show you how the merge works:

data have1;

input id;

cards;

101

102

103

104

105

106

107

108

run;

data have2;

input id;

cards;

101

104

105

;

run;

data want two three;

merge have1(in=a)

      have2(in=b);

by id;

if a and not b then output want;

if b and not a then output two;

if a and b then output three;

run;

/*data want;*/

/*set have;*/

/*where id not in(how can i pass ids from have2 here);*/

/*run;*/

kumarK
Quartz | Level 8

Hi Mark,

Thanks for reply. I also thought merge concept but i dropped because of the heavy datasets.

I have around 20Gb dataset. Merging is taking too much time.

Is there any alternative way i can pass the observations.

I have read somewhere about Call Execute. Can anyone suggest about this.

Thanks.

SASKiwi
PROC Star

Ths way is quicker - no joining:

proc format;

  value IDList

  101, 104, 105 = 'N'

  other = 'Y'

  ;

run;

data want;

  set have;

  where put(id, IDList.) = 'Y';

run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

hi,

data _null_;

     set have2 end=last;

     if _n_=1 then call execute('data have; set have;');

     call execute('if id="'||strip(id)||'" then delete;');

     if last then call execute('run;');

run;

This will generate code with if's for each row in have 2.

Kurt_Bremser
Super User

Depends on the size of have2. If you can fit the Id's in a list to use in a "not in ()" condition, that's the fastest way because you only do one pass through the big dataset.

Then comes RW9's method, which will have more if's to execute and probably take slightly more time.

Third would be the use of a hash object, but if you run out of memory, sorting and merging is the best way to go.

BTW, your have dataset should already be sorted by ID at creation time if that is the main criteria to access it.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, you are right.  I was just being lazy, just modify the generation to generate one in():

data _null_;

     set have2 end=last;

     if _n_=1 then call execute('data have; set have; if id in ("||strip(id)||'"');

     else call execute(',"'||strip(id)||'"');

     if last then call execute(') then delete; run;');

run;

user24feb
Barite | Level 11

If you have got time to play around a little. I am wondering if this is an option:

data have1;

input id;

cards;

101

102

103

104

105

106

107

108

;

run;

Data _NULL_;

  Length id 8.;

  Declare Hash H (Dataset:'have1',Multidata:'y');

  H.Definekey('id');

  H.Definedone();

  Call Missing (id);

  Do i=101,104,105; * enter id's of have2 here;

    rc=H.Find(Key:i);

    rc=H.Remove();

  end;

  rc=H.Output(Dataset:'Want');

Run;

.. but RW9's solution seems to be better ..

stat_sas
Ammonite | Level 13

proc sql;

create table want as

select * from have1

where id not in  (select id from have2);

quit;

Ksharp
Super User

You really should take a look at Hash Table. If your table is big.

data have1;
input id;
cards;
101
102
103
104
105
106
107
108
;
run;
 
data have2;
input id;
cards;
101
104
105
;
run;
data want;
if _n_ eq 1 then do;
 if 0 then set have2;
 declare hash h(dataset:'have2');
 h.definekey('id');
 h.definedone();
end;
set have1;
if h.check() ne 0;
run;


Xia Keshan

kumarK
Quartz | Level 8

Thanks All for the suggestions.

Could you please share some good references on Hash Table.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1532 views
  • 6 likes
  • 8 in conversation