DATA Step, Macro, Functions and more

pass observations in data step

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 84
Accepted Solution

pass observations in data step

Hi All,

i want a macro to remove have2 dataset id's from have1 dataset. i need to pass the have2 id's.

data have1;

input id;

cards;

101

102

103

104

105

106

107

108

run;

data have2;

input id;

cards;

101

104

105

;

run;

data want

id

102

103

106

107

108

data want;

set have;

where id not in(how can i pass ids from have2 here);

run;

Thanks in Advance.


Accepted Solutions
Solution
‎05-06-2015 08:01 AM
Super User
Posts: 10,023

Re: pass observations in data step

You really should take a look at Hash Table. If your table is big.

data have1;
input id;
cards;
101
102
103
104
105
106
107
108
;
run;
 
data have2;
input id;
cards;
101
104
105
;
run;
data want;
if _n_ eq 1 then do;
 if 0 then set have2;
 declare hash h(dataset:'have2');
 h.definekey('id');
 h.definedone();
end;
set have1;
if h.check() ne 0;
run;


Xia Keshan

View solution in original post


All Replies
Valued Guide
Posts: 860

Re: pass observations in data step

Here is a solution for you, I added some extra output to show you how the merge works:

data have1;

input id;

cards;

101

102

103

104

105

106

107

108

run;

data have2;

input id;

cards;

101

104

105

;

run;

data want two three;

merge have1(in=a)

      have2(in=b);

by id;

if a and not b then output want;

if b and not a then output two;

if a and b then output three;

run;

/*data want;*/

/*set have;*/

/*where id not in(how can i pass ids from have2 here);*/

/*run;*/

Frequent Contributor
Posts: 84

Re: pass observations in data step

Posted in reply to Steelers_In_DC

Hi Mark,

Thanks for reply. I also thought merge concept but i dropped because of the heavy datasets.

I have around 20Gb dataset. Merging is taking too much time.

Is there any alternative way i can pass the observations.

I have read somewhere about Call Execute. Can anyone suggest about this.

Thanks.

Super User
Posts: 3,252

Re: pass observations in data step

Ths way is quicker - no joining:

proc format;

  value IDList

  101, 104, 105 = 'N'

  other = 'Y'

  ;

run;

data want;

  set have;

  where put(id, IDList.) = 'Y';

run;

Super User
Super User
Posts: 7,942

Re: pass observations in data step

hi,

data _null_;

     set have2 end=last;

     if _n_=1 then call execute('data have; set have;');

     call execute('if id="'||strip(id)||'" then delete;');

     if last then call execute('run;');

run;

This will generate code with if's for each row in have 2.

Super User
Posts: 7,773

Re: pass observations in data step

Depends on the size of have2. If you can fit the Id's in a list to use in a "not in ()" condition, that's the fastest way because you only do one pass through the big dataset.

Then comes RW9's method, which will have more if's to execute and probably take slightly more time.

Third would be the use of a hash object, but if you run out of memory, sorting and merging is the best way to go.

BTW, your have dataset should already be sorted by ID at creation time if that is the main criteria to access it.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Super User
Posts: 7,942

Re: pass observations in data step

Posted in reply to KurtBremser

Yes, you are right.  I was just being lazy, just modify the generation to generate one in():

data _null_;

     set have2 end=last;

     if _n_=1 then call execute('data have; set have; if id in ("||strip(id)||'"');

     else call execute(',"'||strip(id)||'"');

     if last then call execute(') then delete; run;');

run;

Super Contributor
Posts: 340

Re: pass observations in data step

If you have got time to play around a little. I am wondering if this is an option:

data have1;

input id;

cards;

101

102

103

104

105

106

107

108

;

run;

Data _NULL_;

  Length id 8.;

  Declare Hash H (Dataset:'have1',Multidata:'y');

  H.Definekey('id');

  H.Definedone();

  Call Missing (id);

  Do i=101,104,105; * enter id's of have2 here;

    rc=H.Find(Key:i);

    rc=H.Remove();

  end;

  rc=H.Output(Dataset:'Want');

Run;

.. but RW9's solution seems to be better ..

Trusted Advisor
Posts: 1,228

Re: pass observations in data step

proc sql;

create table want as

select * from have1

where id not in  (select id from have2);

quit;

Solution
‎05-06-2015 08:01 AM
Super User
Posts: 10,023

Re: pass observations in data step

You really should take a look at Hash Table. If your table is big.

data have1;
input id;
cards;
101
102
103
104
105
106
107
108
;
run;
 
data have2;
input id;
cards;
101
104
105
;
run;
data want;
if _n_ eq 1 then do;
 if 0 then set have2;
 declare hash h(dataset:'have2');
 h.definekey('id');
 h.definedone();
end;
set have1;
if h.check() ne 0;
run;


Xia Keshan

Frequent Contributor
Posts: 84

Re: pass observations in data step

Thanks All for the suggestions.

Could you please share some good references on Hash Table.

Super User
Posts: 7,773

Re: pass observations in data step

Look here:

SAS(R) 9.3 Language Reference: Concepts, Second Edition

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 383 views
  • 6 likes
  • 8 in conversation