Filtering Large Datasets

New Contributor
Posts: 3

Hi,

I'm after some help with filtering a large dataset.

For example

Table A (variable 1, variable 2)

Approx. 400 records, both variables numeric

Table B (variable 1, variable 2, variable 3 ... variable 8)

Approx. 35 million records

Essentially I want the records from Table B that match the entries in Table A, but the filter takes hours to run.

I've read, with some confusion, about hash merging and similar techniques, and my attempts seem to take just as long, so I'm guessing I'm not doing something right.

Regular Contributor
Posts: 213

Re: Filtering Large Datasets

This paper should be a helpful introduction to the hash object:

http://www.lexjansen.com/mwsug/2007/saspres/MWSUG-2007-SAS06.pdf

New Contributor
Posts: 3

Re: Filtering Large Datasets

Awesome, thanks for your reply.

As far as I can see, this is based on matching one key variable against another.

I essentially need two left/inner joins: one between the variable a's in both tables and one between the variable b's.

Both datasets are sorted and indexed, if this helps.

Thank you!

Super User
Posts: 17,840

Re: Filtering Large Datasets

What about formats, if Table A has only two variables? You don't explain your join condition, so I'm not sure.

See example 8 here:

http://www2.sas.com/proceedings/sugi30/001-30.pdf

New Contributor
Posts: 3

Re: Filtering Large Datasets

I've read it and it sounds promising.

Essentially, Table A is generated by a different process, which gives a list of IDs (3 digits) and secondary IDs (6 digits)

e.g. 123........564234

128.......345698 etc

The second table is the large dataset, with the two IDs followed by a further 6 columns of data, which I want to obtain based on the rows in Table A

i.e. 123 ....564234..........abc.......def..........ghi...j.k.l.......m.....n

123.....345698............o.....p......q.......r.........s.....t....u

So essentially I need a left join from Table A, variable a, to the large Table B, variable a, and

a left join from Table A, variable b, to the large Table B, variable b.

Regular Contributor
Posts: 213

Re: Filtering Large Datasets

The key can be made up of multiple variables.

Besides, you can use multiple hash objects within one DATA step.
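
A minimal sketch of the multi-variable key idea, not a tested solution for this data. The dataset names (work.table_a, work.table_b, work.want) and variable names (id, secondary_id) are hypothetical placeholders, and it assumes the two key variables have the same names and types (numeric) in both tables. The small table is loaded into memory once, then the large table is read in a single pass and only rows whose ID pair appears in the small table are kept:

```sas
/* Hypothetical names: work.table_a, work.table_b, work.want, id, secondary_id */
data work.want;
    /* Load the small lookup table into memory once, on the first iteration */
    if _n_ = 1 then do;
        declare hash a(dataset: 'work.table_a');
        a.defineKey('id', 'secondary_id');   /* two-variable composite key */
        a.defineDone();
    end;

    /* Read the large table sequentially in one pass */
    set work.table_b;

    /* check() returns 0 when the current (id, secondary_id) pair is in the hash */
    if a.check() = 0;
run;
```

This behaves like an inner join on both IDs at once. Neither table has to be sorted or indexed for it, since the lookup happens in memory and Table B is simply read from start to finish.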

Ahmed
