BookmarkSubscribeRSS Feed
tommy81
Obsidian | Level 7

Hi,

I`m looking for an equivalent of the below code in Data step HASH join.


proc sql;

create table rllp2 as

select distinct a.*, b.enam as name,b.position,b.orid,b.ccid_1 as ccid,b.region1 as region

max(a.BEGD,b.BEGD) as BEGD,min(a.ENDD,b.ENDD) as ENDD

from rllp3 as a

left join

rllp1 as b

on a.position_id_k = b.position_id_1

and ((a.BEGD BETWEEN b.BEGD and  b.ENDD)

or (a.ENDD BETWEEN b.BEGD and  b.ENDD) or (a.BEGD >= b.BEGD
and a.ENDD <= b.ENDD) or (a.BEGD <= b.BEGD and a.ENDD >= b.ENDD))

order by position,BEGD,ENDD;

Not sure how to add the distinct ,extra join conditions and order by in a single datas tep.

DATA MI ;

IF 0 THEN SET rllp1;

      DECLARE hash VS(hashexp:10, dataset:"rllp1");

            VS.definekey('position_id_k,'position_id_1');

            VS.definedata(all:'Y');

            VS.definedone();

                  DO UNTIL(eof);

                        SET rllp3end=eof;

                  IF VS.find()=0 THEN OUTPUT;

                        ELSE DO;

                  CALL missing( a1---b1);

                  OUTPUT;

            END;

      END;

STOP ;

RUN;

Kindly suggest some methods

6 REPLIES 6
LinusH
Tourmaline | Level 20

i don't think I can help about hashing, but maybe you could tell us about why? Is it a performance issue regarding the SQL?

It's quite hard for SAS SQL to optimize OR conditions. Storing in SPDE might improve performance (or SPD Server if available), and indexing on at least position_id_1 and  position_id_k.

Then I'm not sure how distinct works with max and min functions. I am more comfortable using Group By, but distinct may work fine for you...

Data never sleeps
tommy81
Obsidian | Level 7


Hi LinusH,

Its about performance and time. The datasets are 100Gb in size.That why i though its better to deploy data step hash.

AhmedAl_Attar
Rhodochrosite | Level 12

tommy81,

When your data set gets into 30GB+ , that's when you move it into SPDE/SPDS!!

After all, that's why SPDE was developed in the first place. it was the "free" solution to handle large data sets using your existing BASE SAS. I know BASE SAS is not Free, but you don't have to purchase and configure SPDS to handle such volumes.

I would strongly recommend investigating SPDE as storage engine for your data set. It will payoff down the line.

Ahmed

tommy81
Obsidian | Level 7

I agree to the SPDS point, however I need to know how much of a difference it could make if I could use  a data step hash to the above step . I`m certain data vloume cannot be controlled much, but at least run times can be improved.

LinusH
Tourmaline | Level 20

I think it will very complicated to do both the join logic and distinct/group by functionality within one single data step.

So my suggestion is to first try to optimize your SQL query first to see what gains is possible, t sum up:

  • simplify the SQL suggested by DBailey
  • Index on the join column(s)
  • move the data to SPDE

Since SQL operations such as summarizing and sorting is done in paralel, investigate if you can exploit the hardware (or have any other HW available). This includes setting up MEMSIZE/SORTSIZE options.

Data never sleeps
DBailey
Lapis Lazuli | Level 10

Can you rewrite your clause to say

left join

rllp1 as b

on a.position_id_k = b.position_id_1

and a.BEGD <= b.ENDD

and a.ENDD >= b.BEGD

I didn't work through all of the permutations but that is what I see normally when trying to make sure you count everything that may have happened during a date span.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1018 views
  • 3 likes
  • 4 in conversation