Ksharp
Super User

Hello, Patrick.

I have a different opinion from yours. As I said before, using an index is faster than reading the table sequentially (i.e., with a plain SET statement) only if the number of observations updated or queried is less than roughly 20% of the total number of observations in the table.

Sometimes, when OPTIONS MSGLEVEL=I is in effect, you can see in the LOG that SAS decides not to use the index.

Why? Because in that case using the index would not be faster than reading the table with a SET statement. In other words, using an index is not always the better choice.
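This behaviour can be checked in the log. The sketch below builds a hypothetical indexed table DEMO (the name and the 1,000,000-row size are assumptions for illustration) and runs one WHERE query that selects a small subset and one that selects a large subset. With MSGLEVEL=I, SAS writes INFO notes saying whether an index was used for each WHERE clause; FULLSTIMER adds resource statistics, and the IDXWHERE=YES data set option forces index use so both strategies can be timed on the same subset. Where exactly the optimizer switches strategy is decided by SAS, so the comments describe the likely outcome, not a guarantee.

```sas
/* Hypothetical demo table: 1,000,000 rows with a simple index on KEYVAR */
data demo(index=(keyvar));
  do keyvar = 1 to 1000000;
    output;
  end;
run;

/* MSGLEVEL=I: INFO notes about index usage in the log
   FULLSTIMER: detailed resource statistics for every step */
options msglevel=i fullstimer;

/* Small subset (0.1% of rows): the optimizer will likely use the index */
data small;
  set demo;
  where keyvar <= 1000;
run;

/* Large subset (90% of rows): the optimizer will likely read sequentially */
data large;
  set demo;
  where keyvar > 100000;
run;

/* Same large subset, but index use forced, to compare the timings */
data large_idx;
  set demo(idxwhere=yes);
  where keyvar > 100000;
run;
```

Comparing the real time reported for LARGE and LARGE_IDX shows whether the index actually pays off for a 90% subset on a given system.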

I prefer a hash table; it's my favorite.
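For comparison, a minimal sketch of the hash-table approach, assuming hypothetical tables MASTER and TRANS that share a numeric key KEYVAR (names and contents are made up for illustration). The transactions are loaded into memory once, then the master is read and written out in full. Unlike MODIFY with KEY=, every master row is rewritten, and because the table is re-created, an index on it would have to be built again.

```sas
/* Hypothetical demo tables: MASTER (1,000 rows) and TRANS (3 updates) */
data master;
  retain val 'old';
  do keyvar = 1 to 1000;
    output;
  end;
run;

data trans;
  retain newval 'new';
  do keyvar = 5, 500, 5000;
    output;
  end;
run;

/* Hash-based update: TRANS is loaded into memory on the first iteration,
   then MASTER is read sequentially and every row is written out again */
data master;
  if _n_ = 1 then do;
    if 0 then set trans;              /* defines NEWVAL in the PDV */
    declare hash h(dataset: 'trans');
    h.defineKey('keyvar');
    h.defineData('newval');
    h.defineDone();
  end;
  set master;
  if h.find() = 0 then val = newval;  /* key found: take the new value */
  drop newval;
run;
```

Key 5000 has no match in MASTER, so FIND() returns a nonzero code for it and that transaction is simply ignored.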

Anyway, it is a matter of personal preference.

Regards.

Ksharp

Patrick
Opal | Level 21

Considering that the bottleneck is normally I/O, I assume that reading an index file is faster than reading all 700M records, and even more so that writing 100k records and updating the index is faster than rewriting 700M records.

I believe this 20% figure only applies to select (read) and not to update/insert.
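For the volumes in this thread, the share of master rows touched by the transactions is far below the 20% rule of thumb either way:

```latex
\frac{100{,}000}{700{,}000{,}000} = \frac{1}{7{,}000} \approx 0.000143 \approx 0.014\%,
\qquad
\frac{0.20}{1/7{,}000} = 1{,}400
```

So even if the 20% figure did apply to updates, this case sits about 1,400 times below it.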

Patrick
Opal | Level 21

I believe that even if your SAS session aborts, the data set is not necessarily corrupted. It would actually take quite an extreme event, like a sudden power failure, to corrupt the table.

As far as I understand, MODIFY maintains transactional integrity ("unit of work"), so (in my understanding) the worst that may happen if a SAS session aborts is that only half of the updates are applied.

P.S.: And if worst comes to worst, PROC DATASETS even has a REPAIR statement that may fix the damage. In all the time I've used MODIFY I ran into problems only once, and REPAIR fixed it.
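A minimal sketch of the REPAIR statement, assuming the table lives in WORK and is called M700 (the name used in the example further down the thread); both would need to be adjusted for a real master table. The first step only creates a stand-in so the block runs on its own.

```sas
/* Hypothetical stand-in for the master table; in practice this would be
   the existing, possibly damaged, table in its own library */
data m700;
  do keyvar = 1 to 1000;
    output;
  end;
run;

/* REPAIR attempts to restore a damaged data set to a usable condition */
proc datasets library=work nolist;
  repair m700;
run;
quit;
```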

Patrick
Opal | Level 21

Linus' second point, that using MODIFY with KEY= avoids rewriting the whole master table, is a very strong argument for this approach.

Considering the ratio of observations in the master and the transaction data sets (700M to 100k), I would assume it's even worth creating an index on the key variables (if one doesn't already exist).

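/* Stand-in for the 700M-row master table */
data m700;
  retain var1 var2 'm700';
  do keyvar=1 to 1000;
    output;
  end;
run;

/* Stand-in for the 100k-row transaction table */
data k100;
  retain var1 var2 'k100';
  do keyvar=2,5,10,100,999;
    output;
  end;
run;

/* Simple index on the key variable of the master table */
proc sql;
  create index keyvar on m700 (keyvar);
quit;

/* Update the master in place: for each transaction row, MODIFY ... KEY=
   looks up the matching master row through the index; only matched rows
   are rewritten, the rest of the table is left untouched */
data m700;
  set k100(rename=(var1=kvar1 var2=kvar2));
  modify m700 key=keyvar;
  if _iorc_ = 0 then
  do;
    var1=kvar1;
    var2=kvar2;
    replace;
  end;
run;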
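A slightly more defensive variant of the MODIFY step is sketched below, using hypothetical tables MASTER (indexed on KEYVAR) and TRANS, where one transaction key deliberately has no master row. It uses SELECT on _IORC_ with the %SYSRC autocall macro to tell "key not found" (_DSENOM) apart from other I/O errors. Resetting _ERROR_ for unmatched keys keeps the log from filling up with PDV dumps when many transaction keys have no master row.

```sas
/* Hypothetical demo tables: MASTER (1,000 rows, indexed on KEYVAR)
   and TRANS (3 updates; key 5000 does not exist in MASTER) */
data master(index=(keyvar));
  retain val 'old';
  do keyvar = 1 to 1000;
    output;
  end;
run;

data trans;
  retain newval 'new';
  do keyvar = 5, 500, 5000;
    output;
  end;
run;

data master;
  set trans;
  modify master key=keyvar;
  select (_iorc_);
    /* matching master row found: overwrite it in place */
    when (%sysrc(_sok)) do;
      val = newval;
      replace;
    end;
    /* no master row with this key: clear the error flag and continue */
    when (%sysrc(_dsenom)) do;
      _error_ = 0;
    end;
    /* any other return code is unexpected: report it and stop the step */
    otherwise do;
      put 'ERROR: unexpected I/O return code ' _iorc_= keyvar=;
      stop;
    end;
  end;
run;
```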

LinusH
Tourmaline | Level 20

Yes, an index is essential to speed this process up!

Over time, it might be necessary, as a maintenance action, to recreate (and sort) the master table so that index processing remains efficient.
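One way to carry out this maintenance, sketched with a hypothetical indexed table MASTER stored in reverse key order: sort it in place by the key, then recreate the index. PROC SORT will not replace an indexed data set unless FORCE is specified, and sorting it this way destroys user-created indexes, so the index is rebuilt with PROC DATASETS afterwards.

```sas
/* Hypothetical indexed master table, stored in reverse key order */
data master(index=(keyvar));
  do keyvar = 1000 to 1 by -1;
    output;
  end;
run;

/* FORCE is required to sort an indexed table in place;
   user-created indexes are dropped by the sort */
proc sort data=master force;
  by keyvar;
run;

/* Rebuild the simple index on the key variable */
proc datasets library=work nolist;
  modify master;
    index create keyvar;
  run;
quit;
```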

Data never sleeps
