BookmarkSubscribeRSS Feed
Warri
Calcite | Level 5

Dear community,

I'm looking for a proc to add new observations (from a dataset) into a sorted existing one, such that the proc immediately sorts.

Of course it can be done with a proc append followed by a proc sort or in a data step as follows:

data main_dataset;

     set main_dataset

           new_dataset;

      by sort_variable1 sort_variable2 etc;

run;

But this reads through the entire main set and I'm dealing with rather large files. I'd prefer a proc that uses an index to place the new observations sorted into the main set, so that it is faster.

Is there any, and if yes, how does it work?

Regards,

Warri

(Note that the 'sql' in the tags is there because proc sql may solve my problem)

7 REPLIES 7
art297
Opal | Level 21

I think you are looking for proc datasets using the append statement.  Take a look at: http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000235213.htm

Warri
Calcite | Level 5

I have tried this and it seems that proc append or proc datasets using the append statement does not sort for you. So I interpret the documentation where it says that the append maintains the sort-indicator that this only happends if the sticking the new dataset under the main one makes it sorted.

DLing
Obsidian | Level 7

Just a thought, if the large base dataset is indexed (create via PROC datasets), then proc datasets append statement may be able to append and update the index without physically sorting the very large tables.  The link Art provided describes appending to an indexed dataset.

Howles
Quartz | Level 8

Indexes are the feature intended to fit this requirement. The insertions can also be done with PROC SQL or with a DATA step (using the MODIFY statement).

Whether this approach is advantageous vis a vis appending and sorting will depend on the particulars.

DLing wrote:

Just a thought, if the large base dataset is indexed (create via PROC datasets), then proc datasets append statement may be able to append and update the index without physically sorting the very large tables.  The link Art provided describes appending to an indexed dataset.

LinusH
Tourmaline | Level 20

No, there is no proc or other mechanism that appends/inserts data to a table, and then automatically keep the target tables physically sort order.

The only situation I can think of if you have a base table of transactions sorted by date/datetime, and add new records in chronological order.

This due to the fact that a SAS table is a sequential file. It does not (generally) keep space for later inserts. So, to interleaving records, you do need to reload the table. Interleaving can be done using append/sort as you mentioned, or by using data - set - by, which probably should be more efficient in your case, since only the new transactions has to be sorted (the target table are already sorted.

Keeping a table sorted is usually good for performance especially when the sort order is often used as search criteria, and has an index defined on it. But you need to try to calculate so that the resources used to keep a table sorted is less what you gain when querying the table.

/Linus

Data never sleeps
Warri
Calcite | Level 5

Not what I hoped for, but I cannot find something on the web that contradicts your answer. So thanks and I suppose we just have to increase server capacity Smiley Wink.

Ksharp
Super User

Agree with LinusH. As far as I know there is not such proc to match your demand.

I think using proc append and proc sort is the best way.

If you create an index, It will also waste a lot of time for large table( almost like proc sort ' time).

Maybe SAS will develop some technology to solve this problem for large table in the future. Just a think.

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1783 views
  • 6 likes
  • 6 in conversation