The simplest method is a DATA step with a SET statement followed by assignments for the new columns:
data indata;
  set indata;
  new_column = . ;
run;
The DATA step UPDATE statement also reads and rewrites the whole dataset.
The MODIFY statement can update records in place, BUT it cannot add columns.
Since the large dataset is probably stable, I would suggest not updating it any more. Instead, start collecting the key columns and the new columns in a new table.
When a combination of new data and old data is needed, perform a join on the relevant subsets.
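As a sketch of such a join (the librefs, table names and the key column ID are hypothetical; substitute your own), assuming both tables carry the same key:

proc sql;
  create table combined as
  select a.*, b.new_column
  from old_data.set1 as a
       left join new_data.set2 as b
       on a.id = b.id ;
quit;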
When it becomes too slow to update the new table with old-fashioned brute force, start using the SQL UPDATE statement or the DATA step MODIFY statement. For now, practise these update-in-place techniques. By the time you need them, you should be confident in their use and in the considerations involved (like making an occasional backup of the file being updated in place).
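As practice sketches, an update-in-place of the new table could look like either of these (table, column and key names are hypothetical):

proc sql;
  update work.new_columns
    set new_column = 42
    where id = 12345 ;
quit;

data work.new_columns;
  modify work.new_columns;
  where id = 12345;
  new_column = 42;
  replace;  * rewrite the current record in place ;
run;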
When analysis needs to look at all the data, these alternative approaches are worth considering:
1. Build a DATA step view, like:
data joined / view=joined ;
  set old_data.set1 new_data.set2 ;
  * perhaps with a BY statement on some logical ordering columns ;
run;
Then you analyse the view JOINED.
This is suitable only for one-off analyses, because the view passes through all the data each time it is used.
2. Build an SQL view concatenating all the rows.
This is better, because the SQL optimiser can push a WHERE clause down to the underlying tables, in a way that a DATA step view cannot.
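A minimal sketch of such a view, using OUTER UNION CORR so columns are matched by name across the two tables (librefs and table names as above are assumptions):

proc sql;
  create view joined_v as
  select * from old_data.set1
  outer union corr
  select * from new_data.set2 ;
quit;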
3. Build indexes on the old dataset to allow efficient subsetting without passing through the whole table.
Then you can take full advantage of approach 2.
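For example, a simple index on a hypothetical key column ID could be created with PROC DATASETS:

proc datasets library=old_data nolist;
  modify set1;
    index create id;  * simple index on the subsetting column ;
quit;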
4. Fragment the 2,500,000,000 observations into subsets that align with typical reporting subsets.
5. Collect plenty of summary statistics on the "old data".
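As a sketch (the CLASS and VAR columns here are hypothetical placeholders), one pass with PROC MEANS can bank many statistics for later reporting:

proc means data=old_data.set1 noprint;
  class reporting_group;   * hypothetical grouping column ;
  var amount;              * hypothetical analysis column ;
  output out=old_data.summary n= sum= mean= min= max= / autoname;
run;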
6. Use SPD Server and dynamic partitioning.
First, get SAS Customer Support to help.
Good Luck
PeterC