Adding new column to a dataset

Reply
Contributor
Posts: 42

Adding new column to a dataset

Hi,

I have a sas dataset with around 4000 million records ,  26 columns.

I need to add 2 new char columns with just blanks..

Are the below statements efficient?

proc sql;

alter table libname.tablename add colName1 char(3) format=$3.;

alter table libname.tablename add colName2 char(3) format=$3.;

quit;

If not, is there a better way to add columns to alarge dataset such that it doesn't take huge time ?

Thanks!

Super User
Posts: 5,256

Re: Adding new column to a dataset

If you know that you want to add two column, have them added in on pass of the data. Two alter table statements will do this in two separate passes.

Do you have any indexes or constraints on the table?

is it stored in Base SAS libname?

Data never sleeps
Contributor
Posts: 42

Re: Adding new column to a dataset

Hi,

It has 5 indexes..

Engine is SASSPDS.


Super User
Super User
Posts: 7,401

Re: Adding new column to a dataset

Not sure what you are trying to achieve here, but as a suggestion, if you have a large table like that and some of that data will have a flag say, then why not create a secondary table with just the id variables and the flag, and fill that out as necessary and join it on when needed further in the program.  It is of the RDBMS form, so keeping the minimum amount of information necessary, i.e. that additional table is going to be empty until some data is derived thus is minimum.  If you add two columns on then the size increases as all rows would have those additional variables on.

Super User
Posts: 5,256

Re: Adding new column to a dataset

Ok, quite important info...

The table is not a member of cluster, or is a cluster itself?

Try the alter table statement (just one!), and put it in an explicit SQL pass-thru block.

Can't guarantee that this is the most efficient option. Try it out on a smaller temporary subset of the original table (with indexes).

Data never sleeps
Contributor
Posts: 42

Re: Adding new column to a dataset

The table isn't a cluster table or a member of cluster table.

I tried Alter table and after 6 hours got insufficient disk space.   Smiley Sad

Valued Guide
Posts: 2,175

Re: Adding new column to a dataset

If you don't have the capacity to add the column to the data, just add it in the process which reads that data (the easy way is to create a VIEW - and the best performance would be achieved with an SQL view)

Super User
Posts: 6,936

Re: Adding new column to a dataset

You can do it in a data step if you just have a "simple" data set without indexes, constraints or other:

data libname.tablename;

set libname.tablename;

format colName1 colName2 $3.;

retain colName1 " " colName2 " "; * avoids "unitialized" notes in the log;

run;

Any method has to rewrite the whole data set (because the structure of the records changes), so you will mostly be bound by I/O.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Valued Guide
Posts: 3,208

Re: Adding new column to a dataset

check the sasspds reference. https://support.sas.com/documentation/cdl/en/engspde/64755/PDF/default/engspde.pdf

You still need to think about the physical effects and performance. It does not free you of that headache.

---->-- ja karman --<-----
Valued Guide
Posts: 3,208

Re: Adding new column to a dataset

Of course you get into this issues you have to think on the physical effects.

What I deduce:

- 4000M records 28 variables is about 400Gb or more of data. 

- I do not see the OS Unix/Windows there are technical differences

  You need twice that size for replacing the dataset. I do not expect you have that lot of free space at the SASSPDS setup

  Filling reading 1Tb of data can easily  give 6 hours of processing dependent on your IO-speed.

What you can do is unload,  to an other location clean-up and reload, with all needed additional definitions.

There is time-schedule backup.recovery etc needed. If you can activate a new location instead the old it is a renaming switch.

---->-- ja karman --<-----
Ask a Question
Discussion stats
  • 9 replies
  • 7435 views
  • 0 likes
  • 6 in conversation