BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
A_SAS_Man
Pyrite | Level 9

I have an existing data set ("test" in my example below) where indexing and sorting are very costly tasks in terms of time due to size. I am attempting to add a column (that exists on "test2" below) to that existing data set without losing the sorting and indexing that is done on them. A demonstration of the issue below. First I set up the problem by creating the analogous data.

 

data test;
 input c1 c2 c3 c4;
 datalines;
 1 3 5 2
 3 5 1 7
 2 6 8 9
 1 2 7 3
 6 1 9 1
 9 1 3 6
 4 1 1 7
 ;
 run;

 proc sort data=test;
 by c1 c2 c3;
 run;

 Proc Datasets library = work nolist;
modify test;
index create comp=(c1 c2);
run;

/*Verify Index and Sort*/ proc contents data=test varnum; data test2; input c1 c2 c3 newc $; datalines; 1 3 5 r 3 5 1 f 2 6 8 s 1 2 7 q 6 1 9 q 9 1 3 w 4 1 1 i ; run; proc sort data=test2; by c1 c2 c3; run;
/*Verify Sort*/ proc contents data=test2 varnum;

Notice that test is both sorted and indexed, and test2 is sorted. When I do a normal joining method like below it loses all the metadata around the indexing and sorting for test. How can I get the result below but where I would retain the sorting and indexing?

 

proc sql;
create table test as 
	select a.*,
		   b.newc
	from test a left join test2 b on
		 (a.c1=b.c1)
		 And
		 (a.c2=b.c2)
		 And
		 (a.c3=b.c3)
;
quit;

I have been attempting to use some version of proc sql's alter table, but am not sure how to actually add the column. I seem to be able to create an empty column, but not populate it with the data from test2. I would also be very interested in any data step solutions to this issue.

 

1 ACCEPTED SOLUTION

Accepted Solutions
A_SAS_Man
Pyrite | Level 9

So I have found a decent workaround, not sure of all of the shortcomings of it yet but if others have thoughts on potential pitfalls/problems this could create I would be interested in hearing them.

 

proc sql;
alter table test
	add newc char format=$6.;
quit;

data test;
modify test test2;
by c1 c2 c3;
run;

This seemingly populates the newc column with the values from test2.

View solution in original post

7 REPLIES 7
Tom
Super User Tom
Super User

There is a simple answer to this. Don't do that.

If you are adding a variable to a dataset then SAS will have to totally remake the dataset. No way around it.

 

Can you just get away with adding the variable on the fly with a merge or join?

proc sql ;
create view new_data as
select a.*,b.new_var
from big_dataset a 
left join new_dataset b
on a.key = b.key
;
quit;
A_SAS_Man
Pyrite | Level 9

Edited for clarity.

 

No this does not get at what I need. I am aware of ways to append data to an existing set and retain the sorted flag and indexing (proc append) and ways to calculate a new column in a data step with the modify option. Because of these options seemingly allowing for updating a data set in place I was hoping there would be away to add a column from another data set similarly. 

A_SAS_Man
Pyrite | Level 9

So I have found a decent workaround, not sure of all of the shortcomings of it yet but if others have thoughts on potential pitfalls/problems this could create I would be interested in hearing them.

 

proc sql;
alter table test
	add newc char format=$6.;
quit;

data test;
modify test test2;
by c1 c2 c3;
run;

This seemingly populates the newc column with the values from test2.

Tom
Super User Tom
Super User

That should work. Note that SAS will still need to rebuild the dataset and indexes.

The drawback is that now your table definition no longer matches the code that used to create the table originally. 

A_SAS_Man
Pyrite | Level 9

So when I run a proc contents on my data set after the above method the indexes/sorting are still there, but are you saying that both of those things are being recalculated when I run a modify statement rather than carrying over from the previous one? This would definitely be a deal breaker when I try to do it on a larger data set (because the costly indexing/sorting is what I was trying to avoid) so I will have to do some testing.

Tom
Super User Tom
Super User

As long as it completes without error the time it takes probably shouldn't matter.

This is NOT the type of activity you should be doing very often (if ever).

A_SAS_Man
Pyrite | Level 9

If it doesn't save processing time then there wouldn't be a reason for me to do this, so that is actually my primary concern (beyond potential for error  I guess). Can you expand more on why this shouldn't be done from your point of view?

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1066 views
  • 0 likes
  • 2 in conversation