BookmarkSubscribeRSS Feed
deleted_user
Not applicable
can we drop a variable without reading the the dataset?

just like we can rename it in proc datasets...

my dataset is just too huge ...
i am creating a dataset using array , it can have x_1 - x_2000 columns.
but not every the time all columns are populated.
i have to drop columns which are not populated.
i can maintain a list of the index of column which are populated in a macro variable.

i have too do this keep drop operation in other datasteps.
it also takes too much of time. i want to save that time.

thanks in advance

Regards
Avi Message was edited by: Avi
11 REPLIES 11
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Not possible. But with a DATA step and most PROCs, do consider the KEEP= and DROP= dataset options (DATA or SET stmt or DATA= or OUT= with PROCs) for altering your SAS file either on input or output depending on your circumstances and requirements.

COMMENT: Suggest you investigate using SAS VIEWs and possibly the INDEX, depending on your data-volume and processing requirements, as an alternative, where suitable.

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search argument, this topic/post:
how to drop variable site:sas.com
SAS_user
Calcite | Level 5
If a data is permanent, you can create a view using keep/drop statement.

If a data is temporary, don't bother using extra step and eliminating empty variables.
darrylovia
Quartz | Level 8
You could PROC SQL; ALTER TABLE statement

see the documentation linked here
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/a002473671.htm
See below for a code example


proc sql;
create table work.shoes as
select *
from sashelp.shoes;
quit;


proc sql;
alter table work.shoes
drop inventory;
quit;

proc sql;
select *
from work.shoes;
quit;
deleted_user
Not applicable
hi all ,
Thanks of the reply ... Alter table is a good option .

regards,
Avi
Patrick
Opal | Level 21
Avi

I believe Scott is just right: "Not possible".

If one thinks how native SAS data is stored (descriptor part, data part) then it also makes sense that deleting a variable means processing the data, deleting every single value and re-organising the data (http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a000776576.htm).

For the same reason renaming a variable is quick as this needs only a change in the descriptor part of the SAS data set.

As space seems to be the issue for you: Make sure you're using the data set option: options compress=yes; - or: data a (compress=yes);

It also might help to get more disk space or alternatively delete some unneeded tables (i.e. some work tables which are no more needed for further processing).

Also: Try to avoid creating the unwanted variables in first place.

HTH
Patrick
deleted_user
Not applicable
Hi All,

I also tried using alter table , it take more time to run.
previously i have compressed the DS but now i don't, as i need to work further on it
so why unecessary compress and uncompress, also i have enough diskspace to store the data.

the input file from where i am populating this DS is as follows

1, 2,23,4,5,24,6,7

2, 3, 234,2,6,712,8,4,1025,10,3

where 1st is id i.e id=1
2nd is no. of values (of index) cnt =2

23 is the index of column (basically its some other id)
4, 5 are the values i.e. x_23 =4 , y_23 = 5
x_24 = 6, y_24 = 7

similarly for second record id= 2,
cnt =3 and the 3 values are
x_234 = 2 y_234=6 ,
x_712=8, y_712=4 ,
x_1025 =10, y_1025 = 3

so this index can vary from 1 to 2000.

so i have defined a array of all the columns

array x_{2000};
array y_{2000};

infile "c:\inputfile.csv" dsd missover lrecl=32767;

input id cnt @;
do i=1 to cnt;
input id2 x_[id2] y_[id2] @;
end;

................
so can we have alternate method so that we can make columns only that we want.

Regards,
Avi
Patrick
Opal | Level 21
Avi

It all comes down to data organisation.

Below how I think your dataset should look like. The primary key of the result data set is {id,ValuePair_Cnt}, Col_Ind is the classification variable, X and Y the analysis variables, N_ValuePairs is not needed in the output and could be dropped.

data want;
infile datalines dsd truncover;
length id ValuePair_Cnt 8;
input id N_ValuePairs Col_Ind @;
do ValuePair_Cnt=1 to N_ValuePairs;
input x y @;
output;
end;
datalines;
1, 2,23,4,5,24,6,7
2, 3, 234,2,6,712,8,4,1025,10,3
;
run;

proc print data=want noobs;
run;


Compression:
It's true that having the compress option turned on adds some processing time when reading\writen a data set but it also reduces I\O which is often the bottleneck.

HTH
Patrick
deleted_user
Not applicable
Hi patrik,


> data want;
> infile datalines dsd truncover;
> length id ValuePair_Cnt 8;
> input id N_ValuePairs Col_Ind @;
> do ValuePair_Cnt=1 to N_ValuePairs;
> input x y @;
> output;
> end;
> talines;
> 1, 2,23,4,5,24,6,7
> 2, 3, 234,2,6,712,8,4,1025,10,3
> ;
> run;
>

In your code you read col_ind only once ...
but after reading fist x,y the col_ind changes

eg..
for 1st obs .... its 1st ... 23 then 24
for 2nd obs ... its 234, 712, 1025...

I can place col_ind like this ...

input col_ind x y @;
output;

but this approach is not suitable for me as the id should be unique per row.

thanks

regards,
Avi
Patrick
Opal | Level 21
Avi

Missed how col_ind really works.

Still: With the change you made you would get a unique composite key.

But let's assume you need the data horizontal.

I believe the following should do without creating too many unnecessary variables:

data sample;
infile datalines dsd truncover;
input id N_ValuePairs @;
do i=1 to N_ValuePairs;
input Col_Ind @;
do name='x','y';
input Value_Var @;
Name_Var=cats(name,'_',Col_Ind);
output;
end;
end;
keep id name_var value_var;
datalines;
1, 2,23,4,5,24,6,7
2, 3, 234,2,6,712,8,4,1025,10,3
3, 1,23,14,15
;
run;

proc transpose data=sample out=sample(drop=_name_) ;
by id;
id Name_Var;
var Value_VAR;
run;

proc print data=sample noobs;
run;

HTH
Patrick
Doc_Duke
Rhodochrosite | Level 12
I think that the PROC SQL ALTER statement still re-writes the data set, it just does it in the background. The documentation is not specific, but you can test it easily enough, by looking to see if there is a change in the size of the dataset through PROC CONTENTS.

I tend to agree with Steve, you are SOL on this one.
darrylovia
Quartz | Level 8
I'm afraid that Doc is correct I ran an experiment comparing a data step vs. proc sql. The PROC SQL ALTER didnot even complete.

I created a table 2K columns 1M rows.

See the SAS log for the details

16
17 data bigdata;
18
19 array num(2000);
20
21 do i=1 to 2000;
22 num=ranuni(i);
23 end;
24
25
26 do j=1 to 1000000;
27 output;
28 end;
29 run;

NOTE: The data set WORK.BIGDATA has 1000000 observations and 2002 variables.
NOTE: DATA statement used (Total process time):
real time 6:19.66
cpu time 3:50.78


30
31 data bigdata2;
32 set bigdata (drop=i j);
33 run;

NOTE: DATA statement used (Total process time):
real time 16:57.68
cpu time 5:55.24

NOTE: There were 1000000 observations read from the data set WORK.BIGDATA.
2 The SAS System 09:56 Thursday, April 1, 2010

NOTE: The data set WORK.BIGDATA2 has 1000000 observations and 2000 variables.

34
35 proc sql;
36 alter table bigdata
37 drop i ,j;
ERROR: Write to WORK.BIGDATA.DATA failed. File is full and may be damaged.
ERROR: The table WORK.BIGDATA was not altered due to an error condition.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
38 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 12:54.35
cpu time 4:11.15

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 13742 views
  • 1 like
  • 6 in conversation