deleted_user
Not applicable
Can we drop a variable without reading the dataset?

Just like we can rename one in PROC DATASETS...

My dataset is just too huge.
I am creating a dataset using an array; it can have columns x_1 - x_2000,
but not all of the columns are populated every time.
I have to drop the columns which are not populated.
I can maintain a list of the indexes of the populated columns in a macro variable.

I also have to do this keep/drop operation in other data steps.
It takes too much time, and I want to save that time.

thanks in advance

Regards
Avi
11 REPLIES
sbb
Lapis Lazuli | Level 10
Not possible. But with a DATA step and most PROCs, do consider the KEEP= and DROP= dataset options (on the DATA or SET statement, or with DATA= or OUT= in PROCs) for altering your SAS file either on input or on output, depending on your circumstances and requirements.
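As a minimal sketch of those options (the dataset and variable names here are hypothetical), DROP= can be applied either while reading, so the unwanted variables never enter the program data vector, or only when writing the output file:

```sas
/* DROP= on SET: variables are excluded on input */
data subset;
    set big (drop=x_5 x_17);
run;

/* DROP= on DATA: variables stay available during processing
   but are excluded from the output data set */
data subset2 (drop=x_5 x_17);
    set big;
run;
```

Either way the data still has to be read and rewritten; the options only control which variables end up in the result.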

COMMENT: Suggest you investigate using SAS VIEWs and possibly the INDEX, depending on your data-volume and processing requirements, as an alternative, where suitable.

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search argument, this topic/post:
how to drop variable site:sas.com
SAS_user
Calcite | Level 5
If the dataset is permanent, you can create a view using a KEEP/DROP statement.

If the dataset is temporary, don't bother with an extra step to eliminate the empty variables.
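A minimal sketch of such a view, with hypothetical library, member, and variable names: the DROP= logic is stored in the view and applied each time the view is read, so no new copy of the data is written to disk.

```sas
/* A DATA step view stores only the program, not the data */
data mylib.slim / view=mylib.slim;
    set mylib.big (drop=x_100-x_2000);
run;

/* Downstream steps read the view as if it were a table */
proc means data=mylib.slim;
run;
```

The trade-off is that the subsetting work is repeated on every read, so views pay off when the table is read once or twice, not many times.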
darrylovia
Quartz | Level 8
You could use a PROC SQL ALTER TABLE statement.

See the documentation here:
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/a002473671.htm
See below for a code example.


proc sql;
create table work.shoes as
select *
from sashelp.shoes;
quit;


proc sql;
alter table work.shoes
drop inventory;
quit;

proc sql;
select *
from work.shoes;
quit;
deleted_user
Not applicable
Hi all,
Thanks for the reply. ALTER TABLE is a good option.

regards,
Avi
Patrick
Opal | Level 21
Avi

I believe Scott is just right: "Not possible".

If one thinks about how native SAS data is stored (descriptor portion, data portion), then it also makes sense that deleting a variable means processing the data: deleting every single value and re-organising the data set (http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a000776576.htm).

For the same reason, renaming a variable is quick, as it needs only a change in the descriptor portion of the SAS data set.
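For example, a descriptor-only rename via PROC DATASETS (here using the work.shoes copy created in the ALTER TABLE example above; Region is a variable in SASHELP.SHOES) completes quickly even on large tables, because the data portion is never touched:

```sas
/* RENAME in PROC DATASETS updates only the descriptor portion */
proc datasets lib=work nolist;
    modify shoes;
    rename region=sales_region;
quit;
```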

As space seems to be the issue for you: make sure you're using the COMPRESS option: options compress=yes; - or: data a (compress=yes);

It also might help to get more disk space, or alternatively to delete some unneeded tables (e.g. WORK tables that are no longer needed for further processing).

Also: try to avoid creating the unwanted variables in the first place.

HTH
Patrick
deleted_user
Not applicable
Hi All,

I also tried using ALTER TABLE; it takes more time to run.
Previously I compressed the dataset, but now I don't, as I need to work further on it,
so why compress and uncompress unnecessarily? Also, I have enough disk space to store the data.

The input file from which I am populating this dataset looks like this:

1, 2,23,4,5,24,6,7

2, 3, 234,2,6,712,8,4,1025,10,3

where the 1st value is the id, i.e. id=1,
the 2nd is the number of value pairs (of indexes), cnt=2,

23 is the index of the column (basically it's some other id),
and 4, 5 are the values, i.e. x_23=4, y_23=5,
then x_24=6, y_24=7.

Similarly for the second record: id=2,
cnt=3, and the 3 value pairs are
x_234=2, y_234=6,
x_712=8, y_712=4,
x_1025=10, y_1025=3.

So this index can vary from 1 to 2000.

So I have defined arrays for all the columns:

array x_{2000};
array y_{2000};

infile "c:\inputfile.csv" dsd missover lrecl=32767;

input id cnt @;
do i=1 to cnt;
input id2 x_[id2] y_[id2] @;
end;

................
So is there an alternate method, so that we create only the columns we want?

Regards,
Avi
Patrick
Opal | Level 21
Avi

It all comes down to data organisation.

Below is how I think your dataset should look. The primary key of the result data set is {id, ValuePair_Cnt}; Col_Ind is the classification variable; X and Y are the analysis variables; N_ValuePairs is not needed in the output and could be dropped.

data want;
infile datalines dsd truncover;
length id ValuePair_Cnt 8;
input id N_ValuePairs Col_Ind @;
do ValuePair_Cnt=1 to N_ValuePairs;
input x y @;
output;
end;
datalines;
1, 2,23,4,5,24,6,7
2, 3, 234,2,6,712,8,4,1025,10,3
;
run;

proc print data=want noobs;
run;


Compression:
It's true that having the COMPRESS option turned on adds some processing time when reading/writing a data set, but it also reduces I/O, which is often the bottleneck.

HTH
Patrick
deleted_user
Not applicable
Hi Patrick,


> data want;
> infile datalines dsd truncover;
> length id ValuePair_Cnt 8;
> input id N_ValuePairs Col_Ind @;
> do ValuePair_Cnt=1 to N_ValuePairs;
> input x y @;
> output;
> end;
> datalines;
> 1, 2,23,4,5,24,6,7
> 2, 3, 234,2,6,712,8,4,1025,10,3
> ;
> run;
>

In your code you read col_ind only once,
but after reading the first x, y pair the col_ind changes.

E.g.:
for the 1st obs it's 23, then 24;
for the 2nd obs it's 234, 712, 1025.

I can place col_ind like this:

input col_ind x y @;
output;

but this approach is not suitable for me, as the id should be unique per row.

thanks

regards,
Avi
Patrick
Opal | Level 21
Avi

I missed how col_ind really works.

Still: with the change you made you would get a unique composite key.

But let's assume you need the data horizontal.

I believe the following should do it without creating too many unnecessary variables:

data sample;
infile datalines dsd truncover;
input id N_ValuePairs @;
do i=1 to N_ValuePairs;
input Col_Ind @;
do name='x','y';
input Value_Var @;
Name_Var=cats(name,'_',Col_Ind);
output;
end;
end;
keep id name_var value_var;
datalines;
1, 2,23,4,5,24,6,7
2, 3, 234,2,6,712,8,4,1025,10,3
3, 1,23,14,15
;
run;

proc transpose data=sample out=sample(drop=_name_) ;
by id;
id Name_Var;
var Value_VAR;
run;

proc print data=sample noobs;
run;

HTH
Patrick
Doc_Duke
Rhodochrosite | Level 12
I think that the PROC SQL ALTER statement still rewrites the data set; it just does it in the background. The documentation is not specific, but you can test it easily enough by looking for a change in the size of the dataset with PROC CONTENTS.
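One way to run that test, reusing the work.shoes copy from the earlier ALTER TABLE example (the variable dropped here is arbitrary): compare the observation length, number of variables, and file size reported by PROC CONTENTS before and after the ALTER.

```sas
/* Descriptor and size before the drop */
proc contents data=work.shoes;
run;

proc sql;
    alter table work.shoes
    drop returns;
quit;

/* If the file size and observation length shrink, the data
   portion was rewritten, not just the descriptor */
proc contents data=work.shoes;
run;
```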

I tend to agree with Scott; you are SOL on this one.
darrylovia
Quartz | Level 8
I'm afraid that Doc is correct. I ran an experiment comparing a DATA step vs. PROC SQL; the PROC SQL ALTER did not even complete.

I created a table with 2K columns and 1M rows.

See the SAS log for the details

16
17 data bigdata;
18
19 array num(2000);
20
21 do i=1 to 2000;
22 num=ranuni(i);
23 end;
24
25
26 do j=1 to 1000000;
27 output;
28 end;
29 run;

NOTE: The data set WORK.BIGDATA has 1000000 observations and 2002 variables.
NOTE: DATA statement used (Total process time):
real time 6:19.66
cpu time 3:50.78


30
31 data bigdata2;
32 set bigdata (drop=i j);
33 run;

NOTE: DATA statement used (Total process time):
real time 16:57.68
cpu time 5:55.24

NOTE: There were 1000000 observations read from the data set WORK.BIGDATA.
2 The SAS System 09:56 Thursday, April 1, 2010

NOTE: The data set WORK.BIGDATA2 has 1000000 observations and 2000 variables.

34
35 proc sql;
36 alter table bigdata
37 drop i ,j;
ERROR: Write to WORK.BIGDATA.DATA failed. File is full and may be damaged.
ERROR: The table WORK.BIGDATA was not altered due to an error condition.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
38 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 12:54.35
cpu time 4:11.15


Discussion stats
  • 11 replies
  • 14272 views
  • 1 like
  • 6 in conversation