Is there a way to keep or drop variables from a dataset without using a datastep?
Let's assume I know I have a dataset (might be huge) and want to copy it to a new location with just a few variables.
How would I do this most efficiently, i.e. using minimal computing ressources?
Example:
proc copy in = sashelp
out = work
memtype = data;
select cars;
run;
That will copy all variables, so in this case
Variable list as text:
Make | Origin | Invoice |
Horsepower |
Weight |
Model | DriveTrain |
EngineSize |
MPG_City | Wheelbase |
Type | MSRP |
Cylinders |
MPG_Highway | Length |
Usually I would do this:
data cars_few_vars;
set cars (keep = make model mpg_city);
run;
Without creating a new dataset I can think of this:
proc sort data = cars (keep = make model mpg_city);
by make;
run;
I think it should be possible to work on a higher level than reading the dataset into memory (datastep, proc sort).
Since datasets are stored one observation after another, with each observation containing all values for all defined variables, you must always read the whole dataset physically, even when only some variables are kept.
To reduce the size of the target dataset, the data step is the most efficient tool; if you need to sort anyway, dataset options are the way to go.
I'm not sure you will see any savings, but you might. Test using a view:
libname somewhere 'path to destination';
data cars / view=cars;
set sashelp.cars (keep=make model mpg_city);
run;
proc copy in = work
out = somewhere;
select cars;
run;
(Wish I could test it for you but I can't.)
Nice point - I haven't thought about views.
However I am hoping to find a binary answer like
I look for this kind of confirmation/ correction in my question.
From proc datasets procedure I know that the copy option does not support data set options.
The same applies to the proc copy procedure.
As we have the metadata in the dictionaries I was thinking there should be a way of copying a dataset and use filtering methods (keep/ drop) on a higher level than actually reading the whole data.
Not sure what you mean about "reading data into memory". SAS does not store datasets in memory. It only stores the observations in memory.
To read a dataset you need to read the whole dataset. Unless you want to skip some observations by using the FIRSTOBS= and/or OBS= (really means lastobs) dataset options.
The KEEP= and/or DROP= dataset options are probably the easiest ways to select which variables to copy.
data outds;
set inds(keep=id var1 var5);
run;
But you could also generate a simple PROC SQL select statement to select (and potentially re-order) the variables you want.
proc sql;
create table outds as
select id,var5,var1
from inds
;
quit;
Since datasets are stored one observation after another, with each observation containing all values for all defined variables, you must always read the whole dataset physically, even when only some variables are kept.
To reduce the size of the target dataset, the data step is the most efficient tool; if you need to sort anyway, dataset options are the way to go.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.