BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
left
Obsidian | Level 7

Is there a way to keep or drop variables from a dataset without using a datastep?

Let's assume I know I have a dataset (might be huge) and want to copy it to a new location with just a few variables. 

How would I do this most efficiently, i.e. using minimal computing ressources?

 

Example:

proc copy in      = sashelp
          out     = work
          memtype = data;
    select cars;
run;

That will copy all variables, so in this case 

left_0-1675781129658.png

Variable list as text:

Make Origin Invoice

Horsepower

Weight
Model DriveTrain

EngineSize

MPG_City Wheelbase
Type MSRP

Cylinders

MPG_Highway Length

 

Usually I would do this:

data cars_few_vars;
    set cars (keep = make model mpg_city);
run;

Without creating a new dataset I can think of this:

proc sort data = cars (keep = make model mpg_city);
    by make;
run;

I think it should be possible to work on a higher level than reading the dataset into memory (datastep, proc sort).

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Since datasets are stored one observation after another, with each observation containing all values for all defined variables, you must always read the whole dataset physically, even when only some variables are kept.

 

To reduce the size of the target dataset, the data step is the most efficient tool; if you need to sort anyway, dataset options are the way to go.

View solution in original post

4 REPLIES 4
Astounding
PROC Star

I'm not sure you will see any savings, but you might.  Test using a view:

libname somewhere 'path to destination';

data cars / view=cars;
   set sashelp.cars (keep=make model mpg_city);
run;
proc copy in      = work
          out     = somewhere;
    select cars;
run;

(Wish I could test it for you but I can't.)

left
Obsidian | Level 7

Nice point - I haven't thought about views. 

However I am hoping to find a binary answer like

  • You can (copy and) filter a dataset using to limit variables in a dataset without having to read the dataset into memory (and without using views) and here is how it goes...
  • You cannot do that, no way. If you want to change the variables in a dataset you must use a kind of datastep/ SQL statement/ etc. to accomplish this.

I look for this kind of confirmation/ correction in my question. 

 

From proc datasets procedure I know that the copy option does not support data set options

The same applies to the proc copy procedure.

 

As we have the metadata in the dictionaries I was thinking there should be a way of copying a dataset and use filtering methods (keepdrop) on a higher level than actually reading the whole data.

Tom
Super User Tom
Super User

Not sure what you mean about "reading data into memory".  SAS does not store datasets in memory.  It only stores the observations in memory.

 

To read a dataset you need to read the whole dataset. Unless you want to skip some observations by using the FIRSTOBS= and/or OBS= (really  means lastobs) dataset options.

 

The KEEP= and/or DROP= dataset options are probably the easiest ways to select which variables to copy. 

data outds;
  set inds(keep=id var1 var5);
run;

But you could also generate a simple PROC SQL select statement to select (and potentially re-order) the variables you want.

proc sql;
create table outds as
  select id,var5,var1
  from inds
;
quit;
Kurt_Bremser
Super User

Since datasets are stored one observation after another, with each observation containing all values for all defined variables, you must always read the whole dataset physically, even when only some variables are kept.

 

To reduce the size of the target dataset, the data step is the most efficient tool; if you need to sort anyway, dataset options are the way to go.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1089 views
  • 1 like
  • 4 in conversation