Prasad84
Fluorite | Level 6

Hi,

I think Option A is correct. Please clarify.

Thanks.

 

A data set stored on a network drive has the following characteristics:
14 Million observations
400 numeric variables
0 character variables of length 20
Binary compression
A DATA Step query requires only 3 character and 15 numeric variables from this data set. What is the best way to reduce computer resource utilization in this DATA Step?
A. A KEEP= data set option used on the SET Statement
B. A KEEP Statement used within the DATA Step
C. A KEEP= data set option used on the DATA Statement
D. A DROP= data set option used on the DATA Statement

8 REPLIES
Cynthia_sas
SAS Super FREQ
Are you using the Prep Guide or a Practice Exam? What does the answer key say?

Otherwise, you can solve this by understanding that KEEP= or DROP= data set options on the SET statement restrict the variables that are loaded from the INPUT data set and thus reduce the resources needed to load and manipulate the input data.

Any KEEP or DROP statement used within the DATA step program has no impact on what the SET statement reads, so ALL the variables would be read, and only then would the ones you specified be kept or dropped.

In a similar fashion, with the KEEP= or DROP= option on the DATA statement, you are only impacting the OUTPUT file (not the INPUT file on the SET), so while you might save a bit by restricting the size of the OUTPUT data set, you're not saving anything on the INPUT data set, which is where you want to do your restriction. There is no point in reading in ALL the numeric and ALL the character variables for the few that you want to use.
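To make the difference concrete, here is a rough sketch of the four answer choices side by side. The names are made up for illustration: WORK.BIG is a small stand-in for the network data set, and C1-C3 and N1-N15 stand in for the variables the step actually needs.

```sas
/* Small stand-in for the big input data set (hypothetical names and sizes). */
data work.big;
    array nums{400} n1-n400;
    array chars{5} $20 c1-c5;
    do obs = 1 to 100;
        output;
    end;
    drop obs;
run;

/* Option A: KEEP= on SET. Only the 18 listed variables are loaded. */
data work.subset_a;
    set work.big(keep=c1-c3 n1-n15);
run;

/* Option B: KEEP statement. Every input variable is read;
   only the output data set is restricted. */
data work.subset_b;
    set work.big;
    keep c1-c3 n1-n15;
run;

/* Option C: KEEP= on DATA. Again, only the output is restricted. */
data work.subset_c(keep=c1-c3 n1-n15);
    set work.big;
run;

/* Option D: DROP= on DATA. Same idea as C, but listing the unwanted variables. */
data work.subset_d(drop=n16-n400 c4-c5);
    set work.big;
run;
```

All four steps should produce the same output variables; the difference is how much of the input has to be read to get there.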

cynthia
Prasad84
Fluorite | Level 6

Thank you. I am using a practice exam, and the key says Option D. Is that correct?

Reeza
Super User

No. 

Cynthia_sas
SAS Super FREQ
Hi:
If this is the Pearson VUE or SAS Practice Exam, then please send mail to training@sas.com and report the question. We would need to know the exact name of the exam you took, when you bought or took the exam and the question number to track it down.

If this is a practice exam from some other company, then you should report the error to them.

Thanks,
cynthia
s_lassen
Meteorite | Level 14

I agree with you that A is the best option. It is better than B or C because you do not spend CPU and memory on reading a lot of data that you do not need. It also helps if you want to create new variables in the data step: you will not have to worry about the names of your new variables clashing with the names of existing, but unwanted, variables in the table read.

Option A is also better than option D, for two reasons:

  1. It is easier to read. A DROP= option is like going to the baker's shop and listing all the stuff you do not want. It is easier to tell the baker what you actually want. In other words, KEEP= is easier to read and maintain.
  2. If your input data changes (variables are dropped or added), a KEEP= option is safer: you will get a message in the log if a variable that you want has been dropped, and you will not automatically add new variables that you are not interested in.
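As a small illustration of the name-clash point, here is a sketch with made-up data. WORK.ORDERS and its variables are hypothetical; the input happens to carry a numeric STATUS that the step does not want, and the step creates its own character STATUS.

```sas
/* Hypothetical input: STATUS is numeric here and is not needed downstream. */
data work.orders;
    input id amount status;
    datalines;
1 10 0
2 25 1
;
run;

/* Because KEEP= on SET leaves the input STATUS out of the step entirely,
   a new character STATUS can be defined without clashing with it. */
data work.flagged;
    set work.orders(keep=id amount);
    length status $4;
    if amount > 20 then status = 'HIGH';
    else status = 'LOW';
run;
```

Without the KEEP= option on SET, the numeric STATUS from the input would already be in the step, and the LENGTH statement and character assignments would conflict with it. And if ID or AMOUNT were ever removed from the input, the KEEP= list would flag that in the log (assuming the default DKRICOND=ERROR system option).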
mnjtrana
Pyrite | Level 9
I think option A is the correct answer.
The reason is that with KEEP= on the SET statement, only the required variables are read; the others are never loaded.

With a KEEP statement (option B), however, the step reads all variables and their values for 14 million observations and then drops every column except those named in the KEEP statement when it creates the final data set.

So all the other options waste memory and CPU unnecessarily.

Cheers from India!

Manjeet
Tom
Super User

The best answer is A.

 

With even partial knowledge of how the DATA step really works, you should realize that B, C, and D are all saying the same thing, so test-taking skills should lead you to pick A.

 

Perhaps the test randomizes the order of the choices and you are looking at the wrong answer key?
