BookmarkSubscribeRSS Feed
jaliu
Quartz | Level 8

The multiple choice questions are very tricky and designed to test your knowledge of details. Problem is, they don't really tell you all the details and definitely don't teach you them. For example, a question is:

What information does SAS need in the DATA step to read an Excel workbook file and write it out to a SAS data set. 

 

One of the options is 'the name of the Excel worksheet to be read' and I cannot find anywhere that states whether this is necessary or not and if for example the first file will be read of none is specified. There are many questions that I could not be absolutely certain about due to lack of information. How am I supposed to know these things, and further how am I supposed to remember all this? Another problem I noticed is that I do not know when an error vs. a warning will occur and it seems like there is no logic sometimes why one will occur vs another.

11 REPLIES 11
Tom
Super User Tom
Super User

@jaliu wrote:

The multiple choice questions are very tricky and designed to test your knowledge of details. Problem is, they don't really tell you all the details and definitely don't teach you them. For example, a question is:

What information does SAS need in the DATA step to read an Excel workbook file and write it out to a SAS data set. 

 

One of the options is 'the name of the Excel worksheet to be read' and I cannot find anywhere that states whether this is necessary or not and if for example the first file will be read of none is specified. There are many questions that I could not be absolutely certain about due to lack of information. How am I supposed to know these things, and further how am I supposed to remember all this? Another problem I noticed is that I do not know when an error vs. a warning will occur and it seems like there is no logic sometimes why one will occur vs another.


Hopefully the question has more context than what you provide.

First a simple DATA step cannot read an EXCEL binary file.  Perhaps you could code an algorithm that converts the binary structure of an XLS or XLSX file into data in a single data step, but I doubt that they expect you (or anyone) to figure out how to do that.

 

You can use the XLSX engine on a libname statement to define a libref that points to an XLSX workbook.  Then to use data from that workbook you would need to know the libref you created and the sheetname of the tab in the workbook you want to read.  You can combine the libref and the sheetname into a two level SAS dataset reference by using the sheetname as the member name.

libname mybook xlsx 'mybook.xlsx';
data want;
  set mybook.sheet1;
run;

Or you could use a data step to read a delimited text file created from a sheet in an Excel workbook.  For example if you saved the sheet into a CSV file your code might look like this:

data want;
  infile 'mysheet.csv' dsd truncover firstobs=2;
  input var1 var2 :$20.  ;
run;

In this case you would need to know the name of the delimited file, the delimiter used, whether a header line is included, and the names, types, lengths and order of the variables you want to create from the columns in the original sheet.

jaliu
Quartz | Level 8

No there is no more context only other mcq options. But, the point is, the data step rules do not state whether a sheet name is required or not (or if there is a default). It's frustrating to be asked about things that I can't reasonably find answers to in the documentation or some official sas materials.

Tom
Super User Tom
Super User

Not sure what mcq means. Is this are multiple choice question?  What are the possible answer.  Test taking strategy for multiple choice questions is eliminate the impossible answers.  Studying strategy when looking at a sample questions is to review each choice and explain why it IS or it IS NOT not correct.

 

Or are you allowed to type free text in response?  My general advice for test taking is when the question is unclear is to clearly state that in your answer.  If there is some simple assumption you can make (like in this case that a libref has been defined using XLSX or other engine that supports Excel files) that makes it possible to make sense of the quesiton then state that you are assuming they meant XXXX and then answer the question give that assumption.  This will show the grader that you have some understanding of the topic.

 

Whether that approach works on this exam I have no idea.

Panagiotis
SAS Employee

Again, not sure what certification book edition you are using. The one I am looking at you need to:

Go to Chapter 4, find the Reading Microsoft Excel Data with the XLSX Engine (page 47, depending on your edition). There are five pages detailing what you need to do.

 

1. You need a libname statement to reference an Excel workbook. This tells SAS where to find the Excel workbook.

 

libname x xlsx 'c:\users\...\exercise.xlsx';  *step 1 *;

 

2. Now that SAS knows about the workbook, you can use it in the DATA step using the library reference x.<worksheet> in the set statement. The SET statement reads in the Excel workbook and sheet name that you want. You then also need to write out the new table somewhere (SAS table in this example). I will write my new table work.newFile.

 

 

data work.newFile;         * Step 2: The output file. This case I am creating a SAS data set in the work library from my Excel workbook's worksheet *;
     set x.WorksheetName;  * Step 3: Excel workbook and worksheet to read in *;
    * Additional statements of stuff you want to do (filter/add columns/etc) *;
run;

 

 

Summary:

 

You need:

Step 1. A libref that references an Excel workbook

Step 2. The output file name and library you want to create the new table in

Step 3. The name of the worksheet IN the workbook from the libname statement

- Peter

 

jaliu
Quartz | Level 8

I'm not sure if people are misunderstanding my question. 

 

If I don't specify the sheet of the workbook, but for example only the workbook, does the statement still work or not? Regardless of whether or not it does/doesn't, how am I supposed to know if it's necessary if the documentation doesn't really explain this?

Panagiotis
SAS Employee

Check my previous comment. Page 47. The section is Reading Microsoft excel Data with the XLSX Engine. Then go to the second subsection. Look under Steps for Reading Excel Data

 

It specifically says:

To read the Excel workbook file, SAS must receive the following information in the DATA step:

• a libref to reference the Excel workbook to be read
• the name of the Excel worksheet that is to be read

 

Continue reading after that. It'll show you how to do it with the LIBNAME statement Try those examples slowly and carefully. Also try an example where you use the libname to reference to an Excel workbook without the worksheet name. It won't work. You have to practice all of this. You cannot read the book and remember all of this. Practice, practice practice.

 

Eventually in the DATA step you realize that you need to write OUT to a table. That's why it also needs an output libref and table name.

 

- Peter

 

 

 

jaliu
Quartz | Level 8

I totally get that practice is necessary, but it does mean that this kind of information shouldn't be provided? Is it SAS's intention to have their certified users know the details of every single statement and option available? If that's the case, why is the documentation allowed then? And is it also SAS's intention for users to discover all the idiosyncracies of the syntax by playing around with every possible combination or stumbling upon things by chance? 

Panagiotis
SAS Employee

@jaliu wrote:

I totally get that practice is necessary, but it does mean that this kind of information shouldn't be provided? Is it SAS's intention to have their certified users know the details of every single statement and option available? If that's the case, why is the documentation allowed then? And is it also SAS's intention for users to discover all the idiosyncracies of the syntax by playing around with every possible combination or stumbling upon things by chance? 


I don't particularly understand what you are asking here. No one knows every option/statement available. That's not possible. The guide teaches you a lot. You have to practice, practice practice by reading/processing data. Realize what's available, then use what you know. Google and look up what you don't. You can't just memorize everything in the book. 

 

Take the course or go through the guide and just program using the examples given. Add statements to see what happens. Go to the documentation if you are curious of more options available. For example, here you are learning about reading in Excel files. Read about it. Then go read in an Excel file and practice. Then PRINT the table. Use  WHERE statement. Add a TITLE.

 

Before this book I HIGHLY recommend finishing the programming I course to better understand the fundamentals.

 

- Peter

 

ballardw
Super User

@jaliu wrote:

I'm not sure if people are misunderstanding my question. 

 

If I don't specify the sheet of the workbook, but for example only the workbook, does the statement still work or not? Regardless of whether or not it does/doesn't, how am I supposed to know if it's necessary if the documentation doesn't really explain this?


Would which specific statement "work"?

ballardw
Super User

@jaliu wrote:

No there is no more context only other mcq options. But, the point is, the data step rules do not state whether a sheet name is required or not (or if there is a default). It's frustrating to be asked about things that I can't reasonably find answers to in the documentation or some official sas materials.


Please limit use of not-everybody-uses TLAs (three letter acronyms) MCQ options? You made that sound like it is something that should be in SAS documentation instead of "multiple choice question" responses.

 

A DATA STEP does not directly read Excel XLS or XLSX files normally. As @Tom said you could possibly write a custom way to convert BINARY information of XLS (nasty nasty amount of work needed), or in the case of XLSX, uncompress the file and then separately read one of the text XML pages. That would not be reading the XLSX though but a created text file.

 

The data step would require an INFILE statement to read anything from an external file. So you need to find INFILE and EXCEL (or xls or xlsx) . Which you won't. The associated INPUT statement would be reading text generally. Which, believe it or not, is not actually Excel.

 

What these questions tell me is that you likely do not have enough experience using SAS to be taking exams for certification. Work on the random questions that people ask on this forum for experience. Then compare results to accepted solutions.

 

Please do not expect one time through a practice exam to teach enough to pass a certification exam.

Kurt_Bremser
Super User

The only feasible way to use a data step for Excel files is assigning a libname to the workbook and using a sheet as a quasi-dataset (quasi because spreadsheets miss most of the attributes of a dataset or DBMS table).

 

Since a SET, MERGE, MODIFY or UPDATE statement (the only statements for reading datasets) or the DATASET option in a DECLARE HASH statement all do need a dataset name, it is obvious that you need to supply the sheet name there.

 

There will be questions that test your ability to combine single facts you learned (or retrieve from the documentation) into a greater whole.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

LIBNAME 101

Follow along as SAS technical trainer Dominique Weatherspoon expertly answers all your questions about SAS Libraries.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 1606 views
  • 7 likes
  • 5 in conversation