Hello everyone,
I have some questions about the following code and its answer choice. This is a problem of sas practice for base exam.
The right answer is A.
(1) In the user-defined format, the right side of "=" is some date formats, which are put inside
of []. Is this true for all the cases of any format on the right?
(2) I thought this program subsets all the observations by year 1980 and later, not 1991.
The second where statement overwrites the first one. Any comments? Thanks.
title; proc format; value yrgroup low - '31dec1991'd = [year.] '01jan1992'd - high = [monyy.] ; run; proc print data=SASHelp.retail noobs; where date > '01Jan1991'd; var date sales; format date yrgroup.; where date > '01Jan1980'd; run;
The answer key appears to be just plain wrong.
If two WHERE clauses are specified, SAS takes the most recent one and replaces any earlier WHERE clauses.
I just ran your code and got this message in my log:
36 proc print data=SASHelp.retail noobs; 37 where date > '01Jan1991'd; 38 var date sales; 39 format date yrgroup.; 40 where date > '01Jan1980'd; NOTE: WHERE clause has been replaced. 41 run;
In other words, your data will be subsetted by 01JAN1980 not 01JAN1991.
Jim
Results are below. Correct me if I am wrong, but I believe that I see data from January 1980 onward.
The answer key appears to be just plain wrong.
If two WHERE clauses are specified, SAS takes the most recent one and replaces any earlier WHERE clauses.
I just ran your code and got this message in my log:
36 proc print data=SASHelp.retail noobs; 37 where date > '01Jan1991'd; 38 var date sales; 39 format date yrgroup.; 40 where date > '01Jan1980'd; NOTE: WHERE clause has been replaced. 41 run;
In other words, your data will be subsetted by 01JAN1980 not 01JAN1991.
Jim
Results are below. Correct me if I am wrong, but I believe that I see data from January 1980 onward.
Yes, when in doubt, run the code. Actual code trumps all answer keys. 🙂
I suppose this topic is now "solved," at least in the sense that we know what's going on. Far better to know what's going on than to try to talk yourself into accepting the answer key's solution and thereby mis-learning SAS.
Jim
The code as posted runs without errors. There are some NOTES produced by both steps.
188 proc format; 189 value yrgroup low - '31dec1991'd = [year.] NOTE: The YEAR (in)format was specified on the right-hand side of an equal sign, but without a length specification. PROC FORMAT will assume a default length of at least 40 for the format being generated. If this is an insufficient width, you can rerun PROC FORMAT with an explicit width for the YEAR (in)format, or provide a sufficient DEFAULT= option. 190 '01jan1992'd - high = [monyy.] NOTE: The MONYY (in)format was specified on the right-hand side of an equal sign, but without a length specification. PROC FORMAT will assume a default length of at least 40 for the format being generated. If this is an insufficient width, you can rerun PROC FORMAT with an explicit width for the MONYY (in)format, or provide a sufficient DEFAULT= option. 191 ; NOTE: Format YRGROUP has been output. 192 run; NOTE: PROCEDURE FORMAT used (Total process time): real time 0.09 seconds cpu time 0.03 seconds 193 194 proc print data=SASHelp.retail noobs; 195 where date > '01Jan1991'd; 196 var date sales; 197 format date yrgroup.; 198 where date > '01Jan1980'd; NOTE: WHERE clause has been replaced. 199 run; NOTE: There were 57 observations read from the data set SASHELP.RETAIL. WHERE date>'01JAN1980'D; NOTE: PROCEDURE PRINT used (Total process time): real time 0.21 seconds cpu time 0.06 seconds
Let's look at each answer and see how it stacks up to reality. Then perhaps we can pick a least bad answer.
Let's do in reverse order since they claim you should pick A.
D The PRINT procedure will fail as there are conflicting WHERE statements.
There is no such things as conflicting WHERE statements. The last one cancels the earlier ones. You can see that in the notes in the SAS log.
C The FORMAT procedure will fail to create a format and thus the PRINT procedure will fail as the format cannot be found.
The is no obvious error with the format step. The only issue is that the nested format specifications do not have widths specified. But as you can see from the notes in the log that does not stop the format from being created. And a data step will only fail if the format is not found if you have set the FMTERR option. They do not mention anything about that being set.
B The FORMAT procedure will fail but the PRINT procedure will summarize by date disregarding the FORMAT statement.
So this is like C but they say that PRINT will ignore the missing format. So that part is right. But the first part is wrong. But the PROC PRINT step does not "summarize" by anything. Perhaps if you included a SUM statement for one or more of the variables in the VAR statement it might show totals for that variable. Or perhaps if you add the N option to the PROC PRINT statement it would show a count of the number of observations.
A The report generated by the PRINT procedure will summarize sales by year through 1991 and after by month and year.
This one is a little closer to the truth. The PROC PRINT output will DISPLAY the date values as just the YEAR for dates before 1992 and as 3 letter month and year for the later dates. But again the PROC PRINT is not "summarizing" anything.
Part of the output. You can see that there appears to be four observations for each year. Based on the later values apparently each on is for a date that is in a different month of that year.
So A is the best answer as it is the only one that describes what the format does (or even attempts to describe it). So if I had to pick one I would choose A. Is this strictly a multiple choice test? Or are you allowed to explain why you picked a given answer?
If you wanted to summarize by DATE then use another procedure. Like PROC MEANS.
proc means data=sashelp.retail ;
where date > '01Jan1990'd;
where also date < '01JAN1993'd;
format date yrgroup.;
class date;
var sales;
run;
The MEANS Procedure Analysis Variable : SALES Retail sales in millions of $ N DATE Obs N Mean Std Dev Minimum Maximum ------------------------------------------------------------------------------ 1990 3 3 709.3333333 37.6873100 674.0000000 749.0000000 1991 4 4 736.7500000 48.0303723 703.0000000 807.0000000 JAN1992 1 1 692.0000000 . 692.0000000 692.0000000 APR1992 1 1 797.0000000 . 797.0000000 797.0000000 JUL1992 1 1 826.0000000 . 826.0000000 826.0000000 OCT1992 1 1 889.0000000 . 889.0000000 889.0000000 ------------------------------------------------------------------------------
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.