icrandell
Obsidian | Level 7

Hello,

 

My question is about the logic behind when a procedure is controlled via options versus when it is controlled by additional lines in the procedure body. For context, I am an experienced R user who is just starting to learn SAS.

 

As an example of what I mean, consider the following from the SAS Programming for R Users course:

 

proc print data = sashelp.cars;
    var make model;
run;

 

The behavior of this proc is being controlled both by the data option and the var statement (not sure what the proper term for this is). However, it seems equally plausible that the following would be valid SAS code:

 

proc print;
    data = sashelp.cars;
    var make model;
run;

 

Or, perhaps they would both be options. My question is: what's the difference, and how would I know? Thanks in advance!

PaigeMiller
Diamond | Level 26

Your second example is not valid SAS code and will return an error. DATA= is always an option on the PROC statement; it is never a statement of its own.

 

How would you know? You can look in the documentation, which is extensive and comprehensive. https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=pgmsashome&docsetTarget=h...

--
Paige Miller
icrandell
Obsidian | Level 7

I understand that the second example is invalid, and I agree that the documentation is quite extensive. My question is less about this specific example and more about the general principle of when I would use an option versus a statement.

 

I was hoping for an answer which goes something like "options have property X and statements have property Y." Perhaps this is off base, and all one can do is check the documentation for each new procedure.

PaigeMiller
Diamond | Level 26

Yeah, what's the general rule? Statements have options as well. I suppose statements control the big function of what a PROC does, such as what variables to use and which analysis to run; while options control smaller pieces of the result. But ... that's probably not specific enough to be of much use. And so, back to the documentation. Even though I have been using SAS since the Gregorian calendar was implemented, I still refer to the documentation often for exact syntax and whether or not the option I am interested in belongs in the PROC statement, or one of the statements inside the PROC.

--
Paige Miller
icrandell
Obsidian | Level 7

I certainly understand coding with a dozen tabs open for references!

 

Thank you for your answer.

ballardw
Super User

My $.02 to some extent.

Some procedures are much more complex than others. In general, an option that makes the entire procedure behave a certain way goes on the PROC statement, while individual statements, and the options on those statements, control specific subsets of what the procedure does.

For instance, PROC REG allows multiple MODEL statements, each with different options. There is a single PROC option, OUTEST=, that names the output data set for the parameter estimates, along with a number of options that control the contents of that data set. The estimates from all of the models go into that same data set, with the same contents.

 

An example that won't make much sense in reality but demonstrates the syntax:

proc reg data=sashelp.class outest=work.outest;
first: model weight= age;
second: model height=age;
third: model age = height weight /noint ;
run;
quit;

In the OUTEST= data set, each model's row is identified by the model's label, and the special value -1 marks which variable is the dependent variable for that model. A missing value means the variable was not an independent variable in that model.
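To look at that data set, you could use something like the sketch below. It reruns the same PROC REG step with the NOPRINT PROC option added, so only the OUTEST= data set is produced. It then prints the model label and dependent-variable columns along with the estimates. _MODEL_ and _DEPVAR_ are the names PROC REG normally uses for those columns in an OUTEST= data set, but run PROC CONTENTS on your own output if anything looks different.

```sas
/* Same models as above; NOPRINT is a PROC REG option that suppresses
   the displayed results for the whole step, so only OUTEST= is written */
proc reg data=sashelp.class outest=work.outest noprint;
  first:  model weight = age;
  second: model height = age;
  third:  model age = height weight / noint;  /* NOINT applies only to this MODEL */
run;
quit;

/* One row per MODEL statement: _MODEL_ holds the label (first, second, third),
   _DEPVAR_ names the dependent variable, whose own column shows -1,
   and a missing value means that variable was not in that model */
proc print data=work.outest noobs;
  var _model_ _depvar_ intercept age height weight;
run;
```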

 

Other procedures, such as PROC TABULATE, can create multiple tables in one report, but the OUT= procedure option places the results of all of the summaries into a single output data set.
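Here is a small sketch of that using SASHELP.CLASS. WORK.TABOUT is just a name picked for this example. Both TABLE statements write to the one OUT= data set, and the _TABLE_ variable records which TABLE statement each row came from.

```sas
/* OUT= is a PROC TABULATE option, so it applies to every TABLE statement in the step */
proc tabulate data=sashelp.class out=work.tabout;
  class sex;
  var height weight;
  table sex, height*mean;   /* rows from this table get _TABLE_ = 1 */
  table sex, weight*mean;   /* rows from this table get _TABLE_ = 2 */
run;

/* Statistic columns are named variable_statistic, e.g. Height_Mean and Weight_Mean */
proc print data=work.tabout noobs;
run;
```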

In PROC DATASETS, once the library is named with a PROC option, you can modify multiple data sets, or variables in multiple data sets, by using a one-level member name on MODIFY and similar statements.
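The sketch below shows that pattern. It assumes you first copy two SASHELP tables into WORK so that nothing in SASHELP is changed. WORK.CLASS, WORK.CARS, the new label, and the new variable name MANUFACTURER are only for illustration.

```sas
/* Hypothetical setup: WORK copies of two SASHELP tables */
data work.class;
  set sashelp.class;
run;

data work.cars;
  set sashelp.cars;
run;

/* LIBRARY= on the PROC statement applies to the whole step,
   so each MODIFY statement can use a one-level member name */
proc datasets library=work nolist;
  modify class;
    label name='Student Name';
  modify cars;
    rename make=manufacturer;
quit;
```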

 

Many procedures may not have this level of "need", but their structure is similar.

 

 

Cynthia_sas
SAS Super FREQ

Hi:

  I think that what you're looking for is more "big picture". Here's a nutshell take on some SAS language elements from an instructor. SAS has 3 primary languages for beginners to get their heads around. There are more (I'm referring to SCL and DS2, but we're not going there yet).

 

The major pieces of SAS language that you will use as a beginner are:

  1) SAS Procedures

  2) DATA step Programs

  3) SAS Macro Facility (eventually, this will be something to use to generate the #1 and #2 code)

 

  All 3 of these can be impacted by options. SAS has several types of options:

a) global system options (specified in an options statement)

b) data set options (specified in parentheses after a data set reference)

c) statement level options (usually specified after a slash / in a statement)

 

  I describe SAS Procedures as "canned" routines that do something in a predictable manner. You want to display data, use PROC PRINT. You want to get counts and percents, use PROC FREQ (or PROC TABULATE or PROC REPORT). You want to get a linear regression, use PROC REG. You want basic descriptive statistics, use PROC MEANS. Most SAS procedures have procedure-specific statements that control the behavior of the "canned" routine. Procedures will typically follow this convention:

PROC <whatever> DATA=<name of SAS data set>(any data set options);

** statements for THIS procedure;

RUN;

(some procedures end with a QUIT;)

 

Here's an example of a procedure step with several different options:

options nocenter;
proc print data=sashelp.class(obs=10) noobs label
     style(header)={background=lightyellow};
  var name / style(data)={background=lightyellow font_weight=bold};
  var sex age height weight;
  label name='Student Name'
        sex = 'Gender';
run;

The NOCENTER option is a global option. It will cause all my output to be left justified (not centered) until I either issue another OPTIONS statement to center output or I end my SAS session. If you run this code, you should see that the PROC PRINT output is not centered.

 

DATA= is a PROC PRINT option. It is not required, but I consider the use of DATA= to be a best practice. If you don't have DATA=, then PROC PRINT looks for your last created data set to print. I like to be explicit about which data set I am sending to PROC PRINT. The (OBS=10) is a data set option that "modifies" SASHELP.CLASS: it limits the number of observations sent to PROC PRINT from SASHELP.CLASS. The NOOBS, LABEL and STYLE options all belong to PROC PRINT, and they all have an impact on the PROC PRINT output. NOOBS removes the default Observation Numbers that PROC PRINT would normally put into the output report. LABEL tells PROC PRINT to use variable labels as column headers instead of variable names. The labels in this step's LABEL statement take precedence over any labels stored in the descriptor portion of the data set. The STYLE(HEADER) option tells PROC PRINT (and the Output Delivery System) to make the header cell background light yellow (for all destinations that support style).

 

  The PROC PRINT step code (what is between the PROC PRINT and RUN statements) contains 2 VAR statements and 1 LABEL statement. The first VAR statement has a STYLE option for the NAME variable; on the VAR statement, options are specified after a slash. The other VAR statement does not use any STYLE option, so those variables get the default style. The LABEL statement specifies the alternate column headers that I want to see for this PROC PRINT step only. In order to have this LABEL statement used, I had to have an option in the PROC PRINT statement that told PROC PRINT to USE the label instead of the variable names.
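That dependency between a PROC option and a statement is exactly the kind of thing the original question was about, so here is a quick way to see it. This is a minimal sketch that assumes default settings, in particular no SPLIT= option. In the first step the LABEL statement is present, but the column header should still be the variable name, Name. In the second step the LABEL option is added, and the header should become Student Name.

```sas
/* LABEL statement only: without the LABEL PROC option,
   PROC PRINT still uses the variable name as the header */
proc print data=sashelp.class(obs=3) noobs;
  var name age;
  label name='Student Name';
run;

/* LABEL option on the PROC PRINT statement: now the
   LABEL statement's text is used as the column header */
proc print data=sashelp.class(obs=3) noobs label;
  var name age;
  label name='Student Name';
run;
```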

 

  Now let's consider another procedure. Here's a PROC FREQ example:


options center;
proc freq data=sashelp.shoes(where=(Product="Sport Shoe")) nlevels;
  title 'Regions with Sport Shoe Sales';
  tables region / plots=freqplot;
run;
title;

Notice that this PROC FREQ code also has a DATA= option. This time I am specifying SASHELP.SHOES. I am using a different data set option (a WHERE= option) to limit PROC FREQ to only the rows that meet the condition for Sport Shoe. Then, the NLEVELS option is a specific option for PROC FREQ that can only be specified on the PROC FREQ statement. NOOBS and LABEL would not work with PROC FREQ, just as NLEVELS would not work with PROC PRINT. I have an OPTIONS statement to resume centering of the output.

 

The NLEVELS option will show me the number of unique levels for the variables I have listed in my TABLES statement.

 

Then, the TITLE statement is a global statement. I could have put it on top of the PROC FREQ statement, but I prefer to put it "inside" the PROC FREQ statements, because that is a reminder to me to "reset" it at the end of the procedure statements with a null title statement (TITLE;).

 

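  Next, PROC FREQ has a TABLES statement. In this code, I want to see the counts and percents for the REGION variable. Since I've limited the rows with the WHERE= data set option, those counts and percents are only for Sport Shoe sales. PLOTS=FREQPLOT, after the slash, is a TABLES statement option that requests a frequency plot for REGION.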
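Since the thread started with options versus statements, it is worth noting that the WHERE= data set option used above has a statement counterpart. Below is a sketch of the same step written with a WHERE statement instead. For a single input data set like this one, it should give the same counts. The ODS GRAPHICS ON statement is an addition of mine, included only to make sure the PLOTS=FREQPLOT request can produce a plot.

```sas
ods graphics on;   /* PLOTS= requests in PROC FREQ need ODS Graphics enabled */
options center;
proc freq data=sashelp.shoes nlevels;
  title 'Regions with Sport Shoe Sales';
  where product = 'Sport Shoe';   /* WHERE statement instead of the WHERE= data set option */
  tables region / plots=freqplot;
run;
title;
```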

 

  Back to NLEVELS for a minute. It is a handy option. If, for example, you wanted to see the number of unique levels for all your Character variables in a dataset, you could do this:


proc freq data=sashelp.shoes nlevels;
  tables _character_;
run;

  In the above example, notice that I haven't listed a variable name, I've used a special reference _character_ in order to tell PROC FREQ to give me tables on all the Character variables in my dataset. This is very useful when you are initially exploring your data.
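A related trick is another example of a PROC option working together with a statement option. NLEVELS stays on the PROC FREQ statement, while NOPRINT after the slash on the TABLES statement suppresses only the one-way frequency tables. The result is the level counts without a separate frequency table for every variable. If NOPRINT were put on the PROC FREQ statement instead, it would suppress all displayed output, including the NLEVELS table. A sketch:

```sas
/* NLEVELS (PROC FREQ statement option): display the number of levels per variable.
   NOPRINT (TABLES statement option, after the slash): suppress only the
   one-way frequency tables for the variables in this TABLES request */
proc freq data=sashelp.shoes nlevels;
  tables _character_ / noprint;
run;
```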

 

  How about something with a DATA step program? Well, first, why would you need a DATA step program? Typically you will use a DATA step to read data into SAS format, write data out of SAS into some other kind of file format, perform transformations (such as transforming wide data to narrow data or vice versa), cleanse data, or merge/join data: basic data manipulation. Let's say I want to calculate bonuses for my regional managers based on a percentage of shoe sales. I don't have a bonus amount in SASHELP.SHOES, and it can't be calculated directly on SALES, because I want to subtract RETURNS from SALES before I calculate a possible bonus amount. I want to see how much it's going to cost. So first, I have to subtract RETURNS from SALES, and then I have to calculate the BONUS amount. But an added twist is that if the adjusted amount is 2.5 million or less, the manager gets a 2.5% bonus, and if the adjusted amount is over 2.5 million, the manager gets a 5% bonus. So that needs some conditional logic to get the data ready for reporting.

 

  Here's some different code:

data calcbonus;
  set sashelp.shoes;
  use_adjusted = sales-returns;
run;

 

  Notice that the step starts with the keyword DATA, not the keyword PROC. The name of the data set that follows (CALCBONUS) is the name of the temporary data set that I want to create in the WORK library, from the SASHELP.SHOES dataset. In order to let SAS know that SASHELP.SHOES is the INPUT to the program, I need to use the data set name on a SET Statement.

 

  Next, I create the USE_ADJUSTED variable in an assignment statement by subtracting RETURNS from SALES for each observation in the file. But this has only created an adjusted amount on each of the 395 rows in SASHELP.SHOES. Now I have to summarize each region so I can calculate the bonus amount for the Regional Managers. So I need to summarize the data by region. There are a lot of different ways to do this. Usually I would use PROC MEANS or PROC REPORT or PROC SQL. Here's a PROC MEANS step:


ods output summary=work.sumout(drop=vname: label:);
proc means data=calcbonus sum;
  class region;
  var sales returns use_adjusted;
run;

The PROC MEANS statement has the SUM option, which specifies the statistic that I want calculated for each REGION (because REGION is listed in the CLASS statement). The numeric variables that I want to be used are listed in the VAR statement. But I also want a summarized output data set. So BEFORE my PROC MEANS, I have an ODS OUTPUT statement to capture the PROC MEANS output.

 

  The DROP= data set option is specified on the ODS OUTPUT statement. Summary is the output object I want to capture, and SUMMARY=WORK.SUMOUT is the ODS OUTPUT option that tells ODS to gather all the summarized rows and write them to a temporary data set. The VNAME: and LABEL: variables that I am dropping with the DROP= option are helper variables that PROC MEANS creates to help me identify information about the summarized variables. But I don't really need them, because I'm just going to turn around and send WORK.SUMOUT to PROC PRINT:

proc print data=sumout label noobs;
  var region sales_sum returns_sum use_adjusted_sum;
  sum _numeric_;
  label sales_sum = 'Total Sales'
        returns_sum = 'Total Returns'
        use_adjusted_sum = 'Adjusted Amount'
        nobs='Count of Rows';
  format sales_sum returns_sum use_adjusted_sum dollar16. 
         nobs comma6.;
run;

And if I want to play around with the numbers myself, then I am done with this PROC PRINT report: I have the adjusted amount and could just calculate the projected bonus if I wanted to. Notice how I use the special reference _numeric_ in the SUM statement to total all the numeric variables from the SUMOUT data in the report.
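As another options-versus-statements example, PROC MEANS can also write a summary data set with its own OUTPUT statement instead of the ODS OUTPUT statement used above. The sketch below recreates CALCBONUS so it runs on its own. WORK.SUMOUT2 is a made-up name. NWAY and NOPRINT are PROC MEANS statement options, while AUTONAME is an option on the OUTPUT statement. With AUTONAME the summed variables should come out as Sales_Sum, Returns_Sum and use_adjusted_Sum. Renaming _FREQ_ to NOBS is my own choice to keep the names lined up with the PROC PRINT step above; check the column names with PROC CONTENTS before relying on that.

```sas
/* Recreate the adjusted amount so this block is self-contained */
data calcbonus;
  set sashelp.shoes;
  use_adjusted = sales - returns;
run;

/* NWAY keeps only the per-REGION rows (no grand-total row);
   NOPRINT suppresses the displayed table; AUTONAME builds names like Sales_Sum */
proc means data=calcbonus sum noprint nway;
  class region;
  var sales returns use_adjusted;
  output out=work.sumout2(drop=_type_ rename=(_freq_=nobs)) sum= / autoname;
run;
```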

 

  However, I can also write a DATA step program with conditional logic to calculate projected bonus amount for me.


data final_sum;
  set sumout;
  if use_adjusted_sum le 2500000 then
     proj_bonus = use_adjusted_sum * .025;
  else if use_adjusted_sum gt 2500000 then
     proj_bonus = use_adjusted_sum * .05;
run;

In this DATA step program, I am creating another temporary data set called FINAL_SUM, which is going to read in the WORK.SUMOUT data and then based on the adjusted amount, calculate the projected bonus for each summarized Region.

 

  When this program is finished, all I need to do, if I want a final report, is use PROC PRINT again:


proc print data=final_sum label noobs;
  var region sales_sum returns_sum use_adjusted_sum proj_bonus;
  sum _numeric_;
  label sales_sum = 'Total Sales'
        returns_sum = 'Total Returns'
        use_adjusted_sum = 'Adjusted Amount'
        nobs='Count of Rows'
        proj_bonus='Projected Bonus';
  format sales_sum returns_sum use_adjusted_sum proj_bonus dollar16. 
         nobs comma6.;
run;

  In this PROC PRINT, everything is pretty much the same as the other PROC PRINT except this time I am using the temporary dataset FINAL_SUM which was calculated from the SUMOUT data.
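To sanity-check the bonus rule in the FINAL_SUM DATA step before trusting the report, you could push a few made-up adjusted amounts through the same IF-THEN/ELSE logic. Below is a sketch with hypothetical values at and around the 2.5 million cutoff. BONUS_CHECK is a name invented for this check.

```sas
/* Hypothetical adjusted amounts just below, at, and above the cutoff */
data bonus_check;
  input use_adjusted_sum;
  if use_adjusted_sum le 2500000 then
     proj_bonus = use_adjusted_sum * .025;   /* 2.5% tier, includes exactly 2.5 million */
  else
     proj_bonus = use_adjusted_sum * .05;    /* 5% tier */
  datalines;
2000000
2500000
3000000
;
run;

proc print data=bonus_check noobs;
  format use_adjusted_sum proj_bonus dollar16.;
run;
```

With the LE comparison, exactly 2,500,000 falls into the 2.5% tier, so the three rows should come out as 50,000, 62,500 and 150,000. A plain ELSE is used here, and it behaves the same as the original ELSE IF ... GT 2500000, because any value that fails the LE test is already greater than 2,500,000.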

 

This was not really a comprehensive discussion of options or statements as you originally asked about. But I hope it has given you a better context for the flow of how you might use procedure steps (PROC steps) and programs (DATA steps) in a sequence to perform data manipulation and reporting. Getting into examples of statistical analysis is outside the scope of what I can cover here, but given that you know R, learning SAS shouldn't be too hard. Especially since we have a free class for R programmers to get you started: https://support.sas.com/edu/schedules.html?ctry=us&crs=SP4R -- you will need to set up a SAS Profile or log onto your Profile before you can activate the course.

 

Hope this helps,

Cynthia

 

 

icrandell
Obsidian | Level 7

Cynthia,

 

thank you for the very detailed answer! Seeing examples of everything in context is very helpful in piecing together the general framework. I'm finding the SAS Programming for R Users videos to be very helpful as well.

Tom
Super User Tom
Super User

SAS procedures are a hodge-podge developed over 30+ years by hundreds (thousands?) of developers. So the general answer is that each one works however its developer thought best. And since SAS (rightly) values backward compatibility, we are generally stuck with whatever they first released. Check the documentation when in doubt.

 

In general, options that impact the whole procedure are specified on the PROC statement. Statements generally let you specify the details of what you want the procedure to do (which variables to use, what type of model, etc.). Options that impact a particular statement within a procedure are usually specified after a slash (/) on that statement.
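Here is one small step that shows all three pieces of that rule of thumb. It is a sketch using SASHELP.CLASS; ORDER= and NOCUM are standard PROC FREQ options, but check the PROC FREQ documentation for the full lists of PROC statement and TABLES statement options.

```sas
/* ORDER=FREQ is on the PROC statement, so it affects every table in the step */
proc freq data=sashelp.class order=freq;
  /* The TABLES statement says what to analyze;
     NOCUM, after the slash, applies only to this TABLES request */
  tables sex age / nocum;
run;
```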

icrandell
Obsidian | Level 7
I suspected that but didn't want to say it. Now I can say you did :).
