BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Rampsas1
Obsidian | Level 7

Hello,

 

Sorry to be vague with this inquiry.

I am working my way through the 27 main objectives - and expanded objectives - for the exam.

Up until now, the specific intent of the stated exam objectives have been clear or became clear with some digging.

Random examples:

"Use the DROP= and KEEP= data set options to specify columns to be processed and/or output"

"Use the SAS/ACCESS EXCEL or PCFILES engine to read an .xlsx file OR use the SAS/ACCESS XLSX engine to read an .xlsx file"

These, and most other stated objectives are fairly - if not crystal - clear.

But I am stuck understanding precisely what is meant for this objective and expanded objective:

"Accumulate sub-totals and totals using DATA step statements": Do they mean aggregate functions ? (sum, count, avg, etc.) 

"Use the BY statement to aggregate by subgroups": Do they mean First.variable Last.Variable  ? 

 

At this point, I'm not asking to understand anything specific programmatically about aggregate functions, first.variable etc. I'm just looking for clarity about the intent of this particular exam objective.  

 

Greatly appreciate any insight

thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

You aren't using the SUM function, SUM(), you're using a SUM statement which is different. 

 

View solution in original post

6 REPLIES 6
Reeza
Super User

 I suspect they're expecting you to be able to generate summaries using a data step as well as proc means. 

For example, for the Class data set, create three extra rows, one that has the total or mean for males, one for females and one for overall. 

 

So data set should be:

 

Female records 

Female total

Male records

Male Total

Grand Totals.

 

This in particular isn't a useful example, but understanding how to carry variables values over rows and reset them is the key points that are used very often. 

 


@Rampsas1 wrote:

Hello,

 

Sorry to be vague with this inquiry.

I am working my way through the 27 main objectives - and expanded objectives - for the exam.

Up until now, the specific intent of the stated exam objectives have been clear or became clear with some digging.

Random examples:

"Use the DROP= and KEEP= data set options to specify columns to be processed and/or output"

"Use the SAS/ACCESS EXCEL or PCFILES engine to read an .xlsx file OR use the SAS/ACCESS XLSX engine to read an .xlsx file"

These, and most other stated objectives are fairly - if not crystal - clear.

But I am stuck understanding precisely what is meant for this objective and expanded objective:

"Accumulate sub-totals and totals using DATA step statements": Do they mean aggregate functions ? (sum, count, avg, etc.) 

"Use the BY statement to aggregate by subgroups": Do they mean First.variable Last.Variable  ? 

 

At this point, I'm not asking to understand anything specific programmatically about aggregate functions, first.variable etc. I'm just looking for clarity about the intent of this particular exam objective.  

 

Greatly appreciate any insight

thanks


 

Cynthia_sas
SAS Super FREQ

Hi:
@Reeza is exactly correct. Accumulate and Aggregate have different meanings -- the objective uses accumulate. In this case, instead of aggregate functions like sum and mean, we do mean to use statements such as retain and first. and last. by variables to accumulate running totals and calculate totals for by groups.

For example:
Start with this data:
name qtr amt
alan  1  15
alan  3  20
alan  4  15
barb  1  10
barb  2  10
barb  3  10
barb  4  10

Generate this in a data step:
name qtr amt running_tot
alan  1   15    15
alan  3   20    35
alan  4   15    50
barb  1   10    10
barb  2   10    20
barb  3   10    30
barb  4   10    40

OR generate this in a data step:
name numqtr totamt
alan   3      50
barb   4      40

We show you these techniques in Programming 1 and Programming 2. You would use the RETAIN statement, the SUM statement and BY group processing.

Cynthia

Rampsas1
Obsidian | Level 7

Thanks so much Cynthia and Reeza.

RETAIN was the key word I was looking for.

I believe I understand this, but always appreciate feedback from the experts to know if there are better or different ways.

In my examples below, I'm calling "running_tot + amt" an "alias sum function with implicit retain" to attempt to clarify when we can 'opt out of' using retain explicitly. What might be the official name for this sum with implicit retain?  (I also ran across examples using 'if-then-do-else' and thought I'd apply those as well)

 

/* answer 1a - using retain + sum function*/
data test_1a;
set test;
retain running_tot;
running_tot=sum(running_tot,amt);
run;
proc print data=test_1a;
run;

/* answer 1b - using "alias sum function with implicit retain" */
data test_1b;
set test;
running_tot + amt;
run;
proc print data=test_1b;
run;

 

 

/* assuming data might not be properly sorted*/
PROC SORT data=test;
by name;
run;

/* option 2a - using retain + sum function + by (first / last)*/
data test_2a (keep=name numqtr totamt);
set test;
put _all_;
BY name;
RETAIN numqtr totamt;
if first.name then totamt=amt;
else totamt=sum(totamt,amt);
if first.name then numqtr=1;
else numqtr=numqtr+1;
if last.name then output;
put _all_;
PUT ' ';
run;
proc print data=test_2a;
run;


/* answer 2b - same as 2a but using "if then do else" */
data test_2b (keep=name numqtr totamt);
set test;
put _all_;
BY name;
RETAIN numqtr totamt;
if first.name then do;
totamt=amt;
numqtr=1;
end;
else if first.name ne 1 then do;
totamt=sum(totamt,amt);
numqtr=numqtr+1;
end;
if last.name then output;
put _all_;
PUT ' ';
run;
proc print data=test_2b;
run;


/* option 2c - same as 2b (var order is different) but using "alias sum function with implicit retain" */
data test_2c (keep=name numqtr totamt);
set test;
put _all_;
BY name;
if first.name then do;
totamt=amt;
numqtr=1;
end;
else if first.name ne 1 then do;
totamt+amt;
numqtr+1;
end;
if last.name then output;
put _all_;
PUT ' ';
run;
proc print data=test_2c;
run;
Reeza
Super User

You aren't using the SUM function, SUM(), you're using a SUM statement which is different. 

 

Rampsas1
Obsidian | Level 7

thanks for the clarification.

Rampsas1
Obsidian | Level 7
and ... sorry for the all the "put _all_" in my examples but it REALLY helped me to understand the PDV behavior

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1386 views
  • 3 likes
  • 3 in conversation