Hello,
Sorry to be vague with this inquiry.
I am working my way through the 27 main objectives - and expanded objectives - for the exam.
Up until now, the specific intent of the stated exam objectives have been clear or became clear with some digging.
Random examples:
"Use the DROP= and KEEP= data set options to specify columns to be processed and/or output"
"Use the SAS/ACCESS EXCEL or PCFILES engine to read an .xlsx file OR use the SAS/ACCESS XLSX engine to read an .xlsx file"
These, and most other stated objectives are fairly - if not crystal - clear.
But I am stuck understanding precisely what is meant for this objective and expanded objective:
"Accumulate sub-totals and totals using DATA step statements": Do they mean aggregate functions ? (sum, count, avg, etc.)
"Use the BY statement to aggregate by subgroups": Do they mean First.variable Last.Variable ?
At this point, I'm not asking to understand anything specific programmatically about aggregate functions, first.variable etc. I'm just looking for clarity about the intent of this particular exam objective.
Greatly appreciate any insight
thanks
You aren't using the SUM function, SUM(), you're using a SUM statement which is different.
I suspect they're expecting you to be able to generate summaries using a data step as well as proc means.
For example, for the Class data set, create three extra rows, one that has the total or mean for males, one for females and one for overall.
So data set should be:
Female records
Female total
Male records
Male Total
Grand Totals.
This in particular isn't a useful example, but understanding how to carry variables values over rows and reset them is the key points that are used very often.
@Rampsas1 wrote:
Hello,
Sorry to be vague with this inquiry.
I am working my way through the 27 main objectives - and expanded objectives - for the exam.
Up until now, the specific intent of the stated exam objectives have been clear or became clear with some digging.
Random examples:
"Use the DROP= and KEEP= data set options to specify columns to be processed and/or output"
"Use the SAS/ACCESS EXCEL or PCFILES engine to read an .xlsx file OR use the SAS/ACCESS XLSX engine to read an .xlsx file"
These, and most other stated objectives are fairly - if not crystal - clear.
But I am stuck understanding precisely what is meant for this objective and expanded objective:
"Accumulate sub-totals and totals using DATA step statements": Do they mean aggregate functions ? (sum, count, avg, etc.)
"Use the BY statement to aggregate by subgroups": Do they mean First.variable Last.Variable ?
At this point, I'm not asking to understand anything specific programmatically about aggregate functions, first.variable etc. I'm just looking for clarity about the intent of this particular exam objective.
Greatly appreciate any insight
thanks
Hi:
@Reeza is exactly correct. Accumulate and Aggregate have different meanings -- the objective uses accumulate. In this case, instead of aggregate functions like sum and mean, we do mean to use statements such as retain and first. and last. by variables to accumulate running totals and calculate totals for by groups.
For example:
Start with this data:
name qtr amt
alan 1 15
alan 3 20
alan 4 15
barb 1 10
barb 2 10
barb 3 10
barb 4 10
Generate this in a data step:
name qtr amt running_tot
alan 1 15 15
alan 3 20 35
alan 4 15 50
barb 1 10 10
barb 2 10 20
barb 3 10 30
barb 4 10 40
OR generate this in a data step:
name numqtr totamt
alan 3 50
barb 4 40
We show you these techniques in Programming 1 and Programming 2. You would use the RETAIN statement, the SUM statement and BY group processing.
Cynthia
Thanks so much Cynthia and Reeza.
RETAIN was the key word I was looking for.
I believe I understand this, but always appreciate feedback from the experts to know if there are better or different ways.
In my examples below, I'm calling "running_tot + amt" an "alias sum function with implicit retain" to attempt to clarify when we can 'opt out of' using retain explicitly. What might be the official name for this sum with implicit retain? (I also ran across examples using 'if-then-do-else' and thought I'd apply those as well)
/* answer 1a - using retain + sum function*/
data test_1a;
set test;
retain running_tot;
running_tot=sum(running_tot,amt);
run;
proc print data=test_1a;
run;
/* answer 1b - using "alias sum function with implicit retain" */
data test_1b;
set test;
running_tot + amt;
run;
proc print data=test_1b;
run;
/* assuming data might not be properly sorted*/
PROC SORT data=test;
by name;
run;
/* option 2a - using retain + sum function + by (first / last)*/
data test_2a (keep=name numqtr totamt);
set test;
put _all_;
BY name;
RETAIN numqtr totamt;
if first.name then totamt=amt;
else totamt=sum(totamt,amt);
if first.name then numqtr=1;
else numqtr=numqtr+1;
if last.name then output;
put _all_;
PUT ' ';
run;
proc print data=test_2a;
run;
/* answer 2b - same as 2a but using "if then do else" */
data test_2b (keep=name numqtr totamt);
set test;
put _all_;
BY name;
RETAIN numqtr totamt;
if first.name then do;
totamt=amt;
numqtr=1;
end;
else if first.name ne 1 then do;
totamt=sum(totamt,amt);
numqtr=numqtr+1;
end;
if last.name then output;
put _all_;
PUT ' ';
run;
proc print data=test_2b;
run;
/* option 2c - same as 2b (var order is different) but using "alias sum function with implicit retain" */
data test_2c (keep=name numqtr totamt);
set test;
put _all_;
BY name;
if first.name then do;
totamt=amt;
numqtr=1;
end;
else if first.name ne 1 then do;
totamt+amt;
numqtr+1;
end;
if last.name then output;
put _all_;
PUT ' ';
run;
proc print data=test_2c;
run;
You aren't using the SUM function, SUM(), you're using a SUM statement which is different.
thanks for the clarification.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.