data sales;
input city $ sales ;
datalines;
mumbai 2500
mumbai 1522
delhi 2000
delhi 3500
kolkatta 7500
kolkatta 8000
;
run;
proc sort data= sales;
by city sales;
run;
data s;
set sales;
by city;
if first.city then sales1=0;
sales+sales;
proc print;
run;
proc sql;
select city,sales ,sum(sales) as total
from sales
group by city,sales;
quit;
where I did wrong both datastep and proc sql
In your data step, you sum up the wrong variable, and you do not suppress the output for each incoming observation (hint: use a subsetting IF with LAST. You have been given NUMEROUS examples for this in your previous questions).
In your SQL, you group by sales, so the summary will still happen per each observation, not by city alone. Remove sales from the SELECT and the GROUP BY.
I would not use a data step (or even proc sql), but proc summary to get the sum for each city.
proc summary data=sales nway;
class City;
var Sales;
output out=city_sums(drop= _:) sum=;
run;
proc summary nway data=sales;
class city;
var sales;
output out=_sums_ sum=sum_sales;
run;
This would be a great time to start learning the very useful and powerful PROC SUMMARY!
No sorting needed. No writing your own SUM statements in a DATA step. And if you have another variable that has to be summed, you just add additional variable names into the VAR statement.
I already pointed out your mistake. Correct it.
You could tell the interviewer that PROC SUMMARY also solves the problem, and since you know how to do it in PROC SUMMARY you don't have to write custom DATA step code to do this.
There is a very slight optimization that can be done:
data summ;
set sales;
by city;
if first.city
then total_sales = sales;
else total_sales + sales;
if last.city;
run;
At first.city, there will now be only one assignment instead of two.
Hi KurtBremser
if we are using sum statement instead of sum function we getting problem when missing values are there data
I just felt I would mention this: PROC SUMMARY handles missing values appropriately, no extra coding needed.
@BrahmanandaRao wrote:
Hi KurtBremser
if we are using sum statement instead of sum function we getting problem when missing values are there data
Maxim 4 (short version): Try It.
Hello @BrahmanandaRao
I am posting the updated code for use with data step. (There could be other approaches too).
data sales;
input city $ sales ;
datalines;
mumbai 2500
mumbai 1522
delhi 2000
delhi 3500
kolkatta 7500
kolkatta 8000
;
run;
proc sort data= sales;
by city sales;
run;
data s (drop =sales);
set sales;
by city;
if first.city then sales1=0;
sales1+sales;
if last.city then output;
run;
proc print;
run;
Only the value of last.city has to be output as we are interested in the sum.
I have used Sales1 as the accumulator variable to hold the sum.
@BrahmanandaRao wrote:
interviewer ask datastep how can you solve it so please correct where i did wrong
My personal question for an interviewer when given a question like this would be to clarify exactly why a data step is needed.
Other approaches such as Proc SQL, Proc Summary/Means, Proc Tabulate and Proc Report will do the same thing, often with less headache depending on the specific sum(s) needed. So there should be a reason to be forced to use a data step solution.
Sometimes an interview question isn't so much can you do exactly what is mentioned but do you know enough to ask why an approach might be needed.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.