BookmarkSubscribeRSS Feed
BrahmanandaRao
Lapis Lazuli | Level 10
data sales;
input city $  sales ;
datalines;
mumbai 2500
mumbai 1522
delhi  2000
delhi  3500
kolkatta 7500
kolkatta 8000
;
run;

proc sort data= sales;
by city sales;
run;

data s;
set sales;
by city;
if first.city then sales1=0;
sales+sales;
proc print;
run;





proc sql;
select city,sales ,sum(sales) as total
from sales
group by city,sales;
quit;
  

where I did wrong both datastep and proc sql 

18 REPLIES 18
Kurt_Bremser
Super User

In your data step, you sum up the wrong variable, and you do not suppress the output for each incoming observation (hint: use a subsetting IF with LAST. You have been given NUMEROUS examples for this in your previous questions).

In your SQL, you group by sales, so the summary will still happen per each observation, not by city alone. Remove sales from the SELECT and the GROUP BY.

andreas_lds
Jade | Level 19

I would not use a data step (or even proc sql), but proc summary to get the sum for each city.

proc summary data=sales nway;
  class City;
  var Sales;
  output out=city_sums(drop= _:) sum=;
run;
PaigeMiller
Diamond | Level 26
proc summary nway data=sales;
    class city; 
    var sales;
    output out=_sums_ sum=sum_sales;
run;

This would be a great time to start learning the very useful and powerful PROC SUMMARY!

 

No sorting needed. No writing your own SUM statements in a DATA step. And if you have another variable that has to be summed, you just add additional variable names into the VAR statement.

--
Paige Miller
BrahmanandaRao
Lapis Lazuli | Level 10
interviewer ask datastep how can you solve it so please correct where i did wrong
PaigeMiller
Diamond | Level 26

You could tell the interviewer that PROC SUMMARY also solves the problem, and since you know how to do it in PROC SUMMARY you don't have to write custom DATA step code to do this.

--
Paige Miller
BrahmanandaRao
Lapis Lazuli | Level 10
data sales;
input city $ sales ;
datalines;
mumbai 2500
mumbai 1522
delhi 2000
delhi 3500
kolkatta 7500
kolkatta 8000
;
run;

proc sort data= sales;
by city ;
run;

data summ;
set sales;
by city;
if first.city then total_sales=0;
total_sales+ sales;
if last.city;
proc print;
run;
I got it
Kurt_Bremser
Super User

There is a very slight optimization that can be done:

data summ;
set sales;
by city;
if first.city
then total_sales = sales;
else total_sales + sales;
if last.city;
run;

At first.city, there will now be only one assignment instead of two.

BrahmanandaRao
Lapis Lazuli | Level 10

Hi  KurtBremser 

if we are using sum statement instead of sum function we  getting problem when missing values are there data

 

Reeza
Super User
Your example data does not demonstrate any missing values so it seems like it may not reflect your actual data. If that's the case please update your question with a more accurate data example.
PaigeMiller
Diamond | Level 26

I just felt I would mention this: PROC SUMMARY handles missing values appropriately, no extra coding needed.

--
Paige Miller
Kurt_Bremser
Super User

@BrahmanandaRao wrote:

Hi  KurtBremser 

if we are using sum statement instead of sum function we  getting problem when missing values are there data

 


Maxim 4 (short version): Try It.

Sajid01
Meteorite | Level 14

Hello @BrahmanandaRao 

I am posting the updated code for use with data step. (There could be other approaches too).

 

data sales;
input city $  sales ;
datalines;
mumbai 2500
mumbai 1522
delhi  2000
delhi  3500
kolkatta 7500
kolkatta 8000
;
run;

proc sort data= sales;
by city sales;
run;

data s (drop =sales);
set sales;
by city;
if first.city then sales1=0;
sales1+sales;
if last.city then output;
run;
proc print;
run;

Only the value of last.city has to be output as we are interested in the sum.
I have used Sales1 as the accumulator variable to hold the sum.

ballardw
Super User

@BrahmanandaRao wrote:
interviewer ask datastep how can you solve it so please correct where i did wrong

My personal question for an interviewer when given a question like this would be to clarify exactly why a data step is needed.

 

Other approaches such as Proc SQL, Proc Summary/Means, Proc Tabulate and Proc Report will do the same thing, often with less headache depending on the specific sum(s) needed. So there should be a reason to be forced to use a data step solution.

 

Sometimes an interview question isn't so much can you do exactly what is mentioned but do you know enough to ask why an approach might be needed.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 18 replies
  • 1524 views
  • 6 likes
  • 7 in conversation