vnreddy
Quartz | Level 8

Hi,

Can someone help me with the issue below?

When I use sum and group by in proc sql, with or without distinct, I do not get unique records.

As you can see from the output image below, I should get only 3 rows in my output. How can I get rid of the 2nd or 3rd row?

data have;
input code $1-3 tcode $5-8 order_date :date9. invoice_date :date9. invoice_code $30-33 customer $35-39 Cust_code $41-44
P_code $46-50 P_name $52-60 H_code $62-65 T_code $67-72 Plant $74-77 Cat_code $79-81 Cat_desc $83 Item_code $85-87
Item_desc $89-98 Distance 100-102 Qty 104-107 HPrice 109-112 Rev 114-119;
format order_date date9. invoice_date date9.;
datalines;
101 1019 21NOV2023 25NOV2023 2626 B11AC B100 52486 New_B11AC EAST GNA1UC 100A 200 H 380 HPayDry    9.3    0 0.00      0
101 1019 21NOV2023 25NOV2023 2626 B11AC B100 52486 New_B11AC EAST GNA1UC 100A 200 H 370 HChargeDry 9.3 2.14 0.00  94.25
101 1019 21NOV2023 25NOV2023 2626 B11AC B100 52486 New_B11AC EAST GNA1UC 100A 200 H 370 HChargeDry 9.3 2.14 0.00 113.36
101 1019 21NOV2023 25NOV2023 2626 B11AC B100 52486 New_B11AC EAST GNA1UC 100A 114 S 520 0/2_catP3  9.3 2.14 4.35 169.38
; 

proc sql;
create table want (drop=Qty Rev) as
select*,
sum(Qty) as P_Qty,
sum(Rev) as Revenue
from have
group by cat_code, Cat_desc, item_code,item_desc
;
quit;

Current output that I am getting:

vnreddy_1-1707932613603.png

 

Expected output: when I use sum and group by, I should get only 3 rows, as per the requirement. Rows 2 and 3 are the same; how do I get rid of one of them?

 

Sum issue: for sample purposes I have manually shown only a few records here. When I use the sum function on a large dataset, I end up getting a sum over the complete data rather than per group. The image below shows the problem with the sums of revenue and qty when I use sum with group by in proc sql. In reality it should give the sum only within each group. The sum below is not right; the number should not be this high.

vnreddy_0-1707933213264.png

 

 

Thanks,

vnreddy

Tom
Super User

You asked SAS to return ALL of the variables.  So that will require ALL of the observations.

 

If you want just the summary results and the group by variables then you need to only ask for those variables.

proc sql;
create table want as
  select cat_code, Cat_desc, item_code, item_desc
       , sum(Qty) as P_Qty
       , sum(Rev) as Revenue
  from have
  group by cat_code, Cat_desc, item_code, item_desc
;
quit;

PS Don't hide those continuation commas at the ends of the lines. It is much harder to scan for them there since the right side of the lines is jagged.

 

vnreddy
Quartz | Level 8

Hi @Tom 

 

Yes, I need all the variables in my output.

 

Thanks,

vnreddy

Tom
Super User

Then SAS did what you wanted.

Try it with SASHELP.CLASS.

 

proc sql;
select *,mean(age) as Gender_Mean
from sashelp.class
group by sex
;
quit;

As you can see, you get ALL of the observations, and the new variable has the same value for every observation that shares the same values of the GROUP BY variables.
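For contrast, here is a minimal sketch of the collapsed form of the same query. It selects only the GROUP BY column and the summary function, so PROC SQL has nothing to merge back onto the detail rows and should return one row per value of sex. In the version with select *, the log should also carry a NOTE saying the query requires remerging summary statistics back with the original data, which is a quick way to spot this behaviour.

```sas
proc sql;
  /* Only the GROUP BY column and the aggregate are selected, */
  /* so the result has one row per distinct value of SEX.     */
  select sex
       , mean(age) as Gender_Mean
  from sashelp.class
  group by sex
  ;
quit;
```

Any column added to the select list that is neither in the GROUP BY clause nor wrapped in a summary function brings the detail rows back.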

Reeza
Super User
Then you need to summarize the other variables too, or add some logic for how they should be handled.
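One way to add that logic, sketched against the HAVE data set from the question: keep every variable, but name each one explicitly instead of using select *, and put all of the non-summed variables in the GROUP BY clause. This assumes that rows differing only in Qty or Rev are meant to collapse into a single row; if Distance or HPrice can vary within an item, they would need a summary function (max, mean, and so on) instead of a place in GROUP BY.

```sas
proc sql;
  create table want as
  /* Every non-summed variable appears in both SELECT and GROUP BY, */
  /* so no remerge happens and each distinct combination is one row */
  select code, tcode, order_date, invoice_date, invoice_code
       , customer, Cust_code, P_code, P_name, H_code, T_code, Plant
       , Cat_code, Cat_desc, Item_code, Item_desc, Distance, HPrice
       , sum(Qty) as P_Qty
       , sum(Rev) as Revenue
  from have
  group by code, tcode, order_date, invoice_date, invoice_code
         , customer, Cust_code, P_code, P_name, H_code, T_code, Plant
         , Cat_code, Cat_desc, Item_code, Item_desc, Distance, HPrice
  ;
quit;
```

On the four sample rows this should return three rows, because the two HChargeDry rows differ only in Rev. Grouping by the order and invoice variables as well as the category and item codes also keeps the sums within each order, which may be what is missing when the totals on the large data set look too high.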
