Hi,
I have a table like:
dist | terr | relid | mkt | product | TR | NR |
101 | 1 | 210 | XI_U | XIA | 100 | 0 |
101 | 2 | 211 | XI_U | XIA | 0 | 60 |
101 | 1 | 212 | XI_U | XIA | 110 | 70 |
101 | 1 | 213 | XI_U | XIA | 112 | 80 |
101 | 2 | 214 | XI_U | XIA | 115 | 91 |
I want to have the roll-ups of all the same terr and remove duplicates: For e.g. for dist 101 and terr 1 I should have only one row with TR=322 and NR=150. How do I remove duplicates and roll up the values as per the terr for the dist. In this case there should be only two rows (dist 101 and Terr1 and Dist 101 and terr2).
I tried to remove the duplicates using nodupkey in proc sort but that removes some of the products (not shown here).
Any help is greatly appreciated.
Thank you
M
One thing: if you roll up like this then you will always have one row per distinct response. If you wish to include relid in with the rest of the responses then you will get one row per terr/relid, which may not be what you're looking for? If you just wanted one row per terr then you could not include relid, as such:
data have;
input dist terr relid mkt $ product $ TR NR;
datalines;
101 1 210 XI_U XIA 100 0
101 2 211 XI_U XIA 0 60
101 1 212 XI_U XIA 110 70
101 1 213 XI_U XIA 112 80
101 2 214 XI_U XIA 115 91
;
PROC SQL;
CREATE TABLE WANT AS
SELECT dist, TERR, MKT, SUM(TR) AS TOTAL_TR, SUM(NR) AS TOTAL_NR
FROM HAVE
GROUP BY dist, TERR, MKT ;
QUIT;
Alternatively, you could count the distinct relids per terr if you wanted.
like this?
data have;
input dist terr relid mkt $ product $ TR NR;
datalines;
101 1 210 XI_U XIA 100 0
101 2 211 XI_U XIA 0 60
101 1 212 XI_U XIA 110 70
101 1 213 XI_U XIA 112 80
101 2 214 XI_U XIA 115 91
;
proc sort data = have;
by dist terr;
run;
data want;
set have;
by dist terr;
if first.dist and first.terr then do;
sum_TR=0;
sum_NR=0;
end;
sum_TR = sum_TR + TR;
sum_NR = sum_NR + NR;
if last.terr then output;
retain sum_TR sum_NR 0;
drop TR NR;
run;
It appears that you are SUMMING variables TR and NR. What values, if any of RELID, MKT or Product should be with the output?
Are you looking for a REPORT or a Dataset?
Assuming what @ballardw is saying is true in that your purpose is to sum, and you are grouping by terr, then here is a relatively simple solution in proc sql (there are many ways do this but this is one):
data have;
input dist terr relid mkt $ product $ TR NR;
datalines;
101 1 210 XI_U XIA 100 0
101 2 211 XI_U XIA 0 60
101 1 212 XI_U XIA 110 70
101 1 213 XI_U XIA 112 80
101 2 214 XI_U XIA 115 91
;
PROC SQL;
CREATE TABLE WANT AS
SELECT TERR, SUM(TR) AS TOTAL_TR, SUM(NR) AS TOTAL_NR
FROM HAVE
GROUP BY TERR;
QUIT;
I'm trying to generate a report.
My output should have dist, mkt, product, TR and NR. Don't need relid and Terr but the duplicates should be taken off by relid.
Thank you
M
By "rollup" you apparently meain you want to sum TR and NR for each combination of DIST and TERR. But product and MKT appear to be constants that you might want to keep in the output. , so I include them in the CLASS statement.
That is what PROC SUMMARY is for.
proc summary data=have nway;
class dist terr mkt product;
var tr nr;
output out=want sum(tr nr)=tr_sum nr_sum;
run;
You can also use proc summary to get other stats (N min max mean std ...... ).
By doing summing, there are still duplicates. I think it is not summing up, it is rolling up. For e.g.
dist terr relid mkt product TR NR
101 1 210 XI_U XIA 100 0
101 2 212 XI_U XIA 0 60
101 2 212 XI_U XIA 0 60
101 2 213 XI_U XIA 110 70
I need to have only one row per terr per product. So in the above example, it is having duplicate for Terr 2, so I should remove that duplicate and sum Relid 212 and 213.
Sorry for the confusion and thanks a lot for all of your comments.
Malathi
One thing: if you roll up like this then you will always have one row per distinct response. If you wish to include relid in with the rest of the responses then you will get one row per terr/relid, which may not be what you're looking for? If you just wanted one row per terr then you could not include relid, as such:
data have;
input dist terr relid mkt $ product $ TR NR;
datalines;
101 1 210 XI_U XIA 100 0
101 2 211 XI_U XIA 0 60
101 1 212 XI_U XIA 110 70
101 1 213 XI_U XIA 112 80
101 2 214 XI_U XIA 115 91
;
PROC SQL;
CREATE TABLE WANT AS
SELECT dist, TERR, MKT, SUM(TR) AS TOTAL_TR, SUM(NR) AS TOTAL_NR
FROM HAVE
GROUP BY dist, TERR, MKT ;
QUIT;
Alternatively, you could count the distinct relids per terr if you wanted.
You are right,
I'm not going to include relid and terr as well in the final output.
In that case, does the solution that I have above work for you?
So, product should not be included in teh select statement?
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.