how to do roll-ups and remove duplicates?

Solved
Frequent Contributor
Posts: 76

how to do roll-ups and remove duplicates?

Hi,

I have a table like:

 dist terr relid mkt product TR NR 101 1 210 XI_U XIA 100 0 101 2 211 XI_U XIA 0 60 101 1 212 XI_U XIA 110 70 101 1 213 XI_U XIA 112 80 101 2 214 XI_U XIA 115 91

I want to have the roll-ups of all the same terr and remove duplicates: For e.g. for dist 101 and terr 1 I should have only one row with TR=322 and NR=150. How do I remove duplicates and roll up the values as per the terr for the dist. In this case there should be only two rows (dist 101 and Terr1 and Dist 101 and terr2).

I tried to remove the duplicates using nodupkey in proc sort but that removes some of the products (not shown here).

Any help is greatly appreciated.

Thank you

M

Accepted Solutions
Solution
‎05-01-2017 01:53 PM
Frequent Contributor
Posts: 93

Re: how to do roll-ups and remove duplicates?

One thing: if you roll up like this then you will always have one row per distinct response.  If you wish to include relid in with the rest of the responses then you will get one row per terr/relid, which may not be what you're looking for?  If you just wanted one row per terr then you could not include relid, as such:

``````data have;
input dist terr relid mkt \$ product \$ TR NR;
datalines;
101 1 210 XI_U XIA 100 0
101 2 211 XI_U XIA 0 60
101 1 212 XI_U XIA 110 70
101 1 213 XI_U XIA 112 80
101 2 214 XI_U XIA 115 91
;

PROC SQL;
CREATE TABLE WANT AS
SELECT dist, TERR,  MKT, SUM(TR) AS TOTAL_TR, SUM(NR) AS TOTAL_NR
FROM HAVE
GROUP BY dist, TERR, MKT ;
QUIT;``````

Alternatively, you could count the distinct relids per terr if you wanted.

All Replies
PROC Star
Posts: 1,283

Re: how to do roll-ups and remove duplicates?

like this?

``````data have;
input dist terr relid mkt \$ product \$ TR NR;
datalines;
101 1 210 XI_U XIA 100 0
101 2 211 XI_U XIA 0 60
101 1 212 XI_U XIA 110 70
101 1 213 XI_U XIA 112 80
101 2 214 XI_U XIA 115 91
;

proc sort data = have;
by dist terr;
run;

data want;
set have;
by dist terr;
if first.dist and first.terr then do;
sum_TR=0;
sum_NR=0;
end;

sum_TR = sum_TR + TR;
sum_NR = sum_NR + NR;

if last.terr then output;

retain sum_TR sum_NR 0;
drop TR NR;

run;

``````
Super User
Posts: 13,528

Re: how to do roll-ups and remove duplicates?

It appears that you are SUMMING variables TR and NR. What values, if any of RELID, MKT or Product should be with the output?

Are you looking for a REPORT or a Dataset?

Frequent Contributor
Posts: 93

Re: how to do roll-ups and remove duplicates?

Assuming what @ballardw is saying is true in that your purpose is to sum, and you are grouping by terr, then here is a relatively simple solution in proc sql (there are many ways do this but this is one):

``````data have;
input dist terr relid mkt \$ product \$ TR NR;
datalines;
101 1 210 XI_U XIA 100 0
101 2 211 XI_U XIA 0 60
101 1 212 XI_U XIA 110 70
101 1 213 XI_U XIA 112 80
101 2 214 XI_U XIA 115 91
;

PROC SQL;
CREATE TABLE WANT AS
SELECT TERR, SUM(TR) AS TOTAL_TR, SUM(NR) AS TOTAL_NR
FROM HAVE
GROUP BY TERR;
QUIT;``````
Frequent Contributor
Posts: 76

Re: how to do roll-ups and remove duplicates?

I'm trying to generate a report.

Frequent Contributor
Posts: 76

Re: how to do roll-ups and remove duplicates?

My output should have dist, mkt, product, TR and NR. Don't need relid and Terr but the duplicates should be taken off by relid.

Thank you

M

Posts: 1,337

Re: how to do roll-ups and remove duplicates?

By "rollup" you apparently meain you want to sum TR and NR for each combination of DIST and TERR.  But product and MKT appear to be constants that you might want to keep in the output. , so I include them in the CLASS statement.

That is what PROC SUMMARY is for.

proc summary data=have nway;

class dist terr mkt product;

var tr nr;

output out=want   sum(tr nr)=tr_sum nr_sum;

run;

You can also use proc summary to get other stats (N min max mean std ...... ).

Frequent Contributor
Posts: 76

Re: how to do roll-ups and remove duplicates?

By doing summing, there are still duplicates. I think it is not summing up, it is rolling up. For e.g.

dist       terr          relid    mkt   product   TR     NR

101      1             210       XI_U    XIA     100     0

101       2            212       XI_U   XIA      0        60

101       2            212       XI_U   XIA      0        60

101       2            213       XI_U   XIA     110     70

I need to have only one row per terr per product. So in the above example, it is having duplicate for Terr 2, so I should remove that duplicate and sum Relid 212 and 213.

Sorry for the confusion and thanks a lot for all of your comments.

Malathi

Solution
‎05-01-2017 01:53 PM
Frequent Contributor
Posts: 93

Re: how to do roll-ups and remove duplicates?

One thing: if you roll up like this then you will always have one row per distinct response.  If you wish to include relid in with the rest of the responses then you will get one row per terr/relid, which may not be what you're looking for?  If you just wanted one row per terr then you could not include relid, as such:

``````data have;
input dist terr relid mkt \$ product \$ TR NR;
datalines;
101 1 210 XI_U XIA 100 0
101 2 211 XI_U XIA 0 60
101 1 212 XI_U XIA 110 70
101 1 213 XI_U XIA 112 80
101 2 214 XI_U XIA 115 91
;

PROC SQL;
CREATE TABLE WANT AS
SELECT dist, TERR,  MKT, SUM(TR) AS TOTAL_TR, SUM(NR) AS TOTAL_NR
FROM HAVE
GROUP BY dist, TERR, MKT ;
QUIT;``````

Alternatively, you could count the distinct relids per terr if you wanted.

Frequent Contributor
Posts: 76

Re: how to do roll-ups and remove duplicates?

You are right,

I'm not going to include relid and terr as well in the final output.

Frequent Contributor
Posts: 93

Re: how to do roll-ups and remove duplicates?

In that case, does the solution that I have above work for you?

Frequent Contributor
Posts: 76