Solved: Re: how to do roll-ups and remove duplicates?

Malathi13 · Posted 05-01-2017 10:13 AM

Hi,

I have a table like:

dist	terr	relid	mkt	product	TR	NR
101	1	210	XI_U	XIA	100	0
101	2	211	XI_U	XIA	0	60
101	1	212	XI_U	XIA	110	70
101	1	213	XI_U	XIA	112	80
101	2	214	XI_U	XIA	115	91

I want to have the roll-ups of all the same terr and remove duplicates: For e.g. for dist 101 and terr 1 I should have only one row with TR=322 and NR=150. How do I remove duplicates and roll up the values as per the terr for the dist. In this case there should be only two rows (dist 101 and Terr1 and Dist 101 and terr2).

I tried to remove the duplicates using nodupkey in proc sort but that removes some of the products (not shown here).

Any help is greatly appreciated.

Thank you

M

thomp7050 · Posted 05-01-2017 01:04 PM

One thing: if you roll up like this then you will always have one row per distinct response. If you wish to include relid in with the rest of the responses then you will get one row per terr/relid, which may not be what you're looking for? If you just wanted one row per terr then you could not include relid, as such:

data have;
input dist terr relid mkt $ product $ TR NR;
datalines;
101 1 210 XI_U XIA 100 0 
101 2 211 XI_U XIA 0 60 
101 1 212 XI_U XIA 110 70 
101 1 213 XI_U XIA 112 80 
101 2 214 XI_U XIA 115 91 
;

PROC SQL;
CREATE TABLE WANT AS
SELECT dist, TERR,  MKT, SUM(TR) AS TOTAL_TR, SUM(NR) AS TOTAL_NR
FROM HAVE
GROUP BY dist, TERR, MKT ;
QUIT;

Alternatively, you could count the distinct relids per terr if you wanted.

View solution in original post

PeterClemmensen · Posted 05-01-2017 10:25 AM

like this?

data have;
input dist terr relid mkt $ product $ TR NR;
datalines;
101 1 210 XI_U XIA 100 0 
101 2 211 XI_U XIA 0 60 
101 1 212 XI_U XIA 110 70 
101 1 213 XI_U XIA 112 80 
101 2 214 XI_U XIA 115 91 
;

proc sort data = have;
	by dist terr;
run;

data want;
	set have;
	by dist terr;
	if first.dist and first.terr then do; 
		sum_TR=0;
		sum_NR=0;
	end;
	
	sum_TR = sum_TR + TR;
	sum_NR = sum_NR + NR;

	if last.terr then output;

	retain sum_TR sum_NR 0;
	drop TR NR;

run;

The DATA to DATA Step Macro
Blog: SASnrd

ballardw · Posted 05-01-2017 10:27 AM

It appears that you are SUMMING variables TR and NR. What values, if any of RELID, MKT or Product should be with the output?

Are you looking for a REPORT or a Dataset?

thomp7050 · Posted 05-01-2017 10:38 AM

Assuming what @ballardw is saying is true in that your purpose is to sum, and you are grouping by terr, then here is a relatively simple solution in proc sql (there are many ways do this but this is one):

data have;
input dist terr relid mkt $ product $ TR NR;
datalines;
101 1 210 XI_U XIA 100 0 
101 2 211 XI_U XIA 0 60 
101 1 212 XI_U XIA 110 70 
101 1 213 XI_U XIA 112 80 
101 2 214 XI_U XIA 115 91 
;

PROC SQL;
CREATE TABLE WANT AS
SELECT TERR, SUM(TR) AS TOTAL_TR, SUM(NR) AS TOTAL_NR
FROM HAVE
GROUP BY TERR;
QUIT;

Malathi13 · Posted 05-01-2017 10:52 AM

I'm trying to generate a report.

Malathi13 · Posted 05-01-2017 11:24 AM

My output should have dist, mkt, product, TR and NR. Don't need relid and Terr but the duplicates should be taken off by relid.

Thank you

M

mkeintz · Posted 05-01-2017 10:49 AM

By "rollup" you apparently meain you want to sum TR and NR for each combination of DIST and TERR. But product and MKT appear to be constants that you might want to keep in the output. , so I include them in the CLASS statement.

That is what PROC SUMMARY is for.

proc summary data=have nway;

class dist terr mkt product;

var tr nr;

output out=want sum(tr nr)=tr_sum nr_sum;

run;

You can also use proc summary to get other stats (N min max mean std ...... ).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Malathi13 · Posted 05-01-2017 11:21 AM

By doing summing, there are still duplicates. I think it is not summing up, it is rolling up. For e.g.

dist terr relid mkt product TR NR

101 1 210 XI_U XIA 100 0

101 2 212 XI_U XIA 0 60

101 2 213 XI_U XIA 110 70

I need to have only one row per terr per product. So in the above example, it is having duplicate for Terr 2, so I should remove that duplicate and sum Relid 212 and 213.

Sorry for the confusion and thanks a lot for all of your comments.

Malathi

thomp7050 · Posted 05-01-2017 01:04 PM

One thing: if you roll up like this then you will always have one row per distinct response. If you wish to include relid in with the rest of the responses then you will get one row per terr/relid, which may not be what you're looking for? If you just wanted one row per terr then you could not include relid, as such:

data have;
input dist terr relid mkt $ product $ TR NR;
datalines;
101 1 210 XI_U XIA 100 0 
101 2 211 XI_U XIA 0 60 
101 1 212 XI_U XIA 110 70 
101 1 213 XI_U XIA 112 80 
101 2 214 XI_U XIA 115 91 
;

PROC SQL;
CREATE TABLE WANT AS
SELECT dist, TERR,  MKT, SUM(TR) AS TOTAL_TR, SUM(NR) AS TOTAL_NR
FROM HAVE
GROUP BY dist, TERR, MKT ;
QUIT;

Alternatively, you could count the distinct relids per terr if you wanted.

Malathi13 · Posted 05-01-2017 01:23 PM

You are right,

I'm not going to include relid and terr as well in the final output.

thomp7050 · Posted 05-01-2017 01:39 PM

In that case, does the solution that I have above work for you?

Malathi13 · Posted 05-01-2017 01:42 PM

So, product should not be included in teh select statement?

Classroom Training Available!