Solved: Re: Proc Sql calculating counts & percentages for groups and subgroups

Nikos · Posted 02-09-2012 09:20 AM

Hi all,

Any piece of suggestion would be more than welcomed.

I have a dataset with ID TranDate Product Region & SubRegion

ID	TranDate	Product	Region	SubRegion
100	2009-09-22	FOT	AT	SAT_1
100	2010-05-02	FOT	AT	SAT_2
100	2011-07-14	FOT	AT	SAT_3
100	2007-07-06	BIK	OM	SOM
100	2009-03-09	BIK	OM	SOM
200	2009-09-22	FOT	KN	SKN
200	2010-05-02	FOT	LZ	SLZ_1
200	2011-07-14	FOT	LZ	SLZ_2
300	2007-07-06	BIK	OM	SOM
300	2009-03-09	BIK	AT	SAT

For the same ID and Product I have rows with single Region and SubRegion

and I also have rows for the same ID with common Region but different Subregion.

I would like to have one Proc Sql that will count:

- the total number of distinct ID's

- the total number of distinct ID that have multiple Regions

- the total number of distinct ID that have multiple Regions and multiple subregions

- the percentages of the above against the total number of distinct IDs

Finally I would like, possibly in the same Proc Sql to replace all previous TranDates SubRegions with the latest Trandate subregion value, when an ID have multiple Subregions.

I would like to thank you in advance

Best Regards

Nikos.

Haikuo · Posted 02-09-2012 11:01 AM

Then Try this one:

proc sql;

create table want (rename=(trandate_m=trandate subregion_m=subregion)) as

select a.id,

count(distinct a.id) as ID_CT,

(select count(distinct id)

from (select id from have

group by id having count(distinct region)>1))as id_ct_r,

(select count (distinct id) from

(select id, distinct region from have group by id, region having count(distinct subregion)>1)) as id_ct_sr,

calculated id_ct_r / calculated id_ct as percent_R format percent7.2,

calculated id_ct_sr / calculated id_ct as percent_SR format percent7.2,

b.trandate_m,

b.subregion_m

from have a

left join (select id, subregion as subregion_m, max(trandate) as trandate_m format yymmdd10.

from have group by id having trandate=calculated trandate_m) b on a.id=b.id

;

quit;

And plesae let us know how it works for you,

Regards,

Haikuo

View solution in original post

Linlin · Posted 02-09-2012 09:59 AM

Would you please give an example to show the difference between

- the total number of distinct ID that have multiple Regions

and

- the total number of distinct ID that have multiple Regions and multiple subregions?

Thanks!

Nikos · Posted 02-09-2012 10:26 AM

I apologize

the following rows should be added to the above table

ID	TranDate	Product	Region	SubRegion
400	2010-05-07	FOT	AT	SAT
400	2011-02-11	FOT	AT	SAT
500	2008-11-17	BIK	LZ	SLZ
500	2009-12-24	BIK	LZ	SLZ

Total number of distinct ID = 5

Total no of distinct ID that have multiple regions = 3 (ID 100, 200, 300)

Total no of distinct ID that have multiple regions and multiple subregions = 2 (ID 100,200)

Percn: 3/5 and 2/5

Haikuo · Posted 02-09-2012 10:34 AM

Here just to start conversations:

data have;

infile cards;

input ID TranDate :yymmdd10. Product $ Region $ SubRegion $;

format trandate yymmdd10.;

cards;

100 2009-09-22 FOT AT SAT_1

100 2010-05-02 FOT AT SAT_2

100 2011-07-14 FOT AT SAT_3

100 2007-07-06 BIK OM SOM

100 2009-03-09 BIK OM SOM

200 2009-09-22 FOT KN SKN

200 2010-05-02 FOT LZ SLZ_1

200 2011-07-14 FOT LZ SLZ_2

300 2007-07-06 BIK OM SOM

300 2009-03-09 BIK AT SAT

400 2010-05-07 FOT AT SAT

400 2011-02-11 FOT AT SAT

500 2008-11-17 BIK LZ SLZ

500 2009-12-24 BIK LZ SLZ

;

proc sql;

create table want as

select a.id,

count(distinct a.id) as ID_CT,

(select count(distinct id)

from (select id from have

group by id having count(distinct region)>1))as id_ct_r,

(select count(distinct id)

from (select id from have

group by id having count(distinct subregion)>1))as id_ct_sr,

calculated id_ct_r / calculated id_ct as percent_R format percent7.2,

calculated id_ct_sr / calculated id_ct as percent_SR format percent7.2,

trandate_m,

subregion_m

from have a

left join (select id, subregion as subregion_m, max(trandate) as trandate_M format yymmdd10.

from have group by id having trandate=calculated trandate_m) b on a.id=b.id

;

quit;

proc print;run;

I am also confused on your condition 2 ( - the total number of distinct ID that have multiple Regions) and 3 (- the total number of distinct ID that have multiple Regions and multiple subregions), aren't they logically the same? Under the same ID, multiple regions will evitablely lead to multiple subregions, right? or you want to mean that each region has to have multiple subregion to be qualified?

Haikuo

Nikos · Posted 02-09-2012 10:44 AM

There Regions like AT that have Multiple Subregions SAT_1, SAT_2 & SAT_3

while other Regions like OM have only one Subregion SOM

Thank you

Haikuo · Posted 02-09-2012 11:01 AM

Then Try this one:

proc sql;

create table want (rename=(trandate_m=trandate subregion_m=subregion)) as

select a.id,

count(distinct a.id) as ID_CT,

(select count(distinct id)

from (select id from have

group by id having count(distinct region)>1))as id_ct_r,

(select count (distinct id) from

(select id, distinct region from have group by id, region having count(distinct subregion)>1)) as id_ct_sr,

calculated id_ct_r / calculated id_ct as percent_R format percent7.2,

calculated id_ct_sr / calculated id_ct as percent_SR format percent7.2,

b.trandate_m,

b.subregion_m

from have a

left join (select id, subregion as subregion_m, max(trandate) as trandate_m format yymmdd10.

from have group by id having trandate=calculated trandate_m) b on a.id=b.id

;

quit;

And plesae let us know how it works for you,

Regards,

Haikuo

SAS Innovate 2025: Call for Content

Classroom Training Available!