Help using Base SAS procedures

Proc Sql calculating counts & percentages for groups and subgroups

Accepted Solution Solved
Reply
Contributor
Posts: 68
Accepted Solution

Proc Sql calculating counts & percentages for groups and subgroups

Hi all,

Any piece of suggestion would be more than welcomed.

I have a dataset with ID TranDate Product Region & SubRegion

IDTranDateProductRegionSubRegion
1002009-09-22FOTATSAT_1
1002010-05-02FOTATSAT_2
1002011-07-14FOTATSAT_3
1002007-07-06BIKOMSOM
1002009-03-09BIKOMSOM
2002009-09-22FOTKNSKN
2002010-05-02FOTLZSLZ_1
2002011-07-14FOTLZSLZ_2
3002007-07-06BIKOMSOM
3002009-03-09BIKATSAT

For the same ID and Product I have rows with single Region and SubRegion

and I also have rows for the same ID with common Region but different Subregion.

I would like to have one Proc Sql that will count:

- the total number of distinct ID's

- the total number of distinct ID that have multiple Regions

- the total number of distinct ID that have multiple Regions and multiple subregions

-  the percentages of the above against the total number of distinct IDs

Finally I would like, possibly in the same Proc Sql to replace all previous TranDates SubRegions with the latest Trandate subregion value, when an ID have multiple Subregions.

I would like to thank you in advance

Best Regards

Nikos.


Accepted Solutions
Solution
‎02-09-2012 11:01 AM
Respected Advisor
Posts: 3,124

Re: Proc Sql calculating counts & percentages for groups and subgroups

Then Try this one:

proc sql;

create table want (rename=(trandate_m=trandate subregion_m=subregion)) as

select a.id,

        count(distinct a.id) as ID_CT,

        (select count(distinct id)

         from (select id from have

         group by id having count(distinct region)>1))as id_ct_r,

       (select count (distinct id) from

         (select id, distinct region from have group by id, region having count(distinct subregion)>1)) as id_ct_sr,

calculated id_ct_r / calculated id_ct as percent_R format percent7.2,

calculated id_ct_sr / calculated id_ct as percent_SR format percent7.2,

b.trandate_m,

b.subregion_m

       from have a

left join (select id,  subregion as subregion_m, max(trandate) as trandate_m format yymmdd10.

            from have  group by id having trandate=calculated trandate_m) b on a.id=b.id

;

quit;

And plesae let us know how it works for you,

Regards,

Haikuo

View solution in original post


All Replies
Super Contributor
Posts: 1,636

Proc Sql calculating counts & percentages for groups and subgroups

Would you please give an example to show the difference between

- the total number of distinct ID that have multiple Regions

and

- the total number of distinct ID that have multiple Regions and multiple subregions?

Thanks!

Contributor
Posts: 68

Proc Sql calculating counts & percentages for groups and subgroups

I apologize

the following rows should be added to the above table

ID TranDateProductRegionSubRegion
4002010-05-07FOTATSAT
4002011-02-11FOTATSAT
5002008-11-17BIKLZSLZ
5002009-12-24BIKLZSLZ

Total number of distinct ID   = 5

Total no of distinct ID that have multiple regions = 3  (ID 100, 200, 300)

Total no of distinct ID that have multiple regions and multiple subregions = 2  (ID 100,200)

Percn:    3/5     and  2/5

Respected Advisor
Posts: 3,124

Re: Proc Sql calculating counts & percentages for groups and subgroups

Here just to start conversations:

data have;

infile cards;

input ID     TranDate :yymmdd10.     Product $     Region $     SubRegion $;

format trandate yymmdd10.;

cards;

100     2009-09-22     FOT     AT     SAT_1

100     2010-05-02     FOT     AT     SAT_2

100     2011-07-14     FOT     AT     SAT_3

100     2007-07-06     BIK     OM     SOM

100     2009-03-09     BIK     OM     SOM

200     2009-09-22     FOT     KN     SKN

200     2010-05-02     FOT     LZ     SLZ_1

200     2011-07-14     FOT     LZ     SLZ_2

300     2007-07-06     BIK     OM     SOM

300     2009-03-09     BIK     AT     SAT

400     2010-05-07     FOT     AT     SAT

400     2011-02-11     FOT     AT     SAT

500     2008-11-17     BIK     LZ     SLZ

500     2009-12-24     BIK     LZ     SLZ

;

proc sql;

create table want as

select a.id,

        count(distinct a.id) as ID_CT,

        (select count(distinct id)

         from (select id from have

         group by id having count(distinct region)>1))as id_ct_r,

        (select count(distinct id)

         from (select id from have

         group by id having count(distinct subregion)>1))as id_ct_sr,

calculated id_ct_r / calculated id_ct as percent_R format percent7.2,

calculated id_ct_sr / calculated id_ct as percent_SR format percent7.2,

trandate_m,

subregion_m

       from have a

left join (select id,  subregion as subregion_m, max(trandate) as trandate_M format yymmdd10.

            from have  group by id having trandate=calculated trandate_m) b on a.id=b.id

;

quit;

proc print;run;

I am also confused on your condition 2 ( - the total number of distinct ID that have multiple Regions) and 3 (- the total number of distinct ID that have multiple Regions and multiple subregions), aren't they logically  the same? Under the same ID, multiple regions will evitablely lead to multiple subregions, right? or you want to mean that each region has to have multiple subregion to be qualified?

Haikuo

Contributor
Posts: 68

Proc Sql calculating counts & percentages for groups and subgroups

There Regions like AT that have Multiple Subregions SAT_1, SAT_2 & SAT_3

while other Regions like OM have only one Subregion SOM

Thank you

Solution
‎02-09-2012 11:01 AM
Respected Advisor
Posts: 3,124

Re: Proc Sql calculating counts & percentages for groups and subgroups

Then Try this one:

proc sql;

create table want (rename=(trandate_m=trandate subregion_m=subregion)) as

select a.id,

        count(distinct a.id) as ID_CT,

        (select count(distinct id)

         from (select id from have

         group by id having count(distinct region)>1))as id_ct_r,

       (select count (distinct id) from

         (select id, distinct region from have group by id, region having count(distinct subregion)>1)) as id_ct_sr,

calculated id_ct_r / calculated id_ct as percent_R format percent7.2,

calculated id_ct_sr / calculated id_ct as percent_SR format percent7.2,

b.trandate_m,

b.subregion_m

       from have a

left join (select id,  subregion as subregion_m, max(trandate) as trandate_m format yymmdd10.

            from have  group by id having trandate=calculated trandate_m) b on a.id=b.id

;

quit;

And plesae let us know how it works for you,

Regards,

Haikuo

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 3198 views
  • 0 likes
  • 3 in conversation