BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Nikos
Fluorite | Level 6

Hi all,

Any piece of suggestion would be more than welcomed.

I have a dataset with ID TranDate Product Region & SubRegion

IDTranDateProductRegionSubRegion
1002009-09-22FOTATSAT_1
1002010-05-02FOTATSAT_2
1002011-07-14FOTATSAT_3
1002007-07-06BIKOMSOM
1002009-03-09BIKOMSOM
2002009-09-22FOTKNSKN
2002010-05-02FOTLZSLZ_1
2002011-07-14FOTLZSLZ_2
3002007-07-06BIKOMSOM
3002009-03-09BIKATSAT

For the same ID and Product I have rows with single Region and SubRegion

and I also have rows for the same ID with common Region but different Subregion.

I would like to have one Proc Sql that will count:

- the total number of distinct ID's

- the total number of distinct ID that have multiple Regions

- the total number of distinct ID that have multiple Regions and multiple subregions

-  the percentages of the above against the total number of distinct IDs

Finally I would like, possibly in the same Proc Sql to replace all previous TranDates SubRegions with the latest Trandate subregion value, when an ID have multiple Subregions.

I would like to thank you in advance

Best Regards

Nikos.

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Then Try this one:

proc sql;

create table want (rename=(trandate_m=trandate subregion_m=subregion)) as

select a.id,

        count(distinct a.id) as ID_CT,

        (select count(distinct id)

         from (select id from have

         group by id having count(distinct region)>1))as id_ct_r,

       (select count (distinct id) from

         (select id, distinct region from have group by id, region having count(distinct subregion)>1)) as id_ct_sr,

calculated id_ct_r / calculated id_ct as percent_R format percent7.2,

calculated id_ct_sr / calculated id_ct as percent_SR format percent7.2,

b.trandate_m,

b.subregion_m

       from have a

left join (select id,  subregion as subregion_m, max(trandate) as trandate_m format yymmdd10.

            from have  group by id having trandate=calculated trandate_m) b on a.id=b.id

;

quit;

And plesae let us know how it works for you,

Regards,

Haikuo

View solution in original post

5 REPLIES 5
Linlin
Lapis Lazuli | Level 10

Would you please give an example to show the difference between

- the total number of distinct ID that have multiple Regions

and

- the total number of distinct ID that have multiple Regions and multiple subregions?

Thanks!

Nikos
Fluorite | Level 6

I apologize

the following rows should be added to the above table

ID TranDateProductRegionSubRegion
4002010-05-07FOTATSAT
4002011-02-11FOTATSAT
5002008-11-17BIKLZSLZ
5002009-12-24BIKLZSLZ

Total number of distinct ID   = 5

Total no of distinct ID that have multiple regions = 3  (ID 100, 200, 300)

Total no of distinct ID that have multiple regions and multiple subregions = 2  (ID 100,200)

Percn:    3/5     and  2/5

Haikuo
Onyx | Level 15

Here just to start conversations:

data have;

infile cards;

input ID     TranDate :yymmdd10.     Product $     Region $     SubRegion $;

format trandate yymmdd10.;

cards;

100     2009-09-22     FOT     AT     SAT_1

100     2010-05-02     FOT     AT     SAT_2

100     2011-07-14     FOT     AT     SAT_3

100     2007-07-06     BIK     OM     SOM

100     2009-03-09     BIK     OM     SOM

200     2009-09-22     FOT     KN     SKN

200     2010-05-02     FOT     LZ     SLZ_1

200     2011-07-14     FOT     LZ     SLZ_2

300     2007-07-06     BIK     OM     SOM

300     2009-03-09     BIK     AT     SAT

400     2010-05-07     FOT     AT     SAT

400     2011-02-11     FOT     AT     SAT

500     2008-11-17     BIK     LZ     SLZ

500     2009-12-24     BIK     LZ     SLZ

;

proc sql;

create table want as

select a.id,

        count(distinct a.id) as ID_CT,

        (select count(distinct id)

         from (select id from have

         group by id having count(distinct region)>1))as id_ct_r,

        (select count(distinct id)

         from (select id from have

         group by id having count(distinct subregion)>1))as id_ct_sr,

calculated id_ct_r / calculated id_ct as percent_R format percent7.2,

calculated id_ct_sr / calculated id_ct as percent_SR format percent7.2,

trandate_m,

subregion_m

       from have a

left join (select id,  subregion as subregion_m, max(trandate) as trandate_M format yymmdd10.

            from have  group by id having trandate=calculated trandate_m) b on a.id=b.id

;

quit;

proc print;run;

I am also confused on your condition 2 ( - the total number of distinct ID that have multiple Regions) and 3 (- the total number of distinct ID that have multiple Regions and multiple subregions), aren't they logically  the same? Under the same ID, multiple regions will evitablely lead to multiple subregions, right? or you want to mean that each region has to have multiple subregion to be qualified?

Haikuo

Nikos
Fluorite | Level 6

There Regions like AT that have Multiple Subregions SAT_1, SAT_2 & SAT_3

while other Regions like OM have only one Subregion SOM

Thank you

Haikuo
Onyx | Level 15

Then Try this one:

proc sql;

create table want (rename=(trandate_m=trandate subregion_m=subregion)) as

select a.id,

        count(distinct a.id) as ID_CT,

        (select count(distinct id)

         from (select id from have

         group by id having count(distinct region)>1))as id_ct_r,

       (select count (distinct id) from

         (select id, distinct region from have group by id, region having count(distinct subregion)>1)) as id_ct_sr,

calculated id_ct_r / calculated id_ct as percent_R format percent7.2,

calculated id_ct_sr / calculated id_ct as percent_SR format percent7.2,

b.trandate_m,

b.subregion_m

       from have a

left join (select id,  subregion as subregion_m, max(trandate) as trandate_m format yymmdd10.

            from have  group by id having trandate=calculated trandate_m) b on a.id=b.id

;

quit;

And plesae let us know how it works for you,

Regards,

Haikuo

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 8707 views
  • 1 like
  • 3 in conversation