Hi all,
Any piece of suggestion would be more than welcomed.
I have a dataset with ID TranDate Product Region & SubRegion
ID | TranDate | Product | Region | SubRegion |
100 | 2009-09-22 | FOT | AT | SAT_1 |
100 | 2010-05-02 | FOT | AT | SAT_2 |
100 | 2011-07-14 | FOT | AT | SAT_3 |
100 | 2007-07-06 | BIK | OM | SOM |
100 | 2009-03-09 | BIK | OM | SOM |
200 | 2009-09-22 | FOT | KN | SKN |
200 | 2010-05-02 | FOT | LZ | SLZ_1 |
200 | 2011-07-14 | FOT | LZ | SLZ_2 |
300 | 2007-07-06 | BIK | OM | SOM |
300 | 2009-03-09 | BIK | AT | SAT |
For the same ID and Product I have rows with single Region and SubRegion
and I also have rows for the same ID with common Region but different Subregion.
I would like to have one Proc Sql that will count:
- the total number of distinct ID's
- the total number of distinct ID that have multiple Regions
- the total number of distinct ID that have multiple Regions and multiple subregions
- the percentages of the above against the total number of distinct IDs
Finally I would like, possibly in the same Proc Sql to replace all previous TranDates SubRegions with the latest Trandate subregion value, when an ID have multiple Subregions.
I would like to thank you in advance
Best Regards
Nikos.
Then Try this one:
proc sql;
create table want (rename=(trandate_m=trandate subregion_m=subregion)) as
select a.id,
count(distinct a.id) as ID_CT,
(select count(distinct id)
from (select id from have
group by id having count(distinct region)>1))as id_ct_r,
(select count (distinct id) from
(select id, distinct region from have group by id, region having count(distinct subregion)>1)) as id_ct_sr,
calculated id_ct_r / calculated id_ct as percent_R format percent7.2,
calculated id_ct_sr / calculated id_ct as percent_SR format percent7.2,
b.trandate_m,
b.subregion_m
from have a
left join (select id, subregion as subregion_m, max(trandate) as trandate_m format yymmdd10.
from have group by id having trandate=calculated trandate_m) b on a.id=b.id
;
quit;
And plesae let us know how it works for you,
Regards,
Haikuo
Would you please give an example to show the difference between
- the total number of distinct ID that have multiple Regions
and
- the total number of distinct ID that have multiple Regions and multiple subregions?
Thanks!
I apologize
the following rows should be added to the above table
Total number of distinct ID = 5 Total no of distinct ID that have multiple regions = 3 (ID 100, 200, 300) |
Total no of distinct ID that have multiple regions and multiple subregions = 2 (ID 100,200)
Percn: 3/5 and 2/5
Here just to start conversations:
data have;
infile cards;
input ID TranDate :yymmdd10. Product $ Region $ SubRegion $;
format trandate yymmdd10.;
cards;
100 2009-09-22 FOT AT SAT_1
100 2010-05-02 FOT AT SAT_2
100 2011-07-14 FOT AT SAT_3
100 2007-07-06 BIK OM SOM
100 2009-03-09 BIK OM SOM
200 2009-09-22 FOT KN SKN
200 2010-05-02 FOT LZ SLZ_1
200 2011-07-14 FOT LZ SLZ_2
300 2007-07-06 BIK OM SOM
300 2009-03-09 BIK AT SAT
400 2010-05-07 FOT AT SAT
400 2011-02-11 FOT AT SAT
500 2008-11-17 BIK LZ SLZ
500 2009-12-24 BIK LZ SLZ
;
proc sql;
create table want as
select a.id,
count(distinct a.id) as ID_CT,
(select count(distinct id)
from (select id from have
group by id having count(distinct region)>1))as id_ct_r,
(select count(distinct id)
from (select id from have
group by id having count(distinct subregion)>1))as id_ct_sr,
calculated id_ct_r / calculated id_ct as percent_R format percent7.2,
calculated id_ct_sr / calculated id_ct as percent_SR format percent7.2,
trandate_m,
subregion_m
from have a
left join (select id, subregion as subregion_m, max(trandate) as trandate_M format yymmdd10.
from have group by id having trandate=calculated trandate_m) b on a.id=b.id
;
quit;
proc print;run;
I am also confused on your condition 2 ( - the total number of distinct ID that have multiple Regions) and 3 (- the total number of distinct ID that have multiple Regions and multiple subregions), aren't they logically the same? Under the same ID, multiple regions will evitablely lead to multiple subregions, right? or you want to mean that each region has to have multiple subregion to be qualified?
Haikuo
There Regions like AT that have Multiple Subregions SAT_1, SAT_2 & SAT_3
while other Regions like OM have only one Subregion SOM
Thank you
Then Try this one:
proc sql;
create table want (rename=(trandate_m=trandate subregion_m=subregion)) as
select a.id,
count(distinct a.id) as ID_CT,
(select count(distinct id)
from (select id from have
group by id having count(distinct region)>1))as id_ct_r,
(select count (distinct id) from
(select id, distinct region from have group by id, region having count(distinct subregion)>1)) as id_ct_sr,
calculated id_ct_r / calculated id_ct as percent_R format percent7.2,
calculated id_ct_sr / calculated id_ct as percent_SR format percent7.2,
b.trandate_m,
b.subregion_m
from have a
left join (select id, subregion as subregion_m, max(trandate) as trandate_m format yymmdd10.
from have group by id having trandate=calculated trandate_m) b on a.id=b.id
;
quit;
And plesae let us know how it works for you,
Regards,
Haikuo
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.