Hi I have a dataset like this:
Month Geo_id
1 A
1 B
1 C
2 A
2 D
2 E
I want to select cumulative counts like:
Month Count
1 3
2 5
Where 3 is the count of distinct geo_ids for 1st month, 5 is the count of distinct geo_ids for month 1,2 and so on for other months.
I tried this unsuccessfully:
proc sql;
create table temp as select month,count(distinct geo_id) as cnt from active_phy1 where min(month) ge month le month
group by month;quit;
data have;
input (month geo_id) ($);
datalines;
1 A
1 B
1 C
2 A
2 D
2 E
;
proc sort data=have out=have1 nodupkey;
by geo_id;
run;
data want;
do until(last.month);
set have1;
by month;
count+1;
end;
drop geo_id;
run;
Regards,
Naveen Srinivasan
L&T Infotech
Thanks,
Anyway we can do this by sql?
Hi,
Your code will not work I guess in case of dataset like
data have;
input month geo_id$;
datalines;
1 A
1 B
1 C
2 A
2 D
2 E
3 A
3 C
3 D
;
run;
Yogesh
Hi,
Two ways jump to mind, first ave a sub-query do the count:
proc sql;
create table WANT as
select A.MONTH,
(select count(distinct GEO_ID) from HAVE where MONTH <= A.MONTH) as CNT
from (select distinct MONTH from HAVE) A;
quit;
Second, proc sort nodupkey before the SQL:
proc sort data=have nodupkey;
by geo_id;
run;
proc sql;
create table WANT as
select MONTH,
count(DISTINCT GEO_ID) as CNT
from HAVE
group by MONTH;
quit;
It is not good for SQL , low efficient .
data have; input (month geo_id) ($); datalines; 1 A 1 B 1 C 2 A 2 D 2 E ; run; proc sql; create table want as select month,(select count(distinct geo_id) from have where month le a.month ) as count from (select distinct month from have) as a ; run;
It is good for data step.Fast .
data want1; set have; by month; array x{999999} $ _temporary_ ; if geo_id not in x then do;n+1;x{n}=geo_id;end; if last.month; drop geo_id; run;
Xia Keshan
I like the full dynamic Hash approach better, no waste of memory, no worry about busted array.
data have;
input (month geo_id) ($);
datalines;
1 A
1 B
1 C
2 A
2 D
2 E
;
run;
data want;
if _n_=1 then
do;
declare hash h();
h.definekey('geo_id');
h.definedone();
end;
do until (last.month);
set have;
by month;
rc=h.add();
end;
count=h.num_items;
keep month count;
run;
hello,
data have;
input Month Geo_id $;
datalines;
1 A
1 B
1 C
2 A
2 D
2 E
;
proc sort data=have;
by month;
run;
data want;
if _N_=1 then do;
declare hash g(ordered:'Y');
g.definekey('gr');
g.definedone();
end;
set have;
by month;
gr=Geo_id;
if g.find(key:Geo_id) ne 0 then do;g.add();end;
if last.month then do;totalitems = g.num_items;output;end;
keep month totalitems;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.