Hi I have a dataset like this:
Month Geo_id
1 A
1 B
1 C
2 A
2 D
2 E
I want to select cumulative counts like:
Month Count
1 3
2 5
Where 3 is the count of distinct geo_ids for 1st month, 5 is the count of distinct geo_ids for month 1,2 and so on for other months.
I tried this unsuccessfully:
proc sql;
create table temp as select month,count(distinct geo_id) as cnt from active_phy1 where min(month) ge month le month
group by month;quit;
data have;
input (month geo_id) ($);
datalines;
1 A
1 B
1 C
2 A
2 D
2 E
;
proc sort data=have out=have1 nodupkey;
by geo_id;
run;
data want;
do until(last.month);
set have1;
by month;
count+1;
end;
drop geo_id;
run;
Regards,
Naveen Srinivasan
L&T Infotech
Thanks,
Anyway we can do this by sql?
Hi,
Your code will not work I guess in case of dataset like
data have;
input month geo_id$;
datalines;
1 A
1 B
1 C
2 A
2 D
2 E
3 A
3 C
3 D
;
run;
Yogesh
Hi,
Two ways jump to mind, first ave a sub-query do the count:
proc sql;
create table WANT as
select A.MONTH,
(select count(distinct GEO_ID) from HAVE where MONTH <= A.MONTH) as CNT
from (select distinct MONTH from HAVE) A;
quit;
Second, proc sort nodupkey before the SQL:
proc sort data=have nodupkey;
by geo_id;
run;
proc sql;
create table WANT as
select MONTH,
count(DISTINCT GEO_ID) as CNT
from HAVE
group by MONTH;
quit;
It is not good for SQL , low efficient .
data have; input (month geo_id) ($); datalines; 1 A 1 B 1 C 2 A 2 D 2 E ; run; proc sql; create table want as select month,(select count(distinct geo_id) from have where month le a.month ) as count from (select distinct month from have) as a ; run;
It is good for data step.Fast .
data want1; set have; by month; array x{999999} $ _temporary_ ; if geo_id not in x then do;n+1;x{n}=geo_id;end; if last.month; drop geo_id; run;
Xia Keshan
I like the full dynamic Hash approach better, no waste of memory, no worry about busted array.
data have;
input (month geo_id) ($);
datalines;
1 A
1 B
1 C
2 A
2 D
2 E
;
run;
data want;
if _n_=1 then
do;
declare hash h();
h.definekey('geo_id');
h.definedone();
end;
do until (last.month);
set have;
by month;
rc=h.add();
end;
count=h.num_items;
keep month count;
run;
hello,
data have;
input Month Geo_id $;
datalines;
1 A
1 B
1 C
2 A
2 D
2 E
;
proc sort data=have;
by month;
run;
data want;
if _N_=1 then do;
declare hash g(ordered:'Y');
g.definekey('gr');
g.definedone();
end;
set have;
by month;
gr=Geo_id;
if g.find(key:Geo_id) ne 0 then do;g.add();end;
if last.month then do;totalitems = g.num_items;output;end;
keep month totalitems;
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.