Re: Creating cumulative counts over period of time

munitech4u · Posted 03-26-2015 07:13 AM

Hi I have a dataset like this:

Month Geo_id

1 A

1 B

1 C

2 A

2 D

2 E

I want to select cumulative counts like:

Month Count

1 3

2 5

Where 3 is the count of distinct geo_ids for 1st month, 5 is the count of distinct geo_ids for month 1,2 and so on for other months.

I tried this unsuccessfully:

proc sql;

create table temp as select month,count(distinct geo_id) as cnt from active_phy1 where min(month) ge month le month

group by month;quit;

naveen_srini · Posted 03-26-2015 07:39 AM

data have;
input (month geo_id) ($);
datalines;
1           A
1           B
1           C
2           A
2           D
2           E
;

proc sort data=have out=have1 nodupkey;
by geo_id;
run;

data want;
do until(last.month);
set have1;
by month;
count+1;
end;
drop geo_id;
run;

Regards,

Naveen Srinivasan

L&T Infotech

munitech4u · Posted 03-26-2015 07:58 AM

Thanks,

Anyway we can do this by sql?

naveen_srini · Posted 03-26-2015 08:16 AM

has given you the sql answer, I hope you noted. Thanks

shubhayog · Posted 03-26-2015 08:41 AM

Hi,

Your code will not work I guess in case of dataset like

data have;

input month geo_id$;

datalines;

1 A

1 B

1 C

2 A

2 D

2 E

3 A

3 C

3 D

;

run;

Yogesh

RW9 · Posted 03-26-2015 07:44 AM

Hi,

Two ways jump to mind, first ave a sub-query do the count:

proc sql;

create table WANT as

select A.MONTH,

(select count(distinct GEO_ID) from HAVE where MONTH <= A.MONTH) as CNT

from (select distinct MONTH from HAVE) A;

quit;

Second, proc sort nodupkey before the SQL:

proc sort data=have nodupkey;

by geo_id;

run;

proc sql;

create table WANT as

select MONTH,

count(DISTINCT GEO_ID) as CNT

from HAVE

group by MONTH;

quit;

Ksharp · Posted 03-26-2015 09:03 AM

It is not good for SQL , low efficient .

data have;
input (month geo_id) ($);
datalines;
1           A
1           B
1           C
2           A
2           D
2           E
;
run;

proc sql;
 create table want as
  select month,(select count(distinct geo_id) from have where month le a.month )  as count
   from (select distinct month from have) as a ;
run;

It is good for data step.Fast .

data want1;
 set have;
 by month;
 array x{999999} $ _temporary_ ;
 if geo_id not in x then do;n+1;x{n}=geo_id;end;
 if last.month;
 drop geo_id;
run;

Xia Keshan

Haikuo · Posted 03-26-2015 09:39 AM

I like the full dynamic Hash approach better, no waste of memory, no worry about busted array.

data have;

input (month geo_id) ($);

datalines;

1 A

1 B

1 C

2 A

2 D

2 E

;

run;

data want;

if _n_=1 then

do;

declare hash h();

h.definekey('geo_id');

h.definedone();

end;

do until (last.month);

set have;

by month;

rc=h.add();

end;

count=h.num_items;

keep month count;

run;

Loko · Posted 03-26-2015 10:15 AM

hello,

data have;
input Month Geo_id $;
datalines;
1 A
1 B
1 C
2 A
2 D
2 E
;

proc sort data=have;
by month;
run;

data want;

if _N_=1 then do;
declare hash g(ordered:'Y');
g.definekey('gr');
g.definedone();
end;

set have;
by month;

gr=Geo_id;

if g.find(key:Geo_id) ne 0 then do;g.add();end;

if last.month then do;totalitems = g.num_items;output;end;

keep month totalitems;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!