my data has ID and stores where they put orders. I want to calculate how many stores one ID has put orders with, and have several steps code, and wonder if someone can help imporve it by making it shorter.
/* first, only select ID and Store columns to make data size smaller */
proc sql;
create table table1 as
select ID, store
from mydata;
quit;
/* count how many stores one ID has put orders with */
/* this step, in the output data, the same ID listed n times if n stores */
proc sql;
create table table2 as
select ID, store, count(*) as how_many_stores_per_ID
from table1
group by ID;
quit;
/* to remove duplicate rows for the same ID */
proc sql;
create table table3 as
select distinct ID, how_many_stores_per_ID
from table2;
quit;
/* to verify if the sum of all counts equal to the total rows of mydata */
proc means data=table3 n sum ;
var how_many_stores_per_ID;
output n=count sum=total1;
run;
I tested, they all worked, and the sum(how_many_stores_per_ID) equals total rows of mydata, which means it is right.
... View more