DATA Step, Macro, Functions and more

please help simplify some codes

Accepted Solution Solved
Reply
Super Contributor
Posts: 318
Accepted Solution

please help simplify some codes

[ Edited ]

my data has ID and stores where they put orders. I want to calculate how many stores one ID has put orders with, and have several steps code, and wonder if someone can help imporve it by making it shorter.

/* first, only select ID and Store columns to make data size smaller */
proc sql;
create table table1 as
select  ID, store
from mydata;
quit;

/* count how many stores one ID has put orders with */
/* this step, in the output data, the same ID listed n times if n stores */
proc sql;
create table table2 as
select ID, store, count(*) as how_many_stores_per_ID
from table1
group by ID;
quit;

/* to remove duplicate rows for the same ID */
proc sql;
create table table3 as
select distinct ID, how_many_stores_per_ID
from table2;
quit;


/* to verify if the sum of all counts equal to the total rows of mydata */
proc means data=table3 n sum ;
var how_many_stores_per_ID;
output  n=count sum=total1;
run;

I tested, they all worked, and the sum(how_many_stores_per_ID) equals total rows of mydata, which means it is right.


Accepted Solutions
Solution
‎03-04-2016 10:22 AM
Super User
Posts: 5,427

Re: please help simplify some codes

Posted in reply to fengyuwuzu

One step should be enough:

select id, count(distinct store)
Data never sleeps

View solution in original post


All Replies
Super Contributor
Posts: 318

Re: please help simplify some codes

[ Edited ]
Posted in reply to fengyuwuzu

I was thinking to do like below, which gave different results. which one is correct, the above moethod or the one below? Many thanks.

 

proc sql;
create table table4 as
select ID, store, count(*) as how_many_stores_per_ID
from table1
group by ID, store;
quit;

 

Frequent Contributor
Posts: 130

Re: please help simplify some codes

Posted in reply to fengyuwuzu

You skipped the step where you removed duplicate rows for ID by grouping by ID and Store.  Also, you can do this pulling from your mydata dataset, eliminating the first step.  Try the code below:

 

proc sql;
create table table4 as
	select ID, count(store) as how_many_stores_per_ID
from mydata
group by ID;
quit;

 

 

 

Super Contributor
Posts: 318

Re: please help simplify some codes

I am confused with: should I use group by ID or
group by ID, store?
Frequent Contributor
Posts: 130

Re: please help simplify some codes

Posted in reply to fengyuwuzu

If you use group by ID, store then you will get duplicate ID's if they are related to more than one store.  In your original code, you grouped by ID, store; however, you then proceeded to eliminate duplicate IDs with your select distinct statement.  You can group just by ID, and that will give you a count of all the stores for each unique ID.

Super User
Posts: 19,789

Re: please help simplify some codes

Posted in reply to fengyuwuzu

Post sample data.

Test it manually with a small subset of data. You know what the answer should be, so you'll know which code is correct.

Super Contributor
Posts: 318

Re: please help simplify some codes

[ Edited ]
 
Solution
‎03-04-2016 10:22 AM
Super User
Posts: 5,427

Re: please help simplify some codes

Posted in reply to fengyuwuzu

One step should be enough:

select id, count(distinct store)
Data never sleeps
Super Contributor
Posts: 318

Re: please help simplify some codes

[ Edited ]

and use group by id?

Super User
Posts: 5,427

Re: please help simplify some codes

Posted in reply to fengyuwuzu

Yep

Data never sleeps
Super User
Posts: 5,427

Re: please help simplify some codes

Posted in reply to fengyuwuzu

Did the count(distinct...) work for you?

If yes, mark as solved.

Data never sleeps
Super Contributor
Posts: 318

Re: please help simplify some codes

yes, it is the simplest code.
Thanks
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 368 views
  • 2 likes
  • 4 in conversation