10-30-2015 03:16 AM
I am trying to find the number of females who have entered the shop within the past month with the data I have. I have written the code but somehow it doesn't seem to work. Your help is greatly appreciated.
Let me describe my data. I have 2 datasets:
Data1: this data has the shop id and date and looks something like this
Data2: this data has the shop id, the date of customer visits, the customer id and the gender of customer (0 for male and 1 for female)
Shop_id Date Cust_id Gender
1 1/3/2015 113 1
1 1/4/2015 115 0
2 3/3/2015 116 1
I want to create a new dataset, data3, which has the shop_id, the date and the number of female customers who have visited the shop in the past 1 month (prior to the date). For example, in row 1, i want to insert a new column, no_female, which is the number of females who have visited the shop 1 from 23/2/2015 to 22/3/2015.
This is the code I write, but it doesn't seem to work:
Create table data3 as
Select a.*,sum(b.gender) as no_female
from data1 as a, data2 as b
where a.shop_id = b.shop_id and -30 <= intck('day',a.date,b.date) < 0
group by a.shop_id, a.date;
Your help is greatly appreciated!! Thanks!!!
10-30-2015 06:38 AM
Something like the below, note that it is not optimal coding, it is more to describe the thinking behind it:
data have1; informat date date9.; input Shop_id Date; datalines; 1 23MAR2015 1 01APR2015 2 01MAR2015 2 01MAY2015 ; run; data have2; informat date date9.; input Shop_id Date Cust_id Gender; datalines; 1 01MAR2015 113 1 1 01APR2015 115 0 2 03MAR2015 116 1 ; run; proc sql; create table WANT as select distinct SHOP_ID ,DATE ,count(CUST_ID) as RESULT from (select A.* ,B.CUST_ID from HAVE1 A left join (select * from HAVE2 where GENDER=1) B on A.SHOP_ID=B.SHOP_ID and intnx('month',A.DATE,-1) <= B.DATE <= A.DATE) group by SHOP_ID,DATE; quit;
Ok, so in the subquery, the from (), I join all female records back to the main dataset based on date being between date-1 month and date. Then in the main query I count these using a by group.