Hi All,
I'd like to ask if you can help about this. Thanks a lot!!!
data new;
input ID store_number visit_date yymmdd10.;
format visit_date yymmdd10.;
datalines;
1 21 20050615
1 21 20050708
1 21 20051023
1 28 20041203
2 10 20060123
2 60 20050709
2 19 20060825
3 25 20041010
3 91 20051201
3 64 20040609
4 13 20050606
5 26 20040809
5 26 20030911
;
run;
I’d like to get a new dataset to show that if a consumer has regular store to visit. That is, if he/she visited the same store(store number only between 10-29)more than 50% of his/her total store visits within 2 years of period, then he/she had regular store to visit. (But no including the person who only had one store to visit during the period, such as consumer #4). The new dataset should be like the following table.
| Id | Regular_store_to_visit (yes=1, no=0) | 
| 1 | 1 | 
| 2 | 0 | 
| 3 | 0 | 
| 4 | . | 
| 5 | 1 | 
Ok, this is my try, it is really mouthful, and potentially slow, but seems working on your data as is:
data new;
input ID store_number visit_date yymmdd10.;
format visit_date yymmdd10.;
datalines;
1 21 20050615
1 21 20050708
1 21 20051023
1 28 20041203
2 10 20060123
2 60 20050709
2 19 20060825
3 25 20041010
3 91 20051201
3 64 20040609
4 13 20050606
5 26 20040809
5 26 20030911
;
proc sql;
select a.id, case when missing(d) then .
else sum(case when sv/d>0.5 then 1 when sv/d<=0.5 then 0 else . end)>0
end as regular_visit from
(select id, store_number, count(*) as sv from new group by id, store_number having
intck('year',min(visit_date),max(visit_date))<=2) a
left join
(select id, count(*) as d from new group by id having count(*)>1) b
on a.id=b.id group by a.id;
quit;
Haikuo
 
user1,
This is a little bit of a shot in the dark, since you omitted some key requirements. You haven't really told us about how to limit the data to 2 years per customer. And you haven't told us whether store_numbers outside the range of 10-29 should contribute to the denominator of our percent calculations.
This program is easily adjustable, depending on what the assumptions ought to be. So assuming you have already performed your subsetting based on the dates, this is a reasonable approach.
proc sort data=new out=have;
by id;
where (10 <= store_number <= 29);
run;
proc freq data=have order=freq;
by id;
tables store_number / noprint out=counts;
run;
data want;
set counts;
   by id;
   if first.id;
if last.id then regular_store=.;
else regular_store = (percent > 50);
run;
Note that this also keeps other information you may find interesting ... the most frequently visited store number, the number of visits, and the percent. Also note that customers with zero visits to the stores of interest will not appear in the final data set.
Good luck.
Thanks very much for your help!
Best regards.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
