Desktop productivity for business analysts and programmers

code issue

Reply
Occasional Contributor
Posts: 16

code issue

Hi All,

I'd like to ask if you can help about this. Thanks a lot!!!

data new;

input ID store_number visit_date yymmdd10.;

format visit_date yymmdd10.;

datalines;

1 21 20050615

1 21 20050708

1 21 20051023

1 28 20041203

2 10 20060123

2 60 20050709

2 19 20060825

3 25 20041010

3 91 20051201

3 64 20040609

4 13 20050606

5 26 20040809

5 26 20030911

;

run;

I’d like to get a new dataset to show that if a consumer has regular store to visit. That is, if he/she visited the same store(store number only between 10-29)more than 50% of his/her total store visits within 2 years of period, then he/she had regular store to visit. (But no including the person who only had one store to visit during the period, such as consumer #4). The new dataset should be like the following table.

Id

Regular_store_to_visit

(yes=1, no=0)

1

1

2

0

3

0

4

.

5

1

Respected Advisor
Posts: 3,124

Re: code issue

Ok, this is my try, it is really mouthful, and potentially slow,  but seems working on your data as is:

data new;

input ID store_number visit_date yymmdd10.;

format visit_date yymmdd10.;

datalines;

1 21 20050615

1 21 20050708

1 21 20051023

1 28 20041203

2 10 20060123

2 60 20050709

2 19 20060825

3 25 20041010

3 91 20051201

3 64 20040609

4 13 20050606

5 26 20040809

5 26 20030911

;

proc sql;

select a.id, case when missing(d) then .

                  else sum(case when sv/d>0.5 then 1 when sv/d<=0.5 then 0 else . end)>0

                  end  as regular_visit  from

  (select id, store_number, count(*) as sv from new group by id, store_number having

          intck('year',min(visit_date),max(visit_date))<=2) a

              left join

(select id,  count(*) as d from new group by id having count(*)>1) b

on a.id=b.id group by a.id;

quit;

Haikuo


Respected Advisor
Posts: 4,969

Re: code issue

user1,

This is a little bit of a shot in the dark, since you omitted some key requirements.  You haven't really told us about how to limit the data to 2 years per customer.  And you haven't told us whether store_numbers outside the range of 10-29 should contribute to the denominator of our percent calculations.

This program is easily adjustable, depending on what the assumptions ought to be.  So assuming you have already performed your subsetting based on the dates, this is a reasonable approach.

proc sort data=new out=have;

   by id;

   where (10 <= store_number <= 29);

run;

proc freq data=have order=freq;

   by id;

   tables store_number / noprint out=counts;

run;

data want;

   set counts;

   by id;
   if first.id;

   if last.id then regular_store=.;

   else regular_store = (percent > 50);

run;

Note that this also keeps other information you may find interesting ... the most frequently visited store number, the number of visits, and the percent.  Also note that customers with zero visits to the stores of interest will not appear in the final data set.

Good luck.

Occasional Contributor
Posts: 16

Re: code issue

Thanks very much for your help!

Best regards.

Ask a Question
Discussion stats
  • 3 replies
  • 212 views
  • 0 likes
  • 3 in conversation