Programming the statistical procedures from SAS

choosing entries

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

choosing entries

I'm launching a marketing campaign in which I grant every client that reached a transaction sum of 100$ on his credit card a free gift card,

The problem is that I need to choose the first 100 clients that reached this threshold, my data table include:

- client id

- transaction date

- transaction amount

Thank you in advance


Accepted Solutions
Solution
‎12-27-2014 04:15 AM
Super Contributor
Posts: 490

Re: choosing entries

data client;

      input ID @3 Date date7. @12 Amount;

   datalines;

1  13may14 85

2  17oct14 120

3  03feb14 160

4  28feb14 80

5  10nov14 77

6  25apr14 1

7  03jun14 140

8  14jan14 900

9  24oct14 1245

10 27aug14 78

1  1may14 85

2  7oct14 120

3  13feb14 160

4  8feb14 5

5  1nov14 976

6  5apr14 88

7  23jun14 140

8  24jan14 900

9  14oct14 1245

10 2oct14 78

;

proc sort data=client;

by id date;

run;

data test;

set client;

retain sum 0;

retain f 0;

by id;

sum= sum + amount;

if first.id then do; sum=amount; f=0; end;

if (sum>100 and f = 0) then do; f=1;output; end;

else f= 0;

run;

proc sort data=test;

by date;

run;

data wanted (keep= ID Date Sum);

set test (obs=5);

run;

View solution in original post


All Replies
Super Contributor
Posts: 490

Re: choosing entries

1- Sort your data

2- Use OBS option with Where condition in Proc Print

Example:

data client;

      input ID @3 Date date7. @12 Amount;

   datalines;

1  13may14 85

2  17oct14 120

3  03feb14 160

4  28feb14 80

5  10nov14 976

6  25apr14 88

7  03jun14 140

8  14jan14 900

9  24oct14 1245

10 27aug14 78

;

proc sort data=client;

by Date;

run;

proc print data=Client (obs= 5);

where amount > 100;

run;

Occasional Contributor
Posts: 7

Re: choosing entries

Thank you but maybe my question was not very clear, the data is as follows:

Client id           date                   amount

10001         20141215                 50

10002         20141215                 30

10003         20141215                 90

10001         20141216                 70

10002         20141216                 80

10003         20141216                 5

10003         20141217                 100

So if I have to choose 2 between these 3 clients I would be choosing 10001 and 10002, now consider this situation but with 100000 clients

Solution
‎12-27-2014 04:15 AM
Super Contributor
Posts: 490

Re: choosing entries

data client;

      input ID @3 Date date7. @12 Amount;

   datalines;

1  13may14 85

2  17oct14 120

3  03feb14 160

4  28feb14 80

5  10nov14 77

6  25apr14 1

7  03jun14 140

8  14jan14 900

9  24oct14 1245

10 27aug14 78

1  1may14 85

2  7oct14 120

3  13feb14 160

4  8feb14 5

5  1nov14 976

6  5apr14 88

7  23jun14 140

8  24jan14 900

9  14oct14 1245

10 2oct14 78

;

proc sort data=client;

by id date;

run;

data test;

set client;

retain sum 0;

retain f 0;

by id;

sum= sum + amount;

if first.id then do; sum=amount; f=0; end;

if (sum>100 and f = 0) then do; f=1;output; end;

else f= 0;

run;

proc sort data=test;

by date;

run;

data wanted (keep= ID Date Sum);

set test (obs=5);

run;

Occasional Contributor
Posts: 7

Re: choosing entries

Thank you

Super User
Posts: 9,775

Re: choosing entries

If I understood what you mean .

data client;

input Client_id           date     : yymmdd10.              amount ;

format date yymmddn8.;

cards;

10001         20141215                 50

10002         20141215                 30

10003         20141215                 90

10001         20141216                 70

10002         20141216                 80

10003         20141216                 5

10003         20141217                 100

;

proc sort data=client;

by Client_id           date ;

run;

%let n=2;

data want;

set client;

by Client_id;

retain found 0 ;

if first.Client_id then do;sum=0; found=0;end;

sum+amount;

if sum ge 100 and not found then do;found=1;n+1;output;end;

if n eq &n then stop;

run;

Xia Keshan

Message was edited by: xia keshan

Occasional Contributor
Posts: 7

Re: choosing entries

Hello xia keshan and thank you for your answer, but what I need in my output table is 100 unique client id's this method displayed the same id many times with different dates.

Super User
Posts: 9,775

Re: choosing entries

Sorry my bad. Code updated .

Respected Advisor
Posts: 3,147

Re: choosing entries

This may be a Proc SQL alternative that does not require to presort the data. However, it takes toll on the efficiency for other reasons.

data client;

     input ID @3 Date date7. @12 Amount;

     datalines;

1 13may14 85

2 17oct14 120

3 03feb14 160

4 28feb14 80

5 10nov14 77

6 25apr14 1

7 03jun14 140

8 14jan14 900

9 24oct14 1245

10 27aug14 78

1 1may14 85

2 7oct14 120

3 13feb14 160

4 8feb14 5

5 1nov14 976

6 5apr14 88

7 23jun14 140

8 24jan14 900

9 14oct14 1245

10 2oct14 78

;

proc sql outobs=3;

     create table want as

           select id,

                case

                     when (select sum(amount) from client where id=a.id and date <= a.date)>=100 then date

                     else .

                end

           as _date format=date9.

                from client a

                     group by id

                           having _date=min(_date) and not missing (_date)

                                order by _date

     ;

quit;

Trusted Advisor
Posts: 1,222

Re: choosing entries

Although, Hai.kuo has suggested a sql solution. My try using proc sql:

proc sql outobs=2;

create table want as

select client_id from client

group by client_id

having date=max(date) and sum(amount)>=100

order by date;

quit;

Respected Advisor
Posts: 3,147

Re: choosing entries

Hmm. You don't have a chronological filtering mechanism (order by date is for the final output, doesn't count) in your code, so you  are not providing what OP asked for. 

Trusted Advisor
Posts: 1,222

Re: choosing entries

Hi Haikuo,

I think chronological filtering mechanism is required only for the final output after having maximum date with aggregated transaction amount >=100 for each client_id to generate the desired output.

Regards,

Respected Advisor
Posts: 3,147

Re: choosing entries

This maybe just due to different opinions or different takes on OP's question as :

"I'm launching a marketing campaign in which I grant every client that reached a transaction sum of 100$ on his credit card a free gift card,The problem is that I need to choose the first 100 clients that reached this threshold"

From what I understand, OP's request has 2 layers (OP please correct me if I am mistaken):

1) a client start aggregating transaction amounts until the date when $100 is reached, and this date is marked, because this would be the earliest date this particular client is eligible for the gift card award.

2).By the end of the promotion (or whatever cutoff date), the top 100 clients will be selected based on when they became eligible. "choose the first 100 clients that reached this threshold"

Let 's take a look at some beefed up mocked data:

data client;

      input ID @3 Date date7. @12 Amount;

datalines;

data client;

     input ID @3 Date date7. @12 Amount;

     datalines;

1 13may14 85

2 17oct14 120

3 03feb14 160

4 28feb14 80

5 10nov14 77

6 25apr14 1

7 03jun14 140

8 14jan14 900

9 24oct14 1245

10 27aug14 78

1 1may14 85

2 7oct14 120

3 13feb14 160

4 8feb14 5

5 1nov14 976

6 5apr14 88

7 23jun14 140

8 24jan14 900

9 14oct14 1245

10 2oct14 78

1 2may14 85

2 2oct14 120

3 23feb14 160

4 2feb14 5

5 2nov14 976

6 2apr14 88

7 3jun14 140

8 4jan14 900

9 24oct14 1245

10 22oct14 78

;

proc sql outobs=3;

create table want_stat as

select id, date format=date9. from client

group by id

having date=max(date) and sum(amount)>=100

order by date;

quit;

title "Stat's output";

Proc print;run;

proc sql outobs=3;

     create table want_haikuo as

           select id,

                case

                     when (select sum(amount) from client where id=a.id and date <= a.date)>=100 then date

                     else .

                end

           as _date format=date9.

                from client a

                     group by id

                           having _date=min(_date) and not missing (_date)

                                order by _date

     ;

quit;

title "Haikuo's output";

proc print;run;

Stat's output
ObsIDDate
1323-Feb-01
2924-Oct-01
3924-Oct-01
Haikuo's output
ObsID_date
1814-Jan-01
233-Feb-01
373-Jun-01

The dups in your output can be taken care of easily, but the date you are using concerns me. For instance ID=3, I used the date when ID=3 reaches $100, while you used the max date for all the dates available for ID=3. If you have worked in a Marketing department of a bank, you would have known this is unfair, doing so will put eligible customers way down the queue just because they have been recently active.

was using the same Date as I understand, however, he chooses winners by their positions (client ID in this case) in the data, which is also unfair IMHO.

Just my 2 cents,

Regards,

Haikuo

Trusted Advisor
Posts: 1,222

Re: choosing entries

Thanks Haikuo - Helpful to understand the concept.

Regards,

Respected Advisor
Posts: 3,147

Re: choosing entries

BTW, Congratulations for becoming a Master! Smiley Happy

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 563 views
  • 5 likes
  • 5 in conversation