DATA Step, Macro, Functions and more

Identify variable with same value in consecutive months

Accepted Solution Solved
Reply
Contributor
Posts: 39
Accepted Solution

Identify variable with same value in consecutive months

This is a sample of my dataset.  I want to identify any ID where they purchased the same fruit in consecutive months at any time during a given date range.  We can use 01/01/2017 - 12/31/2017 as an example (this date range will be expanded at a later time).

 

In this example, ID 111 and 222 should be returned only or flagged in some way that I can identify as such.  Looking to code this in PROC SQL, however, it its easier to use a data step instead that will work too.

 

Thanks in advance.

 

ID PURCHASE_DATE FRUIT
111 01/20/2017 Oranges
111 02/01/2017 Oranges
222 04/24/2017 Apples
222 05/12/2017 Apples
222 08/28/2017 Apples
222 10/10/2017 Apples
333 03/14/2016 Pears
333 09/23/2016 Pears

Accepted Solutions
Solution
‎03-02-2018 09:29 AM
Esteemed Advisor
Posts: 5,625

Re: Identify variable with same value in consecutive months

Posted in reply to PhatRam33

Use

 

duration = intck("month", lag(purchase_date), purchase_date);
if not first.fruit and duration <= 1 then flag = 1;

instead.

 

PG

View solution in original post


All Replies
Super User
Posts: 24,010

Re: Identify variable with same value in consecutive months

Posted in reply to PhatRam33

Whats your expected output from the input data set?

Super User
Posts: 6,934

Re: Identify variable with same value in consecutive months

Posted in reply to PhatRam33

It would be easier if you would reconsider the definition of "consecutive".  How about within 30 days?  After all, consecutive months would treat these differently:

 

03/01/2016 to 04/30/2016 = consecutive

 

01/31/2016 to 03/01/2016 = not consecutive, even though the duration between purchases is smaller

Contributor
Posts: 39

Re: Identify variable with same value in consecutive months

Posted in reply to Astounding

Yes, I would define as consecutive months would be within 30 days.

Super User
Posts: 6,934

Re: Identify variable with same value in consecutive months

[ Edited ]
Posted in reply to PhatRam33

OK, here's a way to set the flags:

 

proc sort data=have;

by id fruit purchase_date;

run;

data want;

set have;

by id fruit;

duration = dif(purchase_date);

if first.fruit = 0 and duration <= 30 then flag=1;

drop duration;

run;

 

Once the observations have been flagged, you still have to decide what should happen next.  And your dates must be valid SAS dates, not character strings.

Contributor
Posts: 39

Re: Identify variable with same value in consecutive months

[ Edited ]
Posted in reply to Astounding

Thank you very much @Astounding

 

I was able to get this to work and formatted my date field as such with a separate data step:

purchase_date=datepart(purchase_date);

format purchase_date date9.;

 

111 returned with value of 1 for the second row and 222 with a value of 1 for the 4th row.

 

For now I was just planning on creating a proc sql  and query on any Ids where they were flagged with a value of 1 and pull in any additional detail from another data set.  This is a work in progress, however wanted to get this first foundational step complete.

 

I did have a question though, if say the date fell on 01/01/2017 and the next date occurred on 02/25/2017 would fall greater than 30 days, would I need to expand my criteria greater than 30 or is there a way that sas can format the dates as months and then look for consecutive in that manner.  Sorry for the loaded question.

PROC Star
Posts: 631

Re: Identify variable with same value in consecutive months

Posted in reply to PhatRam33

Hi,

 

try this,

 

DATA HAVE ;
format PURCHASE_DATE mmddyy10.;
INFILE DATALINES DLM=",";
INPUT ID :3. PURCHASE_DATE : mmddyy10. FRUIT $;
DATALINES;
111,01/20/2017,Oranges
111,02/01/2017,Oranges
222,04/24/2017,Apples
222,05/12/2017,Apples
222,08/28/2017,Apples
222,10/10/2017,Apples
333,03/14/2016,Pears
333,09/23/2016,Pears
;
RUN;

PROC SORT DATA=HAVE;
BY ID FRUIT PURCHASE_DATE ;
RUN;
%let Start_Date=01JAN2017;
%let End_Date=31DEC2017;

DATA WANT;
FORMAT Lag_Date mmddyy10.;
SET HAVE;
BY ID FRUIT PURCHASE_DATE ;
Lag_Date=LAG(PURCHASE_DATE);
VAL=(MONTH(PURCHASE_DATE)+YEAR(PURCHASE_DATE))-(MONTH(LAG(PURCHASE_DATE))+YEAR(LAG(PURCHASE_DATE)));
IF FIRST.FRUIT=0 and "&Start_Date"D<=LAG(PURCHASE_DATE)<="&End_Date"D and val=1;
run;
Thanks,
Suryakiran
Contributor
Posts: 39

Re: Identify variable with same value in consecutive months

Posted in reply to SuryaKiran

Thank you @SuryaKiran.

 

Would your code still work if say the end_date occurred in a different year than the start_date as say 01/31/2018?

 

 

Solution
‎03-02-2018 09:29 AM
Esteemed Advisor
Posts: 5,625

Re: Identify variable with same value in consecutive months

Posted in reply to PhatRam33

Use

 

duration = intck("month", lag(purchase_date), purchase_date);
if not first.fruit and duration <= 1 then flag = 1;

instead.

 

PG
Contributor
Posts: 39

Re: Identify variable with same value in consecutive months

[ Edited ]

Thank you @PGStats.  Similar question to what I asked @SuryaKiran, would this work if my end date fell under a different year such as one in 2018?

 

I appreciate all the help everyone! I am truly grateful!

 

Esteemed Advisor
Posts: 5,625

Re: Identify variable with same value in consecutive months

Posted in reply to PhatRam33

Don't be afraid to do your own testing, it is an excellent learning opportunity. Yes, SAS date functions work across years.

PG
Contributor
Posts: 39

Re: Identify variable with same value in consecutive months

Many thanks will do just that and mark solution later today.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 204 views
  • 1 like
  • 5 in conversation