BookmarkSubscribeRSS Feed
ercksh8
Fluorite | Level 6

Hello all.

DATA item1;
   INPUT ID $ Transaction_number $ Date1;
CARDS;
  1 1000 20190112
  1 1001 20190121
  1 1002 20200111
  2 1003 20190102
  2 1004 20200110
3 1005 20210123
5 1006 20210102
6 1007 20200101 ; RUN; DATA item2; INPUT ID $ Transaction_number $ Date2; CARDS; 1 2000 20190211 1 2001 20210102 2 2002 20200521
3 2003 20210101
4 2004 20200101
5 1006 20210102
5 2005 20210202
; RUN;DATA item3; INPUT ID $ Transaction_number $ Date3; CARDS; 1 3000 20210411 1 3001 20200102 2 3002 20200521
3 3003 20200101
4 3004 20190101
5 3006 20200102
5 3005 20210202
; RUN;

I am working with a data that looks something like this.

The dates in the above data in my dataset were characters so I converted them into numerical values by doing this.

 

DATA item1; set item1;
   date_1 = INPUT(Date1, YYMMDD8.);
   FORMAT date_1 YYMMDDN8.;
   PUT date_1=;
RUN;
DATA item2; set item2;
   date_2 = INPUT(Date2, YYMMDD8.);
   FORMAT date_2 YYMMDD8.;
   PUT date_2=;
RUN;
DATA item3; set item3;
   date_3 = INPUT(Date3, YYMMDD8.);
   FORMAT date_3 YYMMDD8.;
   PUT date_3=;
RUN;

I am trying to create a dataset where I only include those who purchased item 1 and create variables YN_2 YN_3 to see if the individuals that purchased item 1 purchased item 2, 3 within a year since purchasing item 1.

 

 

 

PROC SQL;
   CREATE TABLE YN_2 as
   SELECT SA.ID, SA.Transaction_number, CASE WHEN SA.Date1 <= SB.Date2 <= (SA.Date1 + 365) THEN 1
ELSE 0 END AS YN_2
FROM item1 AS SA
LEFT JOIN item2 AS SB
ON SA.ID = SB.ID
ORDER BY ID
; QUIT;

I have been able to find YN_2 and YN_3 separately using the code above, but am struggling to put them all together. I would only like to keep the first dates of purchase in item1 but don't want to do the same for items 2 and 3 as the initial date might not fall within the 1 year period but have a future purchase that falls within the 1 year. The end product I would like is a data where all IDs observed are included and have YN_2, YN_3...YN_n(n being an arbitrary number) of all items that I have (more than the three I included above) among those who have purchased item 1.

 

The data above is only a sample of the data and I'm sorry in advance if it feels too incomplete. 

 

Thank you for the help.

 

 

 

3 REPLIES 3
ercksh8
Fluorite | Level 6

I would like to use the earliest date as the index date.

Kurt_Bremser
Super User

The DATA stepis very good at working sequences, so use that.

First, let us read the data so that we have usable SAS dates:

data item1;
input ID $ Transaction_number $ Date1 :yymmdd8.;
format date1 yymmdd10.;
datalines;
1 1000 20190112
1 1001 20190121
1 1002 20200111
2 1003 20190102
2 1004 20200110
3 1005 20210123
5 1006 20210102 
6 1007 20200101
;

data item2;
input ID $ Transaction_number $ Date2 :yymmdd8.;
format date2 yymmdd10.;
datalines;
1 2000 20190211
1 2001 20210102
2 2002 20200521
3 2003 20210101
4 2004 20200101
5 1006 20210102
5 2005 20210202
;

data item3;
input ID $ Transaction_number $ Date3 :yymmdd8.;
format date3 yymmdd10.;
cards;
1 3000 20210411
1 3001 20200102
2 3002 20200521
3 3003 20200101
4 3004 20190101
5 3006 20200102
5 3005 20210202
;

Next, combine all datasets into one and sort:

data all;
length dname $41;
set
  item1 (rename=(date1=date))
  item2 (rename=(date2=date))
  item3 (rename=(date3=date))
  indsname=dname
;
length item $32;
item = scan(dname,2,".");
run;

proc sort data=all;
by id date;
run;

If the individual dataset are already sorted by id and date, you can use a BY in the DATA step to "interleave" the observations, so you do not need the extra sort.

Now, a data step will find your hits:

data want;
set all;
by id;
retain yn_2 yn_3 date_1;
if first.id
then do;
  yn_2 = 0;
  yn_3 = 0;
  date_1 = .;
end;
if date_1 = .
then do;
  if item = "ITEM1" then date_1 = date;
end;
else do;
  diff = intck('year',date_1,date,"c");
  if diff le 1
  then do;
    if item = "ITEM2" then yn_2 = 1;
    if item = "ITEM3" then yn_3 = 1;
  end;
end;
if last.id;
keep id yn_2 yn_3;
run;

To see the inner workings of this step, comment the last two statements, so all observations and variables can be seen.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 478 views
  • 0 likes
  • 2 in conversation