BookmarkSubscribeRSS Feed
EC27556
Quartz | Level 8

I couldn't really think of an appropriate title which probably doesn't make too much sense but hopefully the following will:

 

I have an array of variables A1 to A10. Each variable's value is a specific date which relates to a period in time when the customers details were updated on the database.

 

i.e. A1=21st Dec 2018, A2=30th Jan 2019, A3=26th Feb 2019...... etc for customer 1.

 

Each of the variables above have similar sequentially labled variables related to the number of customer clicks on a website on the day of each of the A1-A10 dates:

 

i.e. Click1=50, Click2=40 etc.

 

I also have a unique review date variable for each customer. These are random dates that fall on any date in the period between A1-A10 dates. I.e. The dates could be values such as 29th Jan 2019, 2nd Feb 2019 etc.

 

 

What I am looking to do is create a variable that returns the value of clicks on the closest date before the review date. I.e. If review date is 25th December 2018 for customer 1, I would like to be able to look back to the closest date before it, which would be A1 (21st December 2018) and then return the value of Click 1.

 

Does anyone know any code which would enable me to do this?

7 REPLIES 7
andreas_lds
Jade | Level 19

Please post data in usable form, this will help us to better understand your problem and we have something to work with.

PeterClemmensen
Tourmaline | Level 20

Perhaps something like this?

 

I just created some simple example data for demonstration. The code is easily extendable to more variables.

 

data have;
input id (a1-a3)(:date9.) click1-click3 review :date9.;
format a: review date9.;
datalines;
1 01Jan2020 05Jan2020 10Jan2020 10 20 30 03Jan2020
2 05Jan2020 10Jan2020 15Jan2020 40 50 60 12Jan2020
3 10Jan2020 15Jan2020 20Jan2020 70 80 90 15Jan2020
;

data want(drop = d idx);
   set have;
   array a {*} a:;
   array click {*} click:;
   d = review;
   do while (1);
      d +-1;
      idx = whichn(d, of a[*]);
      if idx | d = min(of a[*]) then leave;
   end;
   v = click[idx];
run;

Result

id a1        a2        a3        click1 click2 click3 review    v 
1  01JAN2020 05JAN2020 10JAN2020 10     20     30     03JAN2020 10 
2  05JAN2020 10JAN2020 15JAN2020 40     50     60     12JAN2020 50 
3  10JAN2020 15JAN2020 20JAN2020 70     80     90     15JAN2020 70 
EC27556
Quartz | Level 8
Hi,

Thanks! This seems to work when I use a small subset of my total dataset (e.g. 150 customers) however when I try to run the code for my entire dataset (roughly 5k customers) the code still hasn't run even though it has been 2 hours!

Have you got any idea why the code would run instantly for 150 customers but seemingly perpetually for 5k customers?

Thanks
PeterClemmensen
Tourmaline | Level 20

Are the date variables sorted? LIke in my example data?

 

And if not, am I allowed to sort them?

PeterClemmensen
Tourmaline | Level 20

Try this. At a little extra memory cost, but should speed things up. 

 

10958 and 25932 are the numerical representations of the dates 01Jan1990 and 31Dec2030. You can expand as needed.

 

data have;
input id (a1-a3)(:date9.) click1-click3 review :date9.;
format a: review date9.;
datalines;
1 01Jan2020 05Jan2020 10Jan2020 10 20 30 03Jan2020
2 05Jan2020 10Jan2020 15Jan2020 40 50 60 12Jan2020
3 10Jan2020 15Jan2020 20Jan2020 70 80 90 15Jan2020
;

data want (drop=j);
   set have;
   array d {10958 : 25932} _temporary_;
   array a a:;
   array click click:;
   call missing (of d[*]);

   do over a; d[a] = 1; end;
   j = review;

   do while (1);
      j +- 1;
      if d[j] then leave;
   end;

   _I_ = whichn(j, of a[*]);
   v = click;
   
run;
Kurt_Bremser
Super User

To show you how it works with a proper longitudinal dataset, using @PeterClemmensen 's example data:

 

data have;
input id (a1-a3)(:date9.) click1-click3 review :date9.;
format a: review yymmdd10.;
datalines;
1 01Jan2020 05Jan2020 10Jan2020 10 20 30 03Jan2020
2 05Jan2020 10Jan2020 15Jan2020 40 50 60 12Jan2020
3 10Jan2020 15Jan2020 20Jan2020 70 80 90 15Jan2020
;

data id;
set have;
keep id review;
run;

proc transpose
  data=have (keep=id a:)
  out=l1 (rename=(col1=a_date))
;
by id;
var a:;
run;

proc transpose
  data=have (keep=id cl:)
  out=l2 (rename=(col1=click))
;
by id;
var cl:;
run;

data l1_a;
set l1;
seq = input(substr(_name_,2),best.);
drop _name_;
run;

data l2_a;
set l2;
seq = input(substr(_name_,6),best.);
drop _name_;
run;

data long;
merge
  l1_a
  l2_a
;
by id seq;
drop seq; /* no longer needed */
run;

You now have two datasets, on containing the "dimension" data of your ID's, the other the "series" data.

 

This is the structure your data should have right from the start; if you get wide data from any source, always transpose to long as a first step.

A quicker transpose can be done with this:

 

data long;
set have;
array _date {*} a1-a3;
array _click {*} click1-click3;
do i = 1 to dim(_date);
  a_date = _date{i};
  click = _click{i};
  output;
end;
format a_date yymmdd10.;
keep id a_date click;
run;

With these datasets, you do it in a data step like this:

 

data want;
retain _date _click _flag;
merge
  long
  id
;
by id;
if first.id then _flag = 1;
if a_date ge review and _flag
then do;
  a_date = _date;
  click = _click;
  output;
  _flag = 0;
end;
_date = a_date;
_click = click;
drop _:;
run;
Kurt_Bremser
Super User

Your dataset is in fact the exact opposite of a longitudinal dataset, and a very poor choice for dataset structure.

  • what if you have more than 10 updates?
  • if you have less than 10 updates, you waste storage space
  • to deal with a specific date, you need to loop over arrays; SQL does not have such a concept, so you lose SQL as a tool

So I recommend that you first transpose to a structure like this

customer date click

and then use something like this:

data want;
set have;
where date le &revdate.;
by customer;
if last.customer;
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 927 views
  • 4 likes
  • 4 in conversation