Creating variables that derive their values on a given point on a long...

EC27556 · Posted 11-02-2020 06:13 AM

I couldn't really think of an appropriate title which probably doesn't make too much sense but hopefully the following will:

I have an array of variables A1 to A10. Each variable's value is a specific date which relates to a period in time when the customers details were updated on the database.

i.e. A1=21st Dec 2018, A2=30th Jan 2019, A3=26th Feb 2019...... etc for customer 1.

Each of the variables above have similar sequentially labled variables related to the number of customer clicks on a website on the day of each of the A1-A10 dates:

i.e. Click1=50, Click2=40 etc.

I also have a unique review date variable for each customer. These are random dates that fall on any date in the period between A1-A10 dates. I.e. The dates could be values such as 29th Jan 2019, 2nd Feb 2019 etc.

What I am looking to do is create a variable that returns the value of clicks on the closest date before the review date. I.e. If review date is 25th December 2018 for customer 1, I would like to be able to look back to the closest date before it, which would be A1 (21st December 2018) and then return the value of Click 1.

Does anyone know any code which would enable me to do this?

andreas_lds · Posted 11-02-2020 06:31 AM

Please post data in usable form, this will help us to better understand your problem and we have something to work with.

PeterClemmensen · Posted 11-02-2020 06:33 AM

Perhaps something like this?

I just created some simple example data for demonstration. The code is easily extendable to more variables.

data have;
input id (a1-a3)(:date9.) click1-click3 review :date9.;
format a: review date9.;
datalines;
1 01Jan2020 05Jan2020 10Jan2020 10 20 30 03Jan2020
2 05Jan2020 10Jan2020 15Jan2020 40 50 60 12Jan2020
3 10Jan2020 15Jan2020 20Jan2020 70 80 90 15Jan2020
;

data want(drop = d idx);
   set have;
   array a {*} a:;
   array click {*} click:;
   d = review;
   do while (1);
      d +-1;
      idx = whichn(d, of a[*]);
      if idx | d = min(of a[*]) then leave;
   end;
   v = click[idx];
run;

Result

id a1        a2        a3        click1 click2 click3 review    v 
1  01JAN2020 05JAN2020 10JAN2020 10     20     30     03JAN2020 10 
2  05JAN2020 10JAN2020 15JAN2020 40     50     60     12JAN2020 50 
3  10JAN2020 15JAN2020 20JAN2020 70     80     90     15JAN2020 70

The DATA to DATA Step Macro
Blog: SASnrd

EC27556 · Posted 11-02-2020 09:06 AM

Hi,

Thanks! This seems to work when I use a small subset of my total dataset (e.g. 150 customers) however when I try to run the code for my entire dataset (roughly 5k customers) the code still hasn't run even though it has been 2 hours!

Have you got any idea why the code would run instantly for 150 customers but seemingly perpetually for 5k customers?

Thanks

PeterClemmensen · Posted 11-02-2020 09:17 AM

Are the date variables sorted? LIke in my example data?

And if not, am I allowed to sort them?

The DATA to DATA Step Macro
Blog: SASnrd

PeterClemmensen · Posted 11-03-2020 02:46 AM

Try this. At a little extra memory cost, but should speed things up.

10958 and 25932 are the numerical representations of the dates 01Jan1990 and 31Dec2030. You can expand as needed.

data have;
input id (a1-a3)(:date9.) click1-click3 review :date9.;
format a: review date9.;
datalines;
1 01Jan2020 05Jan2020 10Jan2020 10 20 30 03Jan2020
2 05Jan2020 10Jan2020 15Jan2020 40 50 60 12Jan2020
3 10Jan2020 15Jan2020 20Jan2020 70 80 90 15Jan2020
;

data want (drop=j);
   set have;
   array d {10958 : 25932} _temporary_;
   array a a:;
   array click click:;
   call missing (of d[*]);

   do over a; d[a] = 1; end;
   j = review;

   do while (1);
      j +- 1;
      if d[j] then leave;
   end;

   _I_ = whichn(j, of a[*]);
   v = click;
   
run;

The DATA to DATA Step Macro
Blog: SASnrd

Kurt_Bremser · Posted 11-03-2020 03:33 AM

To show you how it works with a proper longitudinal dataset, using @PeterClemmensen 's example data:

data have;
input id (a1-a3)(:date9.) click1-click3 review :date9.;
format a: review yymmdd10.;
datalines;
1 01Jan2020 05Jan2020 10Jan2020 10 20 30 03Jan2020
2 05Jan2020 10Jan2020 15Jan2020 40 50 60 12Jan2020
3 10Jan2020 15Jan2020 20Jan2020 70 80 90 15Jan2020
;

data id;
set have;
keep id review;
run;

proc transpose
  data=have (keep=id a:)
  out=l1 (rename=(col1=a_date))
;
by id;
var a:;
run;

proc transpose
  data=have (keep=id cl:)
  out=l2 (rename=(col1=click))
;
by id;
var cl:;
run;

data l1_a;
set l1;
seq = input(substr(_name_,2),best.);
drop _name_;
run;

data l2_a;
set l2;
seq = input(substr(_name_,6),best.);
drop _name_;
run;

data long;
merge
  l1_a
  l2_a
;
by id seq;
drop seq; /* no longer needed */
run;

You now have two datasets, on containing the "dimension" data of your ID's, the other the "series" data.

This is the structure your data should have right from the start; if you get wide data from any source, always transpose to long as a first step.

A quicker transpose can be done with this:

data long;
set have;
array _date {*} a1-a3;
array _click {*} click1-click3;
do i = 1 to dim(_date);
  a_date = _date{i};
  click = _click{i};
  output;
end;
format a_date yymmdd10.;
keep id a_date click;
run;

With these datasets, you do it in a data step like this:

data want;
retain _date _click _flag;
merge
  long
  id
;
by id;
if first.id then _flag = 1;
if a_date ge review and _flag
then do;
  a_date = _date;
  click = _click;
  output;
  _flag = 0;
end;
_date = a_date;
_click = click;
drop _:;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 11-02-2020 06:42 AM

Your dataset is in fact the exact opposite of a longitudinal dataset, and a very poor choice for dataset structure.

what if you have more than 10 updates?
if you have less than 10 updates, you waste storage space
to deal with a specific date, you need to loop over arrays; SQL does not have such a concept, so you lose SQL as a tool

So I recommend that you first transpose to a structure like this

customer date click

and then use something like this:

data want;
set have;
where date le &revdate.;
by customer;
if last.customer;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Creating variables that derive their values on a given point on a longitudinal Dataset

Re: Creating variables that derive their values on a given point on a longitudinal Dataset

Re: Creating variables that derive their values on a given point on a longitudinal Dataset

Re: Creating variables that derive their values on a given point on a longitudinal Dataset

Re: Creating variables that derive their values on a given point on a longitudinal Dataset

Re: Creating variables that derive their values on a given point on a longitudinal Dataset

Re: Creating variables that derive their values on a given point on a longitudinal Dataset

Re: Creating variables that derive their values on a given point on a longitudinal Dataset

SAS Innovate 2025: Call for Content

Classroom Training Available!