I couldn't really think of an appropriate title which probably doesn't make too much sense but hopefully the following will:
I have an array of variables A1 to A10. Each variable's value is a specific date which relates to a period in time when the customers details were updated on the database.
i.e. A1=21st Dec 2018, A2=30th Jan 2019, A3=26th Feb 2019...... etc for customer 1.
Each of the variables above have similar sequentially labled variables related to the number of customer clicks on a website on the day of each of the A1-A10 dates:
i.e. Click1=50, Click2=40 etc.
I also have a unique review date variable for each customer. These are random dates that fall on any date in the period between A1-A10 dates. I.e. The dates could be values such as 29th Jan 2019, 2nd Feb 2019 etc.
What I am looking to do is create a variable that returns the value of clicks on the closest date before the review date. I.e. If review date is 25th December 2018 for customer 1, I would like to be able to look back to the closest date before it, which would be A1 (21st December 2018) and then return the value of Click 1.
Does anyone know any code which would enable me to do this?
Please post data in usable form, this will help us to better understand your problem and we have something to work with.
Perhaps something like this?
I just created some simple example data for demonstration. The code is easily extendable to more variables.
data have;
input id (a1-a3)(:date9.) click1-click3 review :date9.;
format a: review date9.;
datalines;
1 01Jan2020 05Jan2020 10Jan2020 10 20 30 03Jan2020
2 05Jan2020 10Jan2020 15Jan2020 40 50 60 12Jan2020
3 10Jan2020 15Jan2020 20Jan2020 70 80 90 15Jan2020
;
data want(drop = d idx);
set have;
array a {*} a:;
array click {*} click:;
d = review;
do while (1);
d +-1;
idx = whichn(d, of a[*]);
if idx | d = min(of a[*]) then leave;
end;
v = click[idx];
run;
Result
id a1 a2 a3 click1 click2 click3 review v 1 01JAN2020 05JAN2020 10JAN2020 10 20 30 03JAN2020 10 2 05JAN2020 10JAN2020 15JAN2020 40 50 60 12JAN2020 50 3 10JAN2020 15JAN2020 20JAN2020 70 80 90 15JAN2020 70
Are the date variables sorted? LIke in my example data?
And if not, am I allowed to sort them?
Try this. At a little extra memory cost, but should speed things up.
10958 and 25932 are the numerical representations of the dates 01Jan1990 and 31Dec2030. You can expand as needed.
data have;
input id (a1-a3)(:date9.) click1-click3 review :date9.;
format a: review date9.;
datalines;
1 01Jan2020 05Jan2020 10Jan2020 10 20 30 03Jan2020
2 05Jan2020 10Jan2020 15Jan2020 40 50 60 12Jan2020
3 10Jan2020 15Jan2020 20Jan2020 70 80 90 15Jan2020
;
data want (drop=j);
set have;
array d {10958 : 25932} _temporary_;
array a a:;
array click click:;
call missing (of d[*]);
do over a; d[a] = 1; end;
j = review;
do while (1);
j +- 1;
if d[j] then leave;
end;
_I_ = whichn(j, of a[*]);
v = click;
run;
To show you how it works with a proper longitudinal dataset, using @PeterClemmensen 's example data:
data have;
input id (a1-a3)(:date9.) click1-click3 review :date9.;
format a: review yymmdd10.;
datalines;
1 01Jan2020 05Jan2020 10Jan2020 10 20 30 03Jan2020
2 05Jan2020 10Jan2020 15Jan2020 40 50 60 12Jan2020
3 10Jan2020 15Jan2020 20Jan2020 70 80 90 15Jan2020
;
data id;
set have;
keep id review;
run;
proc transpose
data=have (keep=id a:)
out=l1 (rename=(col1=a_date))
;
by id;
var a:;
run;
proc transpose
data=have (keep=id cl:)
out=l2 (rename=(col1=click))
;
by id;
var cl:;
run;
data l1_a;
set l1;
seq = input(substr(_name_,2),best.);
drop _name_;
run;
data l2_a;
set l2;
seq = input(substr(_name_,6),best.);
drop _name_;
run;
data long;
merge
l1_a
l2_a
;
by id seq;
drop seq; /* no longer needed */
run;
You now have two datasets, on containing the "dimension" data of your ID's, the other the "series" data.
This is the structure your data should have right from the start; if you get wide data from any source, always transpose to long as a first step.
A quicker transpose can be done with this:
data long;
set have;
array _date {*} a1-a3;
array _click {*} click1-click3;
do i = 1 to dim(_date);
a_date = _date{i};
click = _click{i};
output;
end;
format a_date yymmdd10.;
keep id a_date click;
run;
With these datasets, you do it in a data step like this:
data want;
retain _date _click _flag;
merge
long
id
;
by id;
if first.id then _flag = 1;
if a_date ge review and _flag
then do;
a_date = _date;
click = _click;
output;
_flag = 0;
end;
_date = a_date;
_click = click;
drop _:;
run;
Your dataset is in fact the exact opposite of a longitudinal dataset, and a very poor choice for dataset structure.
So I recommend that you first transpose to a structure like this
customer date click
and then use something like this:
data want;
set have;
where date le &revdate.;
by customer;
if last.customer;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.