Hey there,
i have a small problem that I cant fix.
I have a dataset with three variables: 1) customer 2) date 3) rating, which looks like this
Customer | Date | Rating |
1000 | 01.01.2016 | 5 |
1000 | 01.02.2016 | 5 |
1000 | 01.03.2016 | 5 |
1000 | 01.04.2016 | 6 |
1000 | 01.05.2016 | 7 |
1000 | 01.06.2016 | |
1000 | 01.07.2016 | |
1000 | 01.08.2016 | 9 |
1000 | 01.09.2016 | 9 |
1000 | 01.10.2016 | 10 |
1000 | 01.11.2016 | 10 |
1000 | 01.12.2016 | 10 |
2000 | 01.01.2016 | 4 |
2000 | 01.02.2016 | 4 |
2000 | 01.03.2016 | 4 |
2000 | 01.04.2016 | 4 |
2000 | 01.05.2016 | 5 |
2000 | 01.06.2016 | 5 |
2000 | 01.07.2016 | |
2000 | 01.08.2016 | |
2000 | 01.09.2016 | |
2000 | 01.10.2016 | 5 |
2000 | 01.11.2016 | 5 |
2000 | 01.12.2016 | |
3000 | 01.01.2016 | |
3000 | 01.02.2016 | |
3000 | 01.03.2016 | |
3000 | 01.04.2016 | 15 |
3000 | 01.05.2016 | 15 |
3000 | 01.06.2016 | 15 |
3000 | 01.07.2016 | 16 |
3000 | 01.08.2016 | 16 |
3000 | 01.09.2016 | 16 |
3000 | 01.10.2016 | |
3000 | 01.11.2016 | |
3000 | 01.12.2016 |
Now what I need is a way to only show me the lines that satisy the following condisitons:
1) show me the very first rating for a particular customer
2) also show me when there is a rating (for the same customer) for which the previous observation is missing
Does anyone ahve an idea how to solve this?
I would very much appreciate it 🙂
Thank you
Try this:
data want;
set have;
by customer date; /* by date to ensure that data is properly ordered */
retain flag savedate saverating;
oldrating = lag(rating);
olddate = lag(date);
if oldrating ne . and rating = . and not last.customer
then do;
date = olddate;
rating = oldrating;
output;
flag = 0;
end;
else if rating ne .
then do;
savedate = date;
saverating = rating;
flag = 1;
end;
if last.customer
then do;
if rating = .
then do;
if flag
then do;
date = savedate;
rating = saverating;
output;
end;
end;
else output;
end;
drop flag olddate oldrating savedate saverating;
run;
The programming is a little complicated as I wanted to avoid multiple outputs with the same data; one could make the code a little simpler and solve that by doing a proc sort noduprec in the end.
Still, it should now give you enough to chew on so you can grok the function of the first. and last. autoamtic variables.
1) Can be done like this
data have;
input Customer$ Date:ddmmyy10. Rating;
format Date ddmmyy10.;
datalines;
1000 01.01.2016 5
1000 01.02.2016 5
1000 01.03.2016 5
1000 01.04.2016 6
1000 01.05.2016 7
1000 01.06.2016 .
1000 01.07.2016 .
1000 01.08.2016 9
1000 01.09.2016 9
1000 01.10.2016 10
1000 01.11.2016 10
1000 01.12.2016 10
2000 01.01.2016 4
2000 01.02.2016 4
2000 01.03.2016 4
2000 01.04.2016 4
2000 01.05.2016 5
2000 01.06.2016 5
2000 01.07.2016 .
2000 01.08.2016 .
2000 01.09.2016 .
2000 01.10.2016 5
2000 01.11.2016 5
2000 01.12.2016 .
3000 01.01.2016 .
3000 01.02.2016 .
3000 01.03.2016 .
3000 01.04.2016 15
3000 01.05.2016 15
3000 01.06.2016 15
3000 01.07.2016 16
3000 01.08.2016 16
3000 01.09.2016 16
3000 01.10.2016 .
3000 01.11.2016 .
3000 01.12.2016 .
;
proc sort data = have;
by Customer;
run;
data want;
set have;
by Customer;
if first.Customer;
run;
2) If you only want the very first observation for each customer, how can you determine whether the previous observation for that customer is missing?
Use by processing and the lag() function:
data have;
infile cards dlm='09'x truncover;
input customer date :ddmmyy10. rating;
format date ddmmyyp10.;
cards;
1000 01.01.2016 5
1000 01.02.2016 5
1000 01.03.2016 5
1000 01.04.2016 6
1000 01.05.2016 7
1000 01.06.2016
1000 01.07.2016
1000 01.08.2016 9
1000 01.09.2016 9
1000 01.10.2016 10
1000 01.11.2016 10
1000 01.12.2016 10
2000 01.01.2016 4
2000 01.02.2016 4
2000 01.03.2016 4
2000 01.04.2016 4
2000 01.05.2016 5
2000 01.06.2016 5
2000 01.07.2016
2000 01.08.2016
2000 01.09.2016
2000 01.10.2016 5
2000 01.11.2016 5
2000 01.12.2016
3000 01.01.2016
3000 01.02.2016
3000 01.03.2016
3000 01.04.2016 15
3000 01.05.2016 15
3000 01.06.2016 15
3000 01.07.2016 16
3000 01.08.2016 16
3000 01.09.2016 16
3000 01.10.2016
3000 01.11.2016
3000 01.12.2016
;
run;
data want;
set have;
by customer date; /* by date to ensure that data is properly ordered */
retain flag;
oldrating = lag(rating);
if first.customer then flag = 1;
if flag and rating ne .
then do;
flag = 0;
output;
end;
else if not first.customer and rating ne . and oldrating = . then output;
drop flag oldrating;
run;
Dear KurtBremser,
it appears to be working. thank you very much.
I have a follow-up question if you dont mind.
I would like to do the same butz this I want to have the last observation of the variable "rating", i.e. the last observation for each customer or the last observation before a subsequent missing rating.
i tried to replace the "first" with "last. but that does not work.
It appears I do not quite understand the code.
if you have an answer to my follow up question I would rally appreciate for it.
Anyways thank you the working solution provided:
cheers
Joey
Try this:
data want;
set have;
by customer date; /* by date to ensure that data is properly ordered */
retain flag savedate saverating;
oldrating = lag(rating);
olddate = lag(date);
if oldrating ne . and rating = . and not last.customer
then do;
date = olddate;
rating = oldrating;
output;
flag = 0;
end;
else if rating ne .
then do;
savedate = date;
saverating = rating;
flag = 1;
end;
if last.customer
then do;
if rating = .
then do;
if flag
then do;
date = savedate;
rating = saverating;
output;
end;
end;
else output;
end;
drop flag olddate oldrating savedate saverating;
run;
The programming is a little complicated as I wanted to avoid multiple outputs with the same data; one could make the code a little simpler and solve that by doing a proc sort noduprec in the end.
Still, it should now give you enough to chew on so you can grok the function of the first. and last. autoamtic variables.
The following should work
data want;
set have;
by Customer date;
n=_n_;
if
( first.customer and not missing(rating) )
or
( not first.customer and missing(lag(rating)) and not missing(rating) )
;
run;
N.B: Only use the LAG() function in an IF condition if you fully understand how LAG() works. Else better use it unconditionally, assign the value to a variable and then use this variable in an IF condition.
Hi, think this works for both first and last values (easy tp modify if you want first and last in one data set, see after output) ...
data have;
set have;
new + missing(rating);
run;
data first_nonmiss last_nonmiss;
set have (where=(rating ne .));
by customer new;
if first.new then output first_nonmiss;
else
if last.new then output last_nonmiss;
drop new;
run;
DATA SET: first_nonmiss
customer date rating
1000 01.01.2016 5
1000 01.08.2016 9
2000 01.01.2016 4
2000 01.10.2016 5
3000 01.04.2016 15
DATA SET: last_nonmiss
customer date rating
1000 01.05.2016 7
1000 01.12.2016 10
2000 01.06.2016 5
2000 01.11.2016 5
3000 01.09.2016 16
Or ...
data first_last;
set have (where=(rating ne .));
by customer new;
if first.new or last.new;
* if you want first and last indicators;
first = first.new;
last = last.new;
drop new;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.