BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JoeyE
Calcite | Level 5

Hey there,

i have a small problem that I cant fix.
I have a dataset with three variables: 1) customer 2) date 3) rating, which looks like this

CustomerDateRating
100001.01.20165
100001.02.20165
100001.03.20165
100001.04.20166
100001.05.20167
100001.06.2016 
100001.07.2016 
100001.08.20169
100001.09.20169
100001.10.201610
100001.11.201610
100001.12.201610
200001.01.20164
200001.02.20164
200001.03.20164
200001.04.20164
200001.05.20165
200001.06.20165
200001.07.2016 
200001.08.2016 
200001.09.2016 
200001.10.20165
200001.11.20165
200001.12.2016 
300001.01.2016 
300001.02.2016 
300001.03.2016 
300001.04.201615
300001.05.201615
300001.06.201615
300001.07.201616
300001.08.201616
300001.09.201616
300001.10.2016 
300001.11.2016 
300001.12.2016 

 

Now what I need is a way to only show me the lines that satisy the following condisitons:

1) show me the very first rating for a particular customer

2) also show me when there is a rating (for the same customer) for which the previous observation is missing

Does anyone ahve an idea how to solve this?
I would very much appreciate it 🙂
Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Try this:

data want;
set have;
by customer date; /* by date to ensure that data is properly ordered */
retain flag savedate saverating;
oldrating = lag(rating);
olddate = lag(date);
if oldrating ne . and rating = . and not last.customer
then do;
  date = olddate;
  rating = oldrating;
  output;
  flag = 0;
end;
else if rating ne .
then do;
  savedate = date;
  saverating = rating;
  flag = 1;
end;
if last.customer
then do;
  if rating = .
  then do;
    if flag
    then do;
      date = savedate;
      rating = saverating;
      output;
    end;
  end;
  else output;
end;
drop flag olddate oldrating savedate saverating;
run;

The programming is a little complicated as I wanted to avoid multiple outputs with the same data; one could make the code a little simpler and solve that by doing a proc sort noduprec in the end.

 

Still, it should now give you enough to chew on so you can grok the function of the first. and last. autoamtic variables.

View solution in original post

6 REPLIES 6
PeterClemmensen
Tourmaline | Level 20

1) Can be done like this

 

data have;
	input Customer$ Date:ddmmyy10. Rating;
	format Date ddmmyy10.;
	datalines; 
1000 01.01.2016 5 
1000 01.02.2016 5 
1000 01.03.2016 5 
1000 01.04.2016 6 
1000 01.05.2016 7 
1000 01.06.2016 .
1000 01.07.2016 .
1000 01.08.2016 9 
1000 01.09.2016 9 
1000 01.10.2016 10 
1000 01.11.2016 10 
1000 01.12.2016 10 
2000 01.01.2016 4 
2000 01.02.2016 4 
2000 01.03.2016 4 
2000 01.04.2016 4 
2000 01.05.2016 5 
2000 01.06.2016 5 
2000 01.07.2016 .
2000 01.08.2016 .
2000 01.09.2016 .
2000 01.10.2016 5 
2000 01.11.2016 5 
2000 01.12.2016 .
3000 01.01.2016 .
3000 01.02.2016 .
3000 01.03.2016 .
3000 01.04.2016 15 
3000 01.05.2016 15 
3000 01.06.2016 15 
3000 01.07.2016 16 
3000 01.08.2016 16 
3000 01.09.2016 16 
3000 01.10.2016 .
3000 01.11.2016 .
3000 01.12.2016 .
;

proc sort data = have;
	by Customer;
run;

data want;
	set have;
	by Customer;
	if first.Customer;
run;

 

2) If you only want the very first observation for each customer, how can you determine whether the previous observation for that customer is missing?

Kurt_Bremser
Super User

Use by processing and the lag() function:

data have;
infile cards dlm='09'x truncover;
input customer date :ddmmyy10. rating;
format date ddmmyyp10.;
cards;
1000	01.01.2016	5
1000	01.02.2016	5
1000	01.03.2016	5
1000	01.04.2016	6
1000	01.05.2016	7
1000	01.06.2016	 
1000	01.07.2016	 
1000	01.08.2016	9
1000	01.09.2016	9
1000	01.10.2016	10
1000	01.11.2016	10
1000	01.12.2016	10
2000	01.01.2016	4
2000	01.02.2016	4
2000	01.03.2016	4
2000	01.04.2016	4
2000	01.05.2016	5
2000	01.06.2016	5
2000	01.07.2016	 
2000	01.08.2016	 
2000	01.09.2016	 
2000	01.10.2016	5
2000	01.11.2016	5
2000	01.12.2016	 
3000	01.01.2016	 
3000	01.02.2016	 
3000	01.03.2016	 
3000	01.04.2016	15
3000	01.05.2016	15
3000	01.06.2016	15
3000	01.07.2016	16
3000	01.08.2016	16
3000	01.09.2016	16
3000	01.10.2016	 
3000	01.11.2016	 
3000	01.12.2016
;
run;

data want;
set have;
by customer date; /* by date to ensure that data is properly ordered */
retain flag;
oldrating = lag(rating);
if first.customer then flag = 1;
if flag and rating ne .
then do;
  flag = 0;
  output;
end;
else if not first.customer and rating ne . and oldrating = . then output;
drop flag oldrating;
run;
JoeyE
Calcite | Level 5

Dear KurtBremser,

it appears to be working. thank you very much.

I have a follow-up question if you dont mind.
I would like to do the same butz this I want to have the last observation of the variable "rating", i.e. the last observation for each customer or the last observation before a subsequent missing rating.
i tried to replace the "first" with "last. but that does not work.
It appears I do not quite understand the code.

if you have an answer to my follow up question I would rally appreciate for it.
Anyways thank you the working solution provided:
cheers

Joey

Kurt_Bremser
Super User

Try this:

data want;
set have;
by customer date; /* by date to ensure that data is properly ordered */
retain flag savedate saverating;
oldrating = lag(rating);
olddate = lag(date);
if oldrating ne . and rating = . and not last.customer
then do;
  date = olddate;
  rating = oldrating;
  output;
  flag = 0;
end;
else if rating ne .
then do;
  savedate = date;
  saverating = rating;
  flag = 1;
end;
if last.customer
then do;
  if rating = .
  then do;
    if flag
    then do;
      date = savedate;
      rating = saverating;
      output;
    end;
  end;
  else output;
end;
drop flag olddate oldrating savedate saverating;
run;

The programming is a little complicated as I wanted to avoid multiple outputs with the same data; one could make the code a little simpler and solve that by doing a proc sort noduprec in the end.

 

Still, it should now give you enough to chew on so you can grok the function of the first. and last. autoamtic variables.

Patrick
Opal | Level 21

@JoeyE

The following should work

data want;
  set have;
  by Customer date;
  n=_n_;
  if 
    ( first.customer and not missing(rating) ) 
      or 
    ( not first.customer and missing(lag(rating)) and not missing(rating) )
    ;
run;

N.B: Only use the LAG() function in an IF condition if you fully understand how LAG() works. Else better use it unconditionally, assign the value to a variable and then use this variable in an IF condition.

 

MikeZdeb
Rhodochrosite | Level 12

Hi, think this works for both first and last values (easy tp modify if you want first and last in one data set, see after output) ...


data have;
set have;
new + missing(rating);
run;

 

data first_nonmiss last_nonmiss;
set have (where=(rating ne .));
by customer new;
if first.new then output first_nonmiss;

else

if last.new then output last_nonmiss;

drop new;

run;

DATA SET: first_nonmiss

customer date       rating

1000     01.01.2016   5
1000     01.08.2016   9
2000     01.01.2016   4
2000     01.10.2016   5
3000     01.04.2016  15

 

DATA SET: last_nonmiss

customer date        rating

1000     01.05.2016    7
1000     01.12.2016   10
2000     01.06.2016    5
2000     01.11.2016    5
3000     01.09.2016   16

 

Or ...

 

data first_last;
set have (where=(rating ne .));
by customer new;
if first.new or last.new;

* if you want first and last indicators;
first = first.new;
last = last.new;
drop new;
run;

 

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1855 views
  • 0 likes
  • 5 in conversation