DATA Step, Macro, Functions and more

showing first values for either a new variable or previous missing values

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

showing first values for either a new variable or previous missing values

Hey there,

i have a small problem that I cant fix.
I have a dataset with three variables: 1) customer 2) date 3) rating, which looks like this

CustomerDateRating
100001.01.20165
100001.02.20165
100001.03.20165
100001.04.20166
100001.05.20167
100001.06.2016 
100001.07.2016 
100001.08.20169
100001.09.20169
100001.10.201610
100001.11.201610
100001.12.201610
200001.01.20164
200001.02.20164
200001.03.20164
200001.04.20164
200001.05.20165
200001.06.20165
200001.07.2016 
200001.08.2016 
200001.09.2016 
200001.10.20165
200001.11.20165
200001.12.2016 
300001.01.2016 
300001.02.2016 
300001.03.2016 
300001.04.201615
300001.05.201615
300001.06.201615
300001.07.201616
300001.08.201616
300001.09.201616
300001.10.2016 
300001.11.2016 
300001.12.2016 

 

Now what I need is a way to only show me the lines that satisy the following condisitons:

1) show me the very first rating for a particular customer

2) also show me when there is a rating (for the same customer) for which the previous observation is missing

Does anyone ahve an idea how to solve this?
I would very much appreciate it Smiley Happy
Thank you


Accepted Solutions
Solution
‎06-27-2017 12:23 PM
Super User
Posts: 6,939

Re: showing first values for either a new variable or previous missing values

Try this:

data want;
set have;
by customer date; /* by date to ensure that data is properly ordered */
retain flag savedate saverating;
oldrating = lag(rating);
olddate = lag(date);
if oldrating ne . and rating = . and not last.customer
then do;
  date = olddate;
  rating = oldrating;
  output;
  flag = 0;
end;
else if rating ne .
then do;
  savedate = date;
  saverating = rating;
  flag = 1;
end;
if last.customer
then do;
  if rating = .
  then do;
    if flag
    then do;
      date = savedate;
      rating = saverating;
      output;
    end;
  end;
  else output;
end;
drop flag olddate oldrating savedate saverating;
run;

The programming is a little complicated as I wanted to avoid multiple outputs with the same data; one could make the code a little simpler and solve that by doing a proc sort noduprec in the end.

 

Still, it should now give you enough to chew on so you can grok the function of the first. and last. autoamtic variables.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
PROC Star
Posts: 551

Re: showing first values for either a new variable or previous missing values

1) Can be done like this

 

data have;
	input Customer$ Date:ddmmyy10. Rating;
	format Date ddmmyy10.;
	datalines; 
1000 01.01.2016 5 
1000 01.02.2016 5 
1000 01.03.2016 5 
1000 01.04.2016 6 
1000 01.05.2016 7 
1000 01.06.2016 .
1000 01.07.2016 .
1000 01.08.2016 9 
1000 01.09.2016 9 
1000 01.10.2016 10 
1000 01.11.2016 10 
1000 01.12.2016 10 
2000 01.01.2016 4 
2000 01.02.2016 4 
2000 01.03.2016 4 
2000 01.04.2016 4 
2000 01.05.2016 5 
2000 01.06.2016 5 
2000 01.07.2016 .
2000 01.08.2016 .
2000 01.09.2016 .
2000 01.10.2016 5 
2000 01.11.2016 5 
2000 01.12.2016 .
3000 01.01.2016 .
3000 01.02.2016 .
3000 01.03.2016 .
3000 01.04.2016 15 
3000 01.05.2016 15 
3000 01.06.2016 15 
3000 01.07.2016 16 
3000 01.08.2016 16 
3000 01.09.2016 16 
3000 01.10.2016 .
3000 01.11.2016 .
3000 01.12.2016 .
;

proc sort data = have;
	by Customer;
run;

data want;
	set have;
	by Customer;
	if first.Customer;
run;

 

2) If you only want the very first observation for each customer, how can you determine whether the previous observation for that customer is missing?

Super User
Posts: 6,939

Re: showing first values for either a new variable or previous missing values

Use by processing and the lag() function:

data have;
infile cards dlm='09'x truncover;
input customer date :ddmmyy10. rating;
format date ddmmyyp10.;
cards;
1000	01.01.2016	5
1000	01.02.2016	5
1000	01.03.2016	5
1000	01.04.2016	6
1000	01.05.2016	7
1000	01.06.2016	 
1000	01.07.2016	 
1000	01.08.2016	9
1000	01.09.2016	9
1000	01.10.2016	10
1000	01.11.2016	10
1000	01.12.2016	10
2000	01.01.2016	4
2000	01.02.2016	4
2000	01.03.2016	4
2000	01.04.2016	4
2000	01.05.2016	5
2000	01.06.2016	5
2000	01.07.2016	 
2000	01.08.2016	 
2000	01.09.2016	 
2000	01.10.2016	5
2000	01.11.2016	5
2000	01.12.2016	 
3000	01.01.2016	 
3000	01.02.2016	 
3000	01.03.2016	 
3000	01.04.2016	15
3000	01.05.2016	15
3000	01.06.2016	15
3000	01.07.2016	16
3000	01.08.2016	16
3000	01.09.2016	16
3000	01.10.2016	 
3000	01.11.2016	 
3000	01.12.2016
;
run;

data want;
set have;
by customer date; /* by date to ensure that data is properly ordered */
retain flag;
oldrating = lag(rating);
if first.customer then flag = 1;
if flag and rating ne .
then do;
  flag = 0;
  output;
end;
else if not first.customer and rating ne . and oldrating = . then output;
drop flag oldrating;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
New Contributor
Posts: 4

Re: showing first values for either a new variable or previous missing values

Dear KurtBremser,

it appears to be working. thank you very much.

I have a follow-up question if you dont mind.
I would like to do the same butz this I want to have the last observation of the variable "rating", i.e. the last observation for each customer or the last observation before a subsequent missing rating.
i tried to replace the "first" with "last. but that does not work.
It appears I do not quite understand the code.

if you have an answer to my follow up question I would rally appreciate for it.
Anyways thank you the working solution provided:
cheers

Joey

Solution
‎06-27-2017 12:23 PM
Super User
Posts: 6,939

Re: showing first values for either a new variable or previous missing values

Try this:

data want;
set have;
by customer date; /* by date to ensure that data is properly ordered */
retain flag savedate saverating;
oldrating = lag(rating);
olddate = lag(date);
if oldrating ne . and rating = . and not last.customer
then do;
  date = olddate;
  rating = oldrating;
  output;
  flag = 0;
end;
else if rating ne .
then do;
  savedate = date;
  saverating = rating;
  flag = 1;
end;
if last.customer
then do;
  if rating = .
  then do;
    if flag
    then do;
      date = savedate;
      rating = saverating;
      output;
    end;
  end;
  else output;
end;
drop flag olddate oldrating savedate saverating;
run;

The programming is a little complicated as I wanted to avoid multiple outputs with the same data; one could make the code a little simpler and solve that by doing a proc sort noduprec in the end.

 

Still, it should now give you enough to chew on so you can grok the function of the first. and last. autoamtic variables.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Respected Advisor
Posts: 3,893

Re: showing first values for either a new variable or previous missing values

@JoeyE

The following should work

data want;
  set have;
  by Customer date;
  n=_n_;
  if 
    ( first.customer and not missing(rating) ) 
      or 
    ( not first.customer and missing(lag(rating)) and not missing(rating) )
    ;
run;

N.B: Only use the LAG() function in an IF condition if you fully understand how LAG() works. Else better use it unconditionally, assign the value to a variable and then use this variable in an IF condition.

 

Valued Guide
Posts: 765

Re: showing first values for either a new variable or previous missing values

[ Edited ]

Hi, think this works for both first and last values (easy tp modify if you want first and last in one data set, see after output) ...


data have;
set have;
new + missing(rating);
run;

 

data first_nonmiss last_nonmiss;
set have (where=(rating ne .));
by customer new;
if first.new then output first_nonmiss;

else

if last.new then output last_nonmiss;

drop new;

run;

DATA SET: first_nonmiss

customer date       rating

1000     01.01.2016   5
1000     01.08.2016   9
2000     01.01.2016   4
2000     01.10.2016   5
3000     01.04.2016  15

 

DATA SET: last_nonmiss

customer date        rating

1000     01.05.2016    7
1000     01.12.2016   10
2000     01.06.2016    5
2000     01.11.2016    5
3000     01.09.2016   16

 

Or ...

 

data first_last;
set have (where=(rating ne .));
by customer new;
if first.new or last.new;

* if you want first and last indicators;
first = first.new;
last = last.new;
drop new;
run;

 

 

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 104 views
  • 0 likes
  • 5 in conversation