Please be gentle - I'm new - using SAS 9.4 I'm trying to find the first date the Pick_Date is Null. Any assistance is appreciated! Thanks in advance!!
Here is what my data table looks like:
Table Orchard | |||
Apple_Var | Ready_Date | Pick_Date | Change_Date |
Red_Del | 8/24/2024 | 8/24/2024 | 8/24/2024 |
Red_Del | 8/24/2024 | 8/25/2024 | 8/25/2024 |
Red_Del | 8/24/2024 | 8/26/2024 | 8/26/2024 |
Red_Del | 8/24/2024 | 8/27/2027 | 8/27/2024 |
Red_Del | 8/24/2024 | 8/28/2024 | |
Granny | 8/26/2024 | 8/26/2024 | 8/26/2024 |
Granny | 8/26/2024 | 8/27/2024 | 8/27/2024 |
Granny | 8/26/2024 | 8/28/2024 | 8/28/2024 |
Granny | 8/26/2024 | 8/29/2024 |
Here is my desired output:
Table Orchard | |||
Apple_Var | Ready_Date | Pick_Date | Change_Date |
Red_Del | 8/24/2024 | 8/28/2024 | |
Granny | 8/26/2024 | 8/29/2024 |
First in what sense?
First sorted by what variables?
To get what you showed you probably want the data sorted by APPLE_VAR and CHANGE_DATE and just find the first per APPLE_VAR that has missing value of PICK_DATE.
First let's convert your LISTING into an actual DATASET so we have something to work from. Let's display the dates with a format that will not confuse half of our audience.
data HAVE;
input Apple_Var :$10. (Ready_Date Pick_Date Change_Date) (:mmddyy.);
format Ready_Date Pick_Date Change_Date date9.;
cards;
Red_Del 8/24/2024 8/24/2024 8/24/2024
Red_Del 8/24/2024 8/25/2024 8/25/2024
Red_Del 8/24/2024 8/26/2024 8/26/2024
Red_Del 8/24/2024 8/27/2027 8/27/2024
Red_Del 8/24/2024 . 8/28/2024
Granny 8/26/2024 8/26/2024 8/26/2024
Granny 8/26/2024 8/27/2024 8/27/2024
Granny 8/26/2024 8/28/2024 8/28/2024
Granny 8/26/2024 . 8/29/2024
;
Will need to make sure the data is sorted properly so we can use BY statement in our data step.
Note that since your example only has TWO apple varieties you could get it to work without the sort by using the DESCENDING keyword in the BY statement to let the data step know that you have the apple varieties in descending alphabetical order.
proc sort data=HAVE;
by apple_var change_date;
run;
data WANT;
set have;
by apple_var change_date;
where missing(pick_date);
if first.apple_var;
run;
Results
Apple_ Ready_ Change_ Obs Var Date Pick_Date Date 1 Granny 26AUG2024 . 29AUG2024 2 Red_Del 24AUG2024 . 28AUG2024
First in what sense?
First sorted by what variables?
To get what you showed you probably want the data sorted by APPLE_VAR and CHANGE_DATE and just find the first per APPLE_VAR that has missing value of PICK_DATE.
First let's convert your LISTING into an actual DATASET so we have something to work from. Let's display the dates with a format that will not confuse half of our audience.
data HAVE;
input Apple_Var :$10. (Ready_Date Pick_Date Change_Date) (:mmddyy.);
format Ready_Date Pick_Date Change_Date date9.;
cards;
Red_Del 8/24/2024 8/24/2024 8/24/2024
Red_Del 8/24/2024 8/25/2024 8/25/2024
Red_Del 8/24/2024 8/26/2024 8/26/2024
Red_Del 8/24/2024 8/27/2027 8/27/2024
Red_Del 8/24/2024 . 8/28/2024
Granny 8/26/2024 8/26/2024 8/26/2024
Granny 8/26/2024 8/27/2024 8/27/2024
Granny 8/26/2024 8/28/2024 8/28/2024
Granny 8/26/2024 . 8/29/2024
;
Will need to make sure the data is sorted properly so we can use BY statement in our data step.
Note that since your example only has TWO apple varieties you could get it to work without the sort by using the DESCENDING keyword in the BY statement to let the data step know that you have the apple varieties in descending alphabetical order.
proc sort data=HAVE;
by apple_var change_date;
run;
data WANT;
set have;
by apple_var change_date;
where missing(pick_date);
if first.apple_var;
run;
Results
Apple_ Ready_ Change_ Obs Var Date Pick_Date Date 1 Granny 26AUG2024 . 29AUG2024 2 Red_Del 24AUG2024 . 28AUG2024
To do what your subject line literal says you would want something like this:
data want;
set have;
if missing(pick_date)
and (not missing(lag(pick_date)))
and (apple_var = lag(apple_var))
;
run;
The last condition makes sure you don't compare pick_date values across apple varieties.
It also means you cannot select an observation that has missing pick_date that is the first for this apple variety.
And if you have multiple places where PICK_DATE transitioned from being present to being missing for the same apple variety all of those transitions would be selected.
Since you appear to want to process data in the order the data set is currently then that would mean to use a data set in SAS as that processes records in order.
You did not state but seem to imply that you want the result based on the grouping value of a variable named Apple_var.
What you did not say explicitly what to do in the cases of:
1) the only observations for pick_date with missing do not have any "prior" records with none-missing values
2) if there are no missing Pick_date values.
You can use a BY statement to process by the values of a variable. This assumes the data is sorted by that variable. If the data is not actually sorted but is grouped by that variable (your example Red_Del would not come before Granny by sort order) you can use the keyword NOTSORTED to prevent an error. Since you are looking for a "first and only" you need something to help identify if it is the first. That would typically call for flag variable that is RETAINED, keeps the same value from observation to observation unless reset to indicate when to output the needed records.
BEST is to provide example data in the form of a working data step and paste into a text box opened with the </> icon that appears above the message window.
So something like this, creates a data set hopefully similar to yours and then selects the observations.
data have; input Apple_Var $ Ready_Date :mmddyy10. Pick_Date :mmddyy10. Change_Date :mmddyy10. ; format Ready_Date :mmddyy10. Pick_Date :mmddyy10. Change_Date :mmddyy10. ; datalines; Red_Del 8/24/2024 8/24/2024 8/24/2024 Red_Del 8/24/2024 8/25/2024 8/25/2024 Red_Del 8/24/2024 8/26/2024 8/26/2024 Red_Del 8/24/2024 8/27/2027 8/27/2024 Red_Del 8/24/2024 . 8/28/2024 Granny 8/26/2024 8/26/2024 8/26/2024 Granny 8/26/2024 8/27/2024 8/27/2024 Granny 8/26/2024 8/28/2024 8/28/2024 Granny 8/26/2024 . 8/29/2024 ; data want; set have; by apple_var notsorted; retain found; if first.apple_var then found=0 ; if missing(pick_date) and found=0 then do; found=1; output; end; drop found; run;
This doesn't handle the cases where the first observations all have missing pick_dates though.
The BY creates a temporary variable for each variable on the BY statement that indicate first or last value of the group and are 1/0 coded for true/false.
The Output statement makes only explicitly stated output records to write to the output data set.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.