Hi, would anyone have any advice on using first and last variable?
I have a dataset where I want to find the first date and last date of the ID variable but my ID variable have different values for the section variable. I want the result to show the first and last date by ID and Section.
Any suggestions would be greatly appreciated.
data Have;
input ID $ name $ Date $ Section $;
datalines;
6567486 Bravo 13Oct2022 Math
6567486 Bravo 14Oct2022 Math
6567486 Bravo 15Oct2022 Math
6567486 Bravo 16Oct2022 Math
6567486 Bravo 17Oct2022 Math
6567486 Bravo 18Oct2022 Math
6567486 Bravo 20Oct2022 Math
6567486 Bravo 21Oct2022 Math
6567486 Bravo 22Oct2022 Math
6567486 Bravo 23Oct2022 Math
6567486 Bravo 24Oct2022 Math
6567486 Bravo 25Oct2022 Math
6567486 Bravo 26Oct2022 Math
6567486 Bravo 27Oct2022 Math
6567486 Bravo 28Oct2022 Math
6567486 Bravo 29Oct2022 Math
6606109 Charlie 1Oct2022 Math
6606109 Charlie 2Oct2022 Math
6606109 Charlie 3Oct2022 Science
6606109 Charlie 4Oct2022 Science
6606109 Charlie 5Oct2022 Science
6621246 Sierra 1Oct2022 Science
6621246 Sierra 2Oct2022 Science
6621246 Sierra 3Oct2022 Science
6621246 Sierra 4Oct2022 Science
6621246 Sierra 5Oct2022 Science
6621246 Sierra 6Oct2022 Science
6621246 Sierra 7Oct2022 Science
6621246 Sierra 8Oct2022 Science
6621246 Sierra 9Oct2022 Science
6621246 Sierra 10Oct2022 Science
6621246 Sierra 11Oct2022 Science
6621246 Sierra 12Oct2022 Science
6621246 Sierra 13Oct2022 Science
6621246 Sierra 14Oct2022 Science
6621246 Sierra 15Oct2022 Science
6621246 Sierra 16Oct2022 Science
6621246 Sierra 17Oct2022 Science
6621246 Sierra 18Oct2022 Science
6621246 Sierra 19Oct2022 Science
6621246 Sierra 20Oct2022 Science
6621246 Sierra 21Oct2022 Science
6621246 Sierra 22Oct2022 Science
6621246 Sierra 23Oct2022 Science
6621246 Sierra 24Oct2022 Science
6621246 Sierra 25Oct2022 Science
6621246 Sierra 26Oct2022 Science
6621246 Sierra 27Oct2022 Science
6621246 Sierra 28Oct2022 Science
6621246 Sierra 29Oct2022 Science
6621246 Sierra 30Oct2022 Science
6621246 Sierra 31Oct2022 Science
6651967 Echo 1Oct2022 Science
6651967 Echo 2Oct2022 Science
6651967 Echo 3Oct2022 Science
6651967 Echo 4Oct2022 Science
6651967 Echo 5Oct2022 Science
6651967 Echo 6Oct2022 Science
6651967 Echo 7Oct2022 Science
6651967 Echo 8Oct2022 Science
6651967 Echo 9Oct2022 Science
6651967 Echo 10Oct2022 Science
6651967 Echo 11Oct2022 Science
6651967 Echo 12Oct2022 English
6651967 Echo 13Oct2022 English
6651967 Echo 14Oct2022 English
6651967 Echo 15Oct2022 English
6651967 Echo 16Oct2022 English
6651967 Echo 17Oct2022 English
6651967 Echo 18Oct2022 English
6651967 Echo 19Oct2022 English
6651967 Echo 20Oct2022 English
6651967 Echo 21Oct2022 English
6651967 Echo 22Oct2022 English
6651967 Echo 23Oct2022 English
;
run;
proc sort data=Have; by section ID date; run;
DATA Have2;
set Have;
by Section ID;
if first.ID then First_day=date;
First_day+.;
if last.ID then Last_day=date;
format first_day last_day date10.;
if last.ID;
run;
Hi @Scooby3g
If you want to handle gaps between dates, it is simpler to count the days instead of subtracting. Here are two examples.
In the first example the days are counted within each by-group.
The last example utilizes a second set-statement to look ahead and make separate counts for each group of concecutive days within a given by-group.
* Count the days in a given by-group;
data want (drop = date);
set have;
by Section ID;
retain Start Days;
format Start End date9.;
if first.ID then do;
Start = date;
Days = 0;
end;
Days + 1;
if last.ID then do;
end = date;
output;
end;
run;
* Count the days in a given by group,
but keep track of gaps and make separate counts for each set of concecutive days;
data want (drop = date NextDate);
set have end=eof;
by Section ID;
retain Start Days;
format Start End date9.;
if not eof then set have (firstobs=2 keep=date rename=(date=NextDate));
if first.ID then do;
Start = date;
Days = 0;
end;
Days + 1;
if not last.ID and date < NextDate-1 then do;
End = date;
output;
Start = NextDate;
Days = 0;
end;
if last.ID then do;
end = date;
output;
end;
run;
My recommendation on learning to use complex combinations of First and Last variables to try some, or all, of your data and create actual variables with the first and last values to see how things change.
Example:
DATA Have2; set Have; by Section ID; fsec = First.section; lsec = last.section; fid = first.id; lid = last.id; run;
IF, that can be a big if, I understand what this means: " I want the result to show the first and last date by ID and Section. "
Then that might be :
If first.section then somevar=date; If last.section then othervar=date;
One reason I say 'might' is because I do not know what you expect. Look at the first output I suggested with the first and last values to see which combination(s) mark the records you want.
Do you expect to have the values attached to each record of the by group?
Just those values in the output set?
Make sure that DATE is an actual DATE value and not the character string your example is creating.
Otherwise the data will not sort properly and your program will get errors about trying to attach a numeric format to character variable.
Also the DATE format does not really work with a width of 10. You can use 9 and get strings like in your data lines. Or 11 and get hyphens. (or 7 and not be able to tell what century the dates are in).
data want;
set have;
by Section ID;
if first.ID then First_day=date;
retain first_day ;
if last.ID ;
Last_day=date;
format first_day last_day date9.;
drop date;
run;
Thank you everyone for your suggestions!! These suggestion was greatly appreciated!!
I used the data step Tom suggested but I have one more question. Could I trouble you for for one more suggestion. Is it possible to account for the date gap between each date for? For example in obs 18 and 19 there is a gap between the two dates. Is it possible to account for 19Oct2022 to be missing? I am planning to use have the last date minus the first date the data step later to count the number of days for each section.
6567486 Bravo 18Oct2022 Math
6567486 Bravo 20Oct2022 Math
data Have;
length char_date $10;
infile DATALINES;
/*input ID $ name $ char_Date: $20. fmt : $20. Section ;*/
input ID $ name $ char_Date $ Section $;
date = input(char_date,date9.);
format date date9.;
datalines;
6567486 Bravo 13Oct2022 Math
6567486 Bravo 14Oct2022 Math
6567486 Bravo 15Oct2022 Math
6567486 Bravo 16Oct2022 Math
6567486 Bravo 17Oct2022 Math
6567486 Bravo 18Oct2022 Math
6567486 Bravo 20Oct2022 Math
6567486 Bravo 21Oct2022 Math
6567486 Bravo 22Oct2022 Math
6567486 Bravo 23Oct2022 Math
6567486 Bravo 24Oct2022 Math
6567486 Bravo 25Oct2022 Math
6567486 Bravo 26Oct2022 Math
6567486 Bravo 27Oct2022 Math
6567486 Bravo 28Oct2022 Math
6567486 Bravo 29Oct2022 Math
6606109 Charlie 1Oct2022 Math
6606109 Charlie 2Oct2022 Math
6606109 Charlie 3Oct2022 Science
6606109 Charlie 4Oct2022 Science
6606109 Charlie 5Oct2022 Science
6621246 Sierra 1Oct2022 Science
6621246 Sierra 2Oct2022 Science
6621246 Sierra 3Oct2022 Science
6621246 Sierra 4Oct2022 Science
6621246 Sierra 5Oct2022 Science
6621246 Sierra 6Oct2022 Science
6621246 Sierra 7Oct2022 Science
6621246 Sierra 8Oct2022 Science
6621246 Sierra 9Oct2022 Science
6621246 Sierra 10Oct2022 Science
6621246 Sierra 11Oct2022 Science
6621246 Sierra 12Oct2022 Science
6621246 Sierra 13Oct2022 Science
6621246 Sierra 14Oct2022 Science
6621246 Sierra 15Oct2022 Science
6621246 Sierra 16Oct2022 Science
6621246 Sierra 17Oct2022 Science
6621246 Sierra 18Oct2022 Science
6621246 Sierra 19Oct2022 Science
6621246 Sierra 20Oct2022 Science
6621246 Sierra 21Oct2022 Science
6621246 Sierra 22Oct2022 Science
6621246 Sierra 23Oct2022 Science
6621246 Sierra 24Oct2022 Science
6621246 Sierra 25Oct2022 Science
6621246 Sierra 26Oct2022 Science
6621246 Sierra 27Oct2022 Science
6621246 Sierra 28Oct2022 Science
6621246 Sierra 29Oct2022 Science
6621246 Sierra 30Oct2022 Science
6621246 Sierra 31Oct2022 Science
6651967 Echo 1Oct2022 Science
6651967 Echo 2Oct2022 Science
6651967 Echo 3Oct2022 Science
6651967 Echo 4Oct2022 Science
6651967 Echo 5Oct2022 Science
6651967 Echo 6Oct2022 Science
6651967 Echo 7Oct2022 Science
6651967 Echo 8Oct2022 Science
6651967 Echo 9Oct2022 Science
6651967 Echo 10Oct2022 Science
6651967 Echo 11Oct2022 Science
6651967 Echo 12Oct2022 English
6651967 Echo 13Oct2022 English
6651967 Echo 14Oct2022 English
6651967 Echo 15Oct2022 English
6651967 Echo 16Oct2022 English
6651967 Echo 17Oct2022 English
6651967 Echo 18Oct2022 English
6651967 Echo 19Oct2022 English
6651967 Echo 20Oct2022 English
6651967 Echo 21Oct2022 English
6651967 Echo 22Oct2022 English
6651967 Echo 23Oct2022 English
;
run;
proc sort data=Have;
by section ID date;
run;
data want;
set have;
by Section ID;
if first.ID then First_day=date;
retain first_day ;
if last.ID ;
Last_day=date;
format first_day last_day date9.;
drop date;
run;
data want2;
set want;
days= (last_day - first_day)+1;
run;
You can also use SQL to get those dates:
proc sql;
create table want as
select
id,
section,
min(date) as first_day format=yymmdd10.,
max(date) as last_day format=yymmdd10.
from have
group by id, section
;
quit;
if yor date is in fact a SAS date value and not just a string looking like a date.
I see that this is a likely task for learning a little about first. and last. processing.
But it's at least as good for learning some of the capabilities of PROC SUMMARY:
data Have;
input ID $ name $ Date :date9. Section $;
format date date9. ;
datalines;
6567486 Bravo 13Oct2022 Math
6567486 Bravo 14Oct2022 Math
6567486 Bravo 15Oct2022 Math
6567486 Bravo 16Oct2022 Math
6567486 Bravo 17Oct2022 Math
6567486 Bravo 18Oct2022 Math
6567486 Bravo 20Oct2022 Math
6567486 Bravo 21Oct2022 Math
6567486 Bravo 22Oct2022 Math
6567486 Bravo 23Oct2022 Math
6567486 Bravo 24Oct2022 Math
6567486 Bravo 25Oct2022 Math
6567486 Bravo 26Oct2022 Math
6567486 Bravo 27Oct2022 Math
6567486 Bravo 28Oct2022 Math
6567486 Bravo 29Oct2022 Math
6606109 Charlie 1Oct2022 Math
6606109 Charlie 2Oct2022 Math
6606109 Charlie 3Oct2022 Science
6606109 Charlie 4Oct2022 Science
6606109 Charlie 5Oct2022 Science
6621246 Sierra 1Oct2022 Science
6621246 Sierra 2Oct2022 Science
6621246 Sierra 3Oct2022 Science
6621246 Sierra 4Oct2022 Science
6621246 Sierra 5Oct2022 Science
6621246 Sierra 6Oct2022 Science
6621246 Sierra 7Oct2022 Science
6621246 Sierra 8Oct2022 Science
6621246 Sierra 9Oct2022 Science
6621246 Sierra 10Oct2022 Science
6621246 Sierra 11Oct2022 Science
6621246 Sierra 12Oct2022 Science
6621246 Sierra 13Oct2022 Science
6621246 Sierra 14Oct2022 Science
6621246 Sierra 15Oct2022 Science
6621246 Sierra 16Oct2022 Science
6621246 Sierra 17Oct2022 Science
6621246 Sierra 18Oct2022 Science
6621246 Sierra 19Oct2022 Science
6621246 Sierra 20Oct2022 Science
6621246 Sierra 21Oct2022 Science
6621246 Sierra 22Oct2022 Science
6621246 Sierra 23Oct2022 Science
6621246 Sierra 24Oct2022 Science
6621246 Sierra 25Oct2022 Science
6621246 Sierra 26Oct2022 Science
6621246 Sierra 27Oct2022 Science
6621246 Sierra 28Oct2022 Science
6621246 Sierra 29Oct2022 Science
6621246 Sierra 30Oct2022 Science
6621246 Sierra 31Oct2022 Science
6651967 Echo 1Oct2022 Science
6651967 Echo 2Oct2022 Science
6651967 Echo 3Oct2022 Science
6651967 Echo 4Oct2022 Science
6651967 Echo 5Oct2022 Science
6651967 Echo 6Oct2022 Science
6651967 Echo 7Oct2022 Science
6651967 Echo 8Oct2022 Science
6651967 Echo 9Oct2022 Science
6651967 Echo 10Oct2022 Science
6651967 Echo 11Oct2022 Science
6651967 Echo 12Oct2022 English
6651967 Echo 13Oct2022 English
6651967 Echo 14Oct2022 English
6651967 Echo 15Oct2022 English
6651967 Echo 16Oct2022 English
6651967 Echo 17Oct2022 English
6651967 Echo 18Oct2022 English
6651967 Echo 19Oct2022 English
6651967 Echo 20Oct2022 English
6651967 Echo 21Oct2022 English
6651967 Echo 22Oct2022 English
6651967 Echo 23Oct2022 English
;
run;
proc summary data=have noprint nway ;
class id section;
var date ;
output out=want (drop=_:) min=first_day max=last_day maxid(date(name))=last_name;
run;
Hi @Scooby3g
If you want to handle gaps between dates, it is simpler to count the days instead of subtracting. Here are two examples.
In the first example the days are counted within each by-group.
The last example utilizes a second set-statement to look ahead and make separate counts for each group of concecutive days within a given by-group.
* Count the days in a given by-group;
data want (drop = date);
set have;
by Section ID;
retain Start Days;
format Start End date9.;
if first.ID then do;
Start = date;
Days = 0;
end;
Days + 1;
if last.ID then do;
end = date;
output;
end;
run;
* Count the days in a given by group,
but keep track of gaps and make separate counts for each set of concecutive days;
data want (drop = date NextDate);
set have end=eof;
by Section ID;
retain Start Days;
format Start End date9.;
if not eof then set have (firstobs=2 keep=date rename=(date=NextDate));
if first.ID then do;
Start = date;
Days = 0;
end;
Days + 1;
if not last.ID and date < NextDate-1 then do;
End = date;
output;
Start = NextDate;
Days = 0;
end;
if last.ID then do;
end = date;
output;
end;
run;
Hi @ErikLund_Jensen ,
Would it be possible to do both in one data step? Count the number of dates by section and ID and also find the first date and last date but also account for the gaps between the dates?
what I am trying to do is find the first date and last date (like the data step @Tom had suggested) and also count how many dates. For example, in obs one how many dates there were by section and ID.
It is trivial to count observations.
To count days you might want to test FIRST.DATE so that duplicate dates don't count twice.
data want;
set have;
by id name date;
if first.name then do;
first_date=date;
days=0;
end;
retain first_date days;
days+first.date;
if last.name then do;
last_date=date;
total_days=last_date-first_date+1;
output;
end;
keep id name first_date last_date days total_days;
format first_date last_date date9.;
run;
first_ total_ OBS ID name date days last_date days 1 6567486 Bravo 13OCT2022 16 29OCT2022 17 2 6606109 Charlie 01OCT2022 5 05OCT2022 5 3 6621246 Sierra 01OCT2022 31 31OCT2022 31 4 6651967 Echo 01OCT2022 23 23OCT2022 23
@Tom Thank you so much!!! this is exactly want I have been trying to do.
Hi @Scooby3g
The first of the two steps I sent you does exactly that - find first date and last date per ID / Section and count the actual days so gaps taken care of.
The two steps does the same, but the last is slightly more complicated, because it groups per ID / Section / group of continous dates. so dates separated by a gap are treated as separate groups with first/last data and count.
Thank you so much!!! The second one is perfect!!!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.