BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Scooby3g
Obsidian | Level 7

Hi, would anyone have any advice on using first and last variable?

 

I have a dataset where I want to find the first date and last date of the ID variable but my ID variable have different values for the section variable. I want the result to show the first and last date by ID and Section. 

 

Any suggestions would be greatly appreciated. 

 

data Have;
input ID $ name $ Date $ Section $;
datalines; 
6567486	Bravo	13Oct2022	Math
6567486	Bravo	14Oct2022	Math
6567486	Bravo	15Oct2022	Math
6567486	Bravo	16Oct2022	Math
6567486	Bravo	17Oct2022	Math
6567486	Bravo	18Oct2022	Math
6567486	Bravo	20Oct2022	Math
6567486	Bravo	21Oct2022	Math
6567486	Bravo	22Oct2022	Math
6567486	Bravo	23Oct2022	Math
6567486	Bravo	24Oct2022	Math
6567486	Bravo	25Oct2022	Math
6567486	Bravo	26Oct2022	Math
6567486	Bravo	27Oct2022	Math
6567486	Bravo	28Oct2022	Math
6567486	Bravo	29Oct2022	Math
6606109	Charlie	1Oct2022	Math
6606109	Charlie	2Oct2022	Math
6606109	Charlie	3Oct2022	Science
6606109	Charlie	4Oct2022	Science
6606109	Charlie	5Oct2022	Science
6621246	Sierra	1Oct2022	Science
6621246	Sierra	2Oct2022	Science
6621246	Sierra	3Oct2022	Science
6621246	Sierra	4Oct2022	Science
6621246	Sierra	5Oct2022	Science
6621246	Sierra	6Oct2022	Science
6621246	Sierra	7Oct2022	Science
6621246	Sierra	8Oct2022	Science
6621246	Sierra	9Oct2022	Science
6621246	Sierra	10Oct2022	Science
6621246	Sierra	11Oct2022	Science
6621246	Sierra	12Oct2022	Science
6621246	Sierra	13Oct2022	Science
6621246	Sierra	14Oct2022	Science
6621246	Sierra	15Oct2022	Science
6621246	Sierra	16Oct2022	Science
6621246	Sierra	17Oct2022	Science
6621246	Sierra	18Oct2022	Science
6621246	Sierra	19Oct2022	Science
6621246	Sierra	20Oct2022	Science
6621246	Sierra	21Oct2022	Science
6621246	Sierra	22Oct2022	Science
6621246	Sierra	23Oct2022	Science
6621246	Sierra	24Oct2022	Science
6621246	Sierra	25Oct2022	Science
6621246	Sierra	26Oct2022	Science
6621246	Sierra	27Oct2022	Science
6621246	Sierra	28Oct2022	Science
6621246	Sierra	29Oct2022	Science
6621246	Sierra	30Oct2022	Science
6621246	Sierra	31Oct2022	Science
6651967	Echo	1Oct2022	Science
6651967	Echo	2Oct2022	Science
6651967	Echo	3Oct2022	Science
6651967	Echo	4Oct2022	Science
6651967	Echo	5Oct2022	Science
6651967	Echo	6Oct2022	Science
6651967	Echo	7Oct2022	Science
6651967	Echo	8Oct2022	Science
6651967	Echo	9Oct2022	Science
6651967	Echo	10Oct2022	Science
6651967	Echo	11Oct2022	Science
6651967	Echo	12Oct2022	English
6651967	Echo	13Oct2022	English
6651967	Echo	14Oct2022	English
6651967	Echo	15Oct2022	English
6651967	Echo	16Oct2022	English
6651967	Echo	17Oct2022	English
6651967	Echo	18Oct2022	English
6651967	Echo	19Oct2022	English
6651967	Echo	20Oct2022	English
6651967	Echo	21Oct2022	English
6651967	Echo	22Oct2022	English
6651967	Echo	23Oct2022	English
;
run;


proc sort data=Have; by  section ID date; run;

 DATA Have2;
set Have;
by Section ID;
if first.ID then First_day=date;
 First_day+.;
if last.ID  then Last_day=date;
format first_day last_day date10.;
if last.ID;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Scooby3g 

If you want to handle gaps between dates, it is simpler to count the days instead of subtracting. Here are two examples.

In the first example the days are counted within each by-group.

The last example utilizes a second set-statement to look ahead and make separate counts for each group of concecutive days within a given by-group.

 

* Count the days in a given by-group;
data want (drop = date);
  set have;
  by Section ID;
  retain Start Days;
  format Start End date9.;

  if first.ID then do;
    Start = date;
    Days = 0;
  end;

  Days + 1;
  if last.ID then do;
    end = date;
    output;
  end;
run;

* Count the days in a given by group, 
    but keep track of gaps and make separate counts for each set of concecutive days;
data want (drop = date NextDate);
  set have end=eof;
  by Section ID;
  retain Start Days;
  format Start End date9.;

  if not eof then set have (firstobs=2 keep=date rename=(date=NextDate));

  if first.ID then do;
    Start = date;
    Days = 0;
  end;

  Days + 1;

  if not last.ID and date < NextDate-1 then do;
    End = date;
    output; 
    Start = NextDate;
    Days = 0;
  end;

  if last.ID then do;
    end = date;
    output;
  end;
run;

View solution in original post

11 REPLIES 11
ballardw
Super User

My recommendation on learning to use complex combinations of First and Last variables to try some, or all, of your data and create actual variables with the first and last values to see how things change.

 

Example:

DATA Have2;
   set Have;
   by Section ID;
   fsec = First.section;
   lsec = last.section;
   fid  = first.id;
   lid  = last.id;
run;

IF, that can be a big if, I understand what this means: " I want the result to show the first and last date by ID and Section. "

 

Then that might be :

If first.section then somevar=date;
If last.section then othervar=date;

One reason I say 'might' is because I do not know what you expect. Look at the first output I suggested with the first and last values to see which combination(s) mark the records you want.

 

 

 

 

Do you expect to have the values attached to each record of the by group?

Just those values in the output set?

Tom
Super User Tom
Super User

Make sure that DATE is an actual DATE value and not the character string your example is creating.

Otherwise the data will not sort properly and your program will get errors about trying to attach a numeric format to character variable.

 

Also the DATE format does not really work with a width of 10.  You can use 9 and get strings like in your data lines. Or 11 and get hyphens. (or 7 and not be able to tell what century the dates are in).

data want; 
  set have;
  by Section ID;
  if first.ID then First_day=date;
  retain first_day ;
  if last.ID ;
  Last_day=date;
  format first_day last_day date9.;
  drop date;
run;

 

Scooby3g
Obsidian | Level 7

Thank you everyone for your suggestions!! These suggestion was greatly appreciated!! 

 

I used the data step Tom suggested but I have one more question. Could I trouble you for for one more suggestion. Is it possible to account for the date gap between each date for? For example in obs 18 and 19 there is a gap between the two dates. Is it possible to account for 19Oct2022 to be missing? I am planning to use have the last date minus the first date the data step later to count the number of days for each section.  

6567486 Bravo 18Oct2022 Math 6567486 Bravo 20Oct2022 Math

 

data Have;
length char_date $10;
infile DATALINES;
/*input ID $ name $ char_Date:  $20. fmt : $20.  Section ;*/
input ID $ name $ char_Date $  Section $;
date = input(char_date,date9.);
format date date9.;
datalines; 
6567486	Bravo	13Oct2022	Math
6567486	Bravo	14Oct2022	Math
6567486	Bravo	15Oct2022	Math
6567486	Bravo	16Oct2022	Math
6567486	Bravo	17Oct2022	Math
6567486	Bravo	18Oct2022	Math
6567486	Bravo	20Oct2022	Math
6567486	Bravo	21Oct2022	Math
6567486	Bravo	22Oct2022	Math
6567486	Bravo	23Oct2022	Math
6567486	Bravo	24Oct2022	Math
6567486	Bravo	25Oct2022	Math
6567486	Bravo	26Oct2022	Math
6567486	Bravo	27Oct2022	Math
6567486	Bravo	28Oct2022	Math
6567486	Bravo	29Oct2022	Math
6606109	Charlie	1Oct2022	Math
6606109	Charlie	2Oct2022	Math
6606109	Charlie	3Oct2022	Science
6606109	Charlie	4Oct2022	Science
6606109	Charlie	5Oct2022	Science
6621246	Sierra	1Oct2022	Science
6621246	Sierra	2Oct2022	Science
6621246	Sierra	3Oct2022	Science
6621246	Sierra	4Oct2022	Science
6621246	Sierra	5Oct2022	Science
6621246	Sierra	6Oct2022	Science
6621246	Sierra	7Oct2022	Science
6621246	Sierra	8Oct2022	Science
6621246	Sierra	9Oct2022	Science
6621246	Sierra	10Oct2022	Science
6621246	Sierra	11Oct2022	Science
6621246	Sierra	12Oct2022	Science
6621246	Sierra	13Oct2022	Science
6621246	Sierra	14Oct2022	Science
6621246	Sierra	15Oct2022	Science
6621246	Sierra	16Oct2022	Science
6621246	Sierra	17Oct2022	Science
6621246	Sierra	18Oct2022	Science
6621246	Sierra	19Oct2022	Science
6621246	Sierra	20Oct2022	Science
6621246	Sierra	21Oct2022	Science
6621246	Sierra	22Oct2022	Science
6621246	Sierra	23Oct2022	Science
6621246	Sierra	24Oct2022	Science
6621246	Sierra	25Oct2022	Science
6621246	Sierra	26Oct2022	Science
6621246	Sierra	27Oct2022	Science
6621246	Sierra	28Oct2022	Science
6621246	Sierra	29Oct2022	Science
6621246	Sierra	30Oct2022	Science
6621246	Sierra	31Oct2022	Science
6651967	Echo	1Oct2022	Science
6651967	Echo	2Oct2022	Science
6651967	Echo	3Oct2022	Science
6651967	Echo	4Oct2022	Science
6651967	Echo	5Oct2022	Science
6651967	Echo	6Oct2022	Science
6651967	Echo	7Oct2022	Science
6651967	Echo	8Oct2022	Science
6651967	Echo	9Oct2022	Science
6651967	Echo	10Oct2022	Science
6651967	Echo	11Oct2022	Science
6651967	Echo	12Oct2022	English
6651967	Echo	13Oct2022	English
6651967	Echo	14Oct2022	English
6651967	Echo	15Oct2022	English
6651967	Echo	16Oct2022	English
6651967	Echo	17Oct2022	English
6651967	Echo	18Oct2022	English
6651967	Echo	19Oct2022	English
6651967	Echo	20Oct2022	English
6651967	Echo	21Oct2022	English
6651967	Echo	22Oct2022	English
6651967	Echo	23Oct2022	English
;
run;

proc sort data=Have; 
        by  section ID date;
run;


data want; 
  set have;
  by Section ID;
  if first.ID then First_day=date;
  retain first_day ;
  if last.ID ;
  Last_day=date;
  format first_day last_day date9.;
  drop date;
run;

data want2;
   set want;
   days= (last_day - first_day)+1;
run;

 

Kurt_Bremser
Super User

You can also use SQL to get those dates:

proc sql;
create table want as
  select
    id,
    section,
    min(date) as first_day format=yymmdd10.,
    max(date) as last_day format=yymmdd10.
  from have
  group by id, section
;
quit;

if yor date is in fact a SAS date value and not just a string looking like a date.

mkeintz
PROC Star

I see that this is a likely task for learning a little about first. and last. processing.

 

But it's at least as good for learning some of the capabilities of PROC SUMMARY:  

 

data Have;
input ID $ name $ Date :date9. Section $;
format date date9. ;
datalines; 
6567486	Bravo	13Oct2022	Math
6567486	Bravo	14Oct2022	Math
6567486	Bravo	15Oct2022	Math
6567486	Bravo	16Oct2022	Math
6567486	Bravo	17Oct2022	Math
6567486	Bravo	18Oct2022	Math
6567486	Bravo	20Oct2022	Math
6567486	Bravo	21Oct2022	Math
6567486	Bravo	22Oct2022	Math
6567486	Bravo	23Oct2022	Math
6567486	Bravo	24Oct2022	Math
6567486	Bravo	25Oct2022	Math
6567486	Bravo	26Oct2022	Math
6567486	Bravo	27Oct2022	Math
6567486	Bravo	28Oct2022	Math
6567486	Bravo	29Oct2022	Math
6606109	Charlie	1Oct2022	Math
6606109	Charlie	2Oct2022	Math
6606109	Charlie	3Oct2022	Science
6606109	Charlie	4Oct2022	Science
6606109	Charlie	5Oct2022	Science
6621246	Sierra	1Oct2022	Science
6621246	Sierra	2Oct2022	Science
6621246	Sierra	3Oct2022	Science
6621246	Sierra	4Oct2022	Science
6621246	Sierra	5Oct2022	Science
6621246	Sierra	6Oct2022	Science
6621246	Sierra	7Oct2022	Science
6621246	Sierra	8Oct2022	Science
6621246	Sierra	9Oct2022	Science
6621246	Sierra	10Oct2022	Science
6621246	Sierra	11Oct2022	Science
6621246	Sierra	12Oct2022	Science
6621246	Sierra	13Oct2022	Science
6621246	Sierra	14Oct2022	Science
6621246	Sierra	15Oct2022	Science
6621246	Sierra	16Oct2022	Science
6621246	Sierra	17Oct2022	Science
6621246	Sierra	18Oct2022	Science
6621246	Sierra	19Oct2022	Science
6621246	Sierra	20Oct2022	Science
6621246	Sierra	21Oct2022	Science
6621246	Sierra	22Oct2022	Science
6621246	Sierra	23Oct2022	Science
6621246	Sierra	24Oct2022	Science
6621246	Sierra	25Oct2022	Science
6621246	Sierra	26Oct2022	Science
6621246	Sierra	27Oct2022	Science
6621246	Sierra	28Oct2022	Science
6621246	Sierra	29Oct2022	Science
6621246	Sierra	30Oct2022	Science
6621246	Sierra	31Oct2022	Science
6651967	Echo	1Oct2022	Science
6651967	Echo	2Oct2022	Science
6651967	Echo	3Oct2022	Science
6651967	Echo	4Oct2022	Science
6651967	Echo	5Oct2022	Science
6651967	Echo	6Oct2022	Science
6651967	Echo	7Oct2022	Science
6651967	Echo	8Oct2022	Science
6651967	Echo	9Oct2022	Science
6651967	Echo	10Oct2022	Science
6651967	Echo	11Oct2022	Science
6651967	Echo	12Oct2022	English
6651967	Echo	13Oct2022	English
6651967	Echo	14Oct2022	English
6651967	Echo	15Oct2022	English
6651967	Echo	16Oct2022	English
6651967	Echo	17Oct2022	English
6651967	Echo	18Oct2022	English
6651967	Echo	19Oct2022	English
6651967	Echo	20Oct2022	English
6651967	Echo	21Oct2022	English
6651967	Echo	22Oct2022	English
6651967	Echo	23Oct2022	English
;
run;


proc summary data=have noprint nway ;
  class id section;
  var date ;
  output out=want (drop=_:) min=first_day max=last_day   maxid(date(name))=last_name;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Scooby3g 

If you want to handle gaps between dates, it is simpler to count the days instead of subtracting. Here are two examples.

In the first example the days are counted within each by-group.

The last example utilizes a second set-statement to look ahead and make separate counts for each group of concecutive days within a given by-group.

 

* Count the days in a given by-group;
data want (drop = date);
  set have;
  by Section ID;
  retain Start Days;
  format Start End date9.;

  if first.ID then do;
    Start = date;
    Days = 0;
  end;

  Days + 1;
  if last.ID then do;
    end = date;
    output;
  end;
run;

* Count the days in a given by group, 
    but keep track of gaps and make separate counts for each set of concecutive days;
data want (drop = date NextDate);
  set have end=eof;
  by Section ID;
  retain Start Days;
  format Start End date9.;

  if not eof then set have (firstobs=2 keep=date rename=(date=NextDate));

  if first.ID then do;
    Start = date;
    Days = 0;
  end;

  Days + 1;

  if not last.ID and date < NextDate-1 then do;
    End = date;
    output; 
    Start = NextDate;
    Days = 0;
  end;

  if last.ID then do;
    end = date;
    output;
  end;
run;
Scooby3g
Obsidian | Level 7

Hi @ErikLund_Jensen ,

Would it be possible to do both in one data step? Count the number of dates by section and ID and also find the first date and last date but also account for the gaps between the dates?

 

what I am trying to do is find the first date and last date (like the data step  @Tom  had suggested) and also count how many dates. For example, in obs one how many dates there were by section and ID.

 

Scooby3g_1-1669066550126.png

 

Tom
Super User Tom
Super User

It is trivial to count observations.

To count days you might want to test FIRST.DATE so that duplicate dates don't count twice.

data want;
  set have;
  by id name date;
  if first.name then do;
     first_date=date;
     days=0;
  end;
  retain first_date days;
  days+first.date;
  if last.name then do;
    last_date=date;
    total_days=last_date-first_date+1;
    output;
  end;
  keep id name first_date last_date days total_days;
  format first_date last_date date9.;
run;
                                first_                         total_
OBS      ID       name            date    days    last_date     days

 1     6567486    Bravo      13OCT2022     16     29OCT2022      17
 2     6606109    Charlie    01OCT2022      5     05OCT2022       5
 3     6621246    Sierra     01OCT2022     31     31OCT2022      31
 4     6651967    Echo       01OCT2022     23     23OCT2022      23
Scooby3g
Obsidian | Level 7

@Tom Thank you so much!!! this is exactly want I have been trying to do. 

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Scooby3g 

 

The first of the two steps I sent you does exactly that - find first date and last date per ID / Section and count the actual days so gaps taken care of.

 

The two steps does the same, but the last is slightly more complicated, because it groups per ID / Section / group of continous dates. so dates separated by a gap are treated as separate groups with first/last data and count.

Scooby3g
Obsidian | Level 7

Hi @ErikLund_Jensen 

Thank you so much!!! The second one is perfect!!!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 6146 views
  • 6 likes
  • 6 in conversation