Solved: Re: Find next variable by group

sasuser_8 · Posted 10-17-2024 11:24 AM

Hi,

I have this code:

DATA SALE;
LENGTH FILE $3 DEPARTMENT $10 ;
INPUT FILE DATE:YYMMDD10. DEPARTMENT $;
FORMAT DATE YYMMDD10.;
DATALINES;

102 2024-01-15 RECEPTION
102 2024-01-30 RECEPTION
102 2024-02-10 SHOES
102 2024-02-29 RECEPTION 
102 2024-02-29 SHOES
103 2024-01-22 RECEPTION
103 2024-02-15 RECEPTION
103 2024-02-20 RECEPTION
103 2024-03-02 PANTS
104 2024-04-12 RECEPTION
104 2024-05-02 RECEPTION
104 2024-05-20 RECEPTION
104 2024-05-30 SHIRTS
104 2024-06-14 SHIRTS
104 2024-06-30 SHIRTS 
;
RUN;

I would like to find, for each file, the first department where the item was transferred after Reception. For that example, the result would be as follows:

Does anyone know how to proceed ?

Thanks

Astounding · Posted 10-17-2024 07:04 PM

Given that your observations are in date order for each FILE, this should be easy:

data want;
   set SALE;
   by file;
   where department ne "RECEPTION";
   if first.file;
run;

The WHERE statement performs subsetting before first.file gets created.

View solution in original post

data_null__ · Posted 10-17-2024 12:58 PM

This might work but I doubt you example data covers all scenarios.

DATA SALE;
LENGTH FILE $3 DEPARTMENT $10 ;
INPUT FILE DATE:YYMMDD10. DEPARTMENT $;
FORMAT DATE YYMMDD10.;
DATALINES;
102 2024-01-15 RECEPTION
102 2024-01-30 RECEPTION
102 2024-02-10 SHOES
102 2024-02-29 RECEPTION 
102 2024-02-29 SHOES
103 2024-01-22 RECEPTION
103 2024-02-15 RECEPTION
103 2024-02-20 RECEPTION
103 2024-03-02 PANTS
104 2024-04-12 RECEPTION
104 2024-05-02 RECEPTION
104 2024-05-20 RECEPTION
104 2024-05-30 SHIRTS
104 2024-06-14 SHIRTS
104 2024-06-30 SHIRTS 
;
RUN;

data sale;
   set sale;
   by file;
   ddif = dif(date);
   if first.file then ddif=.;
   run;
proc print;
   run;
proc summary data=sale nway;
   by file;
   where department ne: 'REC' and ddif ne 0;
   class department;
   output out=test(drop=_:) idgroup(min(ddif) out[1](date ddif)=);
   run;
proc print;
   run;

Kurt_Bremser · Posted 10-17-2024 05:06 PM

data want;
set sales,,
by file;
length next_department $10 date_transfer 8;
format date_transfer yymmdd10.;
retain next_department date_transfer;
if first.file
then do;
  next_department = "";
  date_transfer = .;
end;
if department ne "RECEPTION" and next_department = ""
then do;
  next_department = department;
  date_transfer = date;
end;
if last.file;
keep file next_department date_transfer;
run;

Untested, posted from my tablet.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Tom · Posted 10-17-2024 05:54 PM

Do you mean you want the first observation (that is not "RECEPTION') for each FILE? That does not sound like it needs to do anything with NEXT observation. Which makes it easier since it is easier to remember something than predict the future.

data want;
  set sale;
  by file date ;
  if first.file then any=0;
  retain any;
  if department ne 'RECEPTION' then do;
    if not any then output;
    any=1;
  end;
  drop any;
run;

Result

Obs    FILE    DEPARTMENT          DATE

 1     102       SHOES       2024-02-10
 2     103       PANTS       2024-03-02
 3     104       SHIRTS      2024-05-30

Astounding · Posted 10-17-2024 07:04 PM

Given that your observations are in date order for each FILE, this should be easy:

data want;
   set SALE;
   by file;
   where department ne "RECEPTION";
   if first.file;
run;

The WHERE statement performs subsetting before first.file gets created.

Kurt_Bremser · Posted 10-18-2024 04:20 AM

Elegant.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 10-18-2024 04:55 AM

Kurt,
Nope. If there were not any "RECEPTION" within a FILE group, that could get wrong result.

Astounding · Posted 10-18-2024 09:39 AM

@Ksharp ,

Of course you're right about that. And other cases as well: if the first RECEPTION record comes after a non-RECEPTION record.

My real point: since the poster doesn't say what's really in the data, and it makes such a difference in the complexity of the solution, shouldn't somebody have asked? The original poster probably doesn't see that the question is important, so let me ask:

For a FILE, does the first record have to be RECEPTION or could it be something else?

If it is something else, should that record be ignored?

sasuser_8 · Posted 10-18-2024 12:35 PM

Yes, the first record have to be RECEPTION. I didn't specify it and it's true that it's important. So your solution will work 🙂 Thanks !

Ksharp · Posted 10-19-2024 02:31 AM

Astounding,
Don't be offended. I truely understand what you mean.
We all provide some suggestion based on the data posted by OP . We don't know how it looks like or how it was in reality.
So base on the data posted by OP and write code to get what OP want, That is OK.
My real point: make my own code to be as strong/robust as it could be to take into acount of many accident scenarios .

All the best.

mkeintz · Posted 10-17-2024 09:22 PM

I think you want one record per FILE, namely the first non-reception record to follow a reception:

DATA SALE;
LENGTH FILE $3 DEPARTMENT $10 ;
INPUT FILE DATE:YYMMDD10. DEPARTMENT $;
FORMAT DATE YYMMDD10.;
DATALINES;
102 2024-01-15 RECEPTION
102 2024-01-30 RECEPTION
102 2024-02-10 SHOES
102 2024-02-29 RECEPTION 
102 2024-02-29 SHOES
103 2024-01-22 RECEPTION
103 2024-02-15 RECEPTION
103 2024-02-20 RECEPTION
103 2024-03-02 PANTS
104 2024-04-12 RECEPTION
104 2024-05-02 RECEPTION
104 2024-05-20 RECEPTION
104 2024-05-30 SHIRTS
104 2024-06-14 SHIRTS
104 2024-06-30 SHIRTS 
RUN;

data want;
  set sale;
  by file; 
  if first.file^=1 and department^='RECEPTION' and lag(department)='RECEPTION';
  if file^=lag(file);
run;

The first subsetting IF gets all non-receptions that follow a reception.

From that subset, the second subsetting IF takes only the first such record for each FILE.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 10-17-2024 10:41 PM

DATA SALE;
LENGTH FILE $3 DEPARTMENT $10 ;
INPUT FILE DATE:YYMMDD10. DEPARTMENT $;
FORMAT DATE YYMMDD10.;
DATALINES;
102 2024-01-15 RECEPTION
102 2024-01-30 RECEPTION
102 2024-02-10 SHOES
102 2024-02-29 RECEPTION 
102 2024-02-29 SHOES
103 2024-01-22 RECEPTION
103 2024-02-15 RECEPTION
103 2024-02-20 RECEPTION
103 2024-03-02 PANTS
104 2024-04-12 RECEPTION
104 2024-05-02 RECEPTION
104 2024-05-20 RECEPTION
104 2024-05-30 SHIRTS
104 2024-06-14 SHIRTS
104 2024-06-30 SHIRTS 
;
data want;
 set sale;
 by FILE;
 retain found 0;
 if first.FILE then found=0;
 if lag(DEPARTMENT) eq 'RECEPTION' and DEPARTMENT ne 'RECEPTION' and lag(FILE)=FILE and not found then do;
  output; found=1;
 end;
 drop found;
run;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away