BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sasgorilla
Quartz | Level 8

I have a large dataset with several variables. Where record=C, this is the current accepted Label value. Where record=H these are historical labels.

 

I am having trouble figuring out how to produce a new dataset with the label for every ID that is current (i.e., where record=C), but also if there was a change in label during the time period. Obviously getting where records=C is easy, but that 'if there record value changed during the set time period' part is tripping me up. 

 

In other words, if there were historical changes but the label didn't change, then only the current record is desired. If there was a label change I would want both the current and the historical record to see what the label was changed from . 

 

Time period of interest 2018-01-01 through 2023-12-31  (*note, present record dtend=9999-12-31)

 

data have;
input record $ label $ dtstart :yymmdd10. dtend :yymmdd10. id;
format dtstart yymmdd10. dtend yymmdd10.;
datalines;
H WA123 2010-01-01 2018-07-05 1
C WA123 2018-07-06 9999-12-31 1
H AJ456 2015-02-15 2019-10-30 2
H AJ456 2019-10-31 2022-12-31 2
C AJ456 2023-01-01 9999-12-31 2
H AJ123 2000-01-25 2010-08-20 3
H AJ123 2010-08-21 2020-09-01 3
C AJ124 2020-09-02 9999-12-31 3
;
RUN;

data want; input record $ label $ dtstart :yymmdd10. dtend :yymmdd10. id changeflag; format dtstart yymmdd10. dtend yymmdd10.; datalines; C WA123 2018-07-06 9999-12-31 1 0 C AJ456 2023-01-01 9999-12-31 2 0 H AJ123 2010-08-21 2020-09-01 3 1 C AJ124 2020-09-02 9999-12-31 3 1 ; RUN;

Does my want make sense? 

For IDs 1 and 2 : I don't need previous records because label didn't change.

For ID 3 I need the current and the previous record (which I would want flagged as being part of a changed set) because they changed during the time period.  

The last caveat, if there were multiple label changes in the time period for an ID I would need to see each of them, not just the most recent. 

1 ACCEPTED SOLUTION

Accepted Solutions
sasgorilla
Quartz | Level 8

Thank you, everyone for your help. 

 

I discovered a bit of a problem as I applied several of the variations of code provided that I wasn't aware of initially. That is, the 'ID' for some labels changed over records as well. What I ended up doing was using another variable that was essentially a group name, and then going through the labels and IDs for that group name. I essentially kept all non-duplicated lables rather than IDs, as the labels were ultimately most important. 

 

Thanks again. 

View solution in original post

8 REPLIES 8
Ksharp
Super User

OK. Assuming the data has been sorted as what you showed.

Otherwise, you need one more PROC SORT.

data have;
input record $ label $ dtstart :yymmdd10. dtend :yymmdd10. id;
format dtstart yymmdd10. dtend yymmdd10.;
datalines;
H WA123 2010-01-01 2018-07-05 1
C WA123 2018-07-06 9999-12-31 1
H AJ456 2015-02-15 2019-10-30 2
H AJ456 2019-10-31 2022-12-31 2
C AJ456 2023-01-01 9999-12-31 2
H AJ123 2000-01-25 2010-08-20 3
H AJ123 2010-08-21 2020-09-01 3
C AJ124 2020-09-02 9999-12-31 3
;
RUN;

data temp;
 set have;
 by id label notsorted;
 if last.label;
run;
data want;
 set temp;
 by id;
 changeflag=not (first.id and last.id);
run;

 

sasgorilla
Quartz | Level 8

Thanks, Ksharp.

It looks like your code gets me closer, but it isn't accounting for the date range specified. Is there a way I can limit the observations being pulled to only those that occurred within a time window (i.e. 2018-01-01 to 2023-12-31)? 

Ksharp
Super User

I don't understand what you mean.

You could filter these obs by IF or WHERE statement. Like:

 

data have;
input record $ label $ dtstart :yymmdd10. dtend :yymmdd10. id;
format dtstart yymmdd10. dtend yymmdd10.;
if '01jan2018'd<=dtstart<='31dec2023'd ;
datalines;
H WA123 2010-01-01 2018-07-05 1
C WA123 2018-07-06 9999-12-31 1
H AJ456 2015-02-15 2019-10-30 2
H AJ456 2019-10-31 2022-12-31 2
C AJ456 2023-01-01 9999-12-31 2
H AJ123 2000-01-25 2010-08-20 3
H AJ123 2010-08-21 2020-09-01 3
C AJ124 2020-09-02 9999-12-31 3
;
RUN;

 

 

Ksharp
Super User

Or you want this statement ?

data have;
input record $ label $ dtstart :yymmdd10. dtend :yymmdd10. id;
format dtstart yymmdd10. dtend yymmdd10.;
if '01jan2018'd<=dtstart or dtend<='31dec2023'd ;
Patrick
Opal | Level 21

Very similar to what @Ksharp already proposed. If that doesn't return the desired result then please post amended sample data with the additional cases.

data have;
  input record $ label $ dtstart :yymmdd10. dtend :yymmdd10. id changeflag;
  format dtstart yymmdd10. dtend yymmdd10.;
  datalines;
H WA123 2010-01-01 2018-07-05 1 .
C WA123 2018-07-06 9999-12-31 1 0
H AJ456 2015-02-15 2019-10-30 2 .
H AJ456 2019-10-31 2022-12-31 2 .
C AJ456 2023-01-01 9999-12-31 2 0
H AJ123 2000-01-25 2010-08-20 3 .
H AJ123 2010-08-21 2020-09-01 3 1
C AJ124 2020-09-02 9999-12-31 3 1
;
RUN;

/*proc sort data=have presorted;*/
/*  by id dtstart;*/
/*run;*/

data want;
  set have(where=(dtstart   between '01jan2018'd and '31dec2023'd 
                  or dtend between '01jan2018'd and '31dec2023'd 
                  or dtstart<'01jan2018'd and dtend='31dec9999'd));
  by id label notsorted;
  retain changeflag_derived 0;
  if last.label then 
    do;
      if not last.id then changeflag_derived= 1;
      output;
      if last.id then changeflag_derived= 0;
    end;
run;

proc print data=want;
run;

Patrick_0-1732169606110.png

 

 

Kurt_Bremser
Super User

This also creates your result:

data have;
input record $ label $ dtstart :yymmdd10. dtend :yymmdd10. id;
format dtstart yymmdd10. dtend yymmdd10.;
datalines;
H WA123 2010-01-01 2018-07-05 1
C WA123 2018-07-06 9999-12-31 1
H AJ456 2015-02-15 2019-10-30 2
H AJ456 2019-10-31 2022-12-31 2
C AJ456 2023-01-01 9999-12-31 2
H AJ123 2000-01-25 2010-08-20 3
H AJ123 2010-08-21 2020-09-01 3
C AJ124 2020-09-02 9999-12-31 3
;

%let start = %sysfunc(inputn(2018-01-01,yymmdd10.));
%let end = %sysfunc(inputn(2023-12-31,yymmdd10.));

data want;
merge
  have
  have (
    firstobs=2
    keep=id label
    rename=(id=_id label=_label)
  )
;
if
  _id ne id or 
  id = _id and label ne _label and &start. le dtend le &end.
;
drop _id _label;
run;
ballardw
Super User

This example doesn't do anything to incorporate your data range because I don't see a clear definition of how date range compliance is determined given TWO date variables.

 

One thought: Change, just a change, can be done with numeric values using the Range function or summary statistic. If the Range is 0 for set then there was no change. So one approach would be to add a numeric variable and check the range.

 

data have;
input record $ label $ dtstart :yymmdd10. dtend :yymmdd10. id;
format dtstart yymmdd10. dtend yymmdd10.;
  rec_num = index(record,'C');
datalines;
H WA123 2010-01-01 2018-07-05 1
C WA123 2018-07-06 9999-12-31 1
H AJ456 2015-02-15 2019-10-30 2
H AJ456 2019-10-31 2022-12-31 2
C AJ456 2023-01-01 9999-12-31 2
H AJ123 2000-01-25 2010-08-20 3
H AJ123 2010-08-21 2020-09-01 3
C AJ124 2020-09-02 9999-12-31 3
;
RUN;

proc summary data=have nway;
   class label;
   var rec_num;
   output out=example (drop=_:) range=ChangeFlag;
run;

Where this approach might be of more use is when you have multiple values of the "record" type variable and are interested in a change between group A, B, C and P, D, Q but not within the group.  All the A, B, C could be assigned one number and all of P,D, Q a different number.

 

A potentially useful side effect of this approach given some hierarchy of values or groups of values is the actual range value gives "how far" the  values changed though not the direction.

sasgorilla
Quartz | Level 8

Thank you, everyone for your help. 

 

I discovered a bit of a problem as I applied several of the variations of code provided that I wasn't aware of initially. That is, the 'ID' for some labels changed over records as well. What I ended up doing was using another variable that was essentially a group name, and then going through the labels and IDs for that group name. I essentially kept all non-duplicated lables rather than IDs, as the labels were ultimately most important. 

 

Thanks again. 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 767 views
  • 6 likes
  • 5 in conversation