Solved: How to create identifications for successive values at irregular time ...

Donnie · Posted 05-19-2021 11:13 AM

Dear SAS Programmers, I need help to write a code that creates indentification for nonsuccessive values. Here is a sample of my dataset:
  Date            time       ex_value   op  TYPE matched 
01/01/2020     8:30          20          C     Q         1
01/01/2020     8:32          20          C     Q         1
01/01/2020     8:34          65          C      T         8
01/01/2020      8:36         65          C      T        12
01/01/2020      8:39         72          C      T        10
01/01/2020      8:59         72          P      Q        1
01/01/2020      9:09         72          P      Q        1
01/01/2020     10:09        72          P      Q       1
01/01/2020     10:19        72          P      Q       1
01/01/2020     10:29        72         P      T       58
01/01/2020     10:29        72          P     T       35
 
* My identifiers are date, ex_value and op. 
* Using the TYPE variable as an identifier will mess up the time chronology (or the time variable)
* Now I want to create an identifier for the matched variable (Let's call this new identifier variable Matched_ID) with successive  values of 1 (also identified by TYPE = "Q") so that the data will look like:
  Date           time    ex_value   op  TYPE matched  Matched_ID
01/01/2020     8:30        20        C        Q         1                 1
01/01/2020      8:32        20       C        Q         1                 1
01/01/2020      8:34        65       C         T         8                 .
01/01/2020      8:36        65       C         T        12                .
01/01/2020      8:39        72       C         T        10                .
01/01/2020      8:59        72       P         Q        1                 2
01/01/2020      9:09        72       P         Q        1                 2
01/01/2020     10:09       72       P         Q        1                 2
01/01/2020     10:19       72       P         Q        1                 2
01/01/2020     10:29       72       P         T       58                 .
01/01/2020     10:29       72       P         T       35                 .
 
Thanks all.

ballardw · Posted 05-19-2021 12:31 PM

Perhaps

data want;
   set have;
   by notsorted type;
   retain matchcount;
   if first.type and type='Q' then matchcount+1;
   if type='Q' then matched_id=matchcount;
   else matched_id = .;
   drop matchcount;
run;

View solution in original post

Kurt_Bremser · Posted 05-19-2021 11:25 AM

See this:

data have;
input Date :ddmmyy10. time :time5. ex_value op $ TYPE $ matched;
format date ddmmyy10. time time5.;
datalines; 
01/01/2020     8:30          20          C     Q         1
01/01/2020     8:32          20          C     Q         1
01/01/2020     8:34          65          C      T         8
01/01/2020      8:36         65          C      T        12
01/01/2020      8:39         72          C      T        10
01/01/2020      8:59         72          P      Q        1
01/01/2020      9:09         72          P      Q        1
01/01/2020     10:09        72          P      Q       1
01/01/2020     10:19        72          P      Q       1
01/01/2020     10:29        72         P      T       58
01/01/2020     10:29        72          P     T       35
;

data want;
set have;
retain _match;
if matched = 1
then do;
  if lag(matched) ne 1 then _match + 1;
  matched_id = _match;
end;
else matched_id = .;
drop _match;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Donnie · Posted 05-19-2021 11:46 AM

Thank you @Kurt_Bremser . Your solution is helpful, but it does not create unique identifiers for matched_id based on successive values of the matched variable (based on the BY Variables date, ex_value, and op) as I showed in my second dataset sample in the original post.

Kurt_Bremser · Posted 05-19-2021 11:50 AM

My code creates your wanted result from the data as stated in this thread. I tested it.

Be diligent in the example data you present. Have it include all the combinations you have in your real data that impact

Post new example data, and show where my code creates a result that differs from what you expect.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Donnie · Posted 05-19-2021 11:56 AM

Thank you once again @Kurt_Bremser . I also tested your sample (which is based on the sample that I posted). The matched_id variable only contains 1's. Please check again.

Kurt_Bremser · Posted 05-19-2021 12:04 PM

@Donnie wrote:

Thank you once again @Kurt_Bremser . I also tested your sample (which is based on the sample that I posted). The matched_id variable only contains 1's. Please check again.

Then you have not run my code in its entirety. Proof:

data have;
input Date :ddmmyy10. time :time5. ex_value op $ TYPE $ matched;
format date ddmmyy10. time time5.;
datalines; 
01/01/2020     8:30          20          C     Q         1
01/01/2020     8:32          20          C     Q         1
01/01/2020     8:34          65          C      T         8
01/01/2020      8:36         65          C      T        12
01/01/2020      8:39         72          C      T        10
01/01/2020      8:59         72          P      Q        1
01/01/2020      9:09         72          P      Q        1
01/01/2020     10:09        72          P      Q       1
01/01/2020     10:19        72          P      Q       1
01/01/2020     10:29        72         P      T       58
01/01/2020     10:29        72          P     T       35
;

data want;
set have;
retain _match;
if matched = 1
then do;
  if lag(matched) ne 1 then _match + 1;
  matched_id = _match;
end;
else matched_id = .;
drop _match;
run;

proc print data=want noobs;
run;

Result:

Date	time	ex_value	op	TYPE	matched	matched_id
01/01/2020	8:30	20	C	Q	1	1
01/01/2020	8:32	20	C	Q	1	1
01/01/2020	8:34	65	C	T	8	.
01/01/2020	8:36	65	C	T	12	.
01/01/2020	8:39	72	C	T	10	.
01/01/2020	8:59	72	P	Q	1	1
01/01/2020	9:09	72	P	Q	1	1
01/01/2020	10:09	72	P	Q	1	1
01/01/2020	10:19	72	P	Q	1	1
01/01/2020	10:29	72	P	T	58	.
01/01/2020	10:29	72	P	T	35	.

As you can see, there ARE the missing values you wanted.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Donnie · Posted 05-19-2021 12:17 PM

May be I mixing up some terminologies. I am sorry if this is the case. But I want the last variable (matched_ID) in your sample to look like 

matched_ID
1
1
1
.
.
.
2
2
2
2
.
.
.
3
3
3 
.and so on.

 I sorted the dataset on DATE, EX_VALUE, and OPvariables. Also  notice that the variable MATCHED is always equal to 1 whenever the variable TYPE = Q. 

Thanks.

ballardw · Posted 05-19-2021 12:31 PM

Perhaps

data want;
   set have;
   by notsorted type;
   retain matchcount;
   if first.type and type='Q' then matchcount+1;
   if type='Q' then matched_id=matchcount;
   else matched_id = .;
   drop matchcount;
run;

Donnie · Posted 05-19-2021 12:40 PM

Thanks @ballardw. This code works. I appreciate your time.

Kurt_Bremser · Posted 05-19-2021 02:20 PM

<insert 4-letter word here>.

That's what you get when you start optimizing without thinking.

This is my original code that worked:

data want;
set have;
retain _match;
if matched = 1 and lag(matched) ne 1 then _match + 1;
if matched = 1
then matched_id = _match;
else matched_id = .;
drop _match;
run;

My "optimized" code committed the fatal mistake of using LAG in a condition, which is A BAD THING. And I did not pay attention to the slightly different outcome. My bad.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Donnie · Posted 05-19-2021 08:29 PM

Thanks for your time.

ballardw · Posted 05-19-2021 12:01 PM

@Donnie wrote:

Thank you @Kurt_Bremser . Your solution is helpful, but it does not create unique identifiers for matched_id based on successive values of the matched variable (based on the BY Variables date, ex_value, and op) as I showed in my second dataset sample in the original post.

Personally I'm still trying to determine where "unique identifier" comes in at all. You explicitly assign the same value of the identifier to two different records and most of the records have no value at all. So neither "unique" nor much of an identifier.

Your first post, and the comments in the code say:

* Using the TYPE variable as an identifier will mess up the time chronology (or the time variable)

without describing at all how time chronology is used. Your description then goes on to state that Type is sort of the determining value for setting your new variable. So I'm confused.

I would suggest the terminology "sequential grouping" identifier or similar instead of unique. "Unique" typically refers to one.

Donnie · Posted 05-19-2021 12:27 PM

Ok. I think I mixed up some terminologies. Do you have any suggestion on how I can create the dataset in my second sample in the original post?

Thanks.

How to create identifications for successive values at irregular time intervals (Repost)

Re: How to create identifications for successive values at irregular time intervals (Repost)

Re: How to create identifications for successive values at irregular time intervals (Repost)

Re: How to create identifications for successive values at irregular time intervals (Repost)

Re: How to create identifications for successive values at irregular time intervals (Repost)

Re: How to create identifications for successive values at irregular time intervals (Repost)

Re: How to create identifications for successive values at irregular time intervals (Repost)

Re: How to create identifications for successive values at irregular time intervals (Repost)

Re: How to create identifications for successive values at irregular time intervals (Repost)

Re: How to create identifications for successive values at irregular time intervals (Repost)

Re: How to create identifications for successive values at irregular time intervals (Repost)

Re: How to create identifications for successive values at irregular time intervals (Repost)

Re: How to create identifications for successive values at irregular time intervals (Repost)

Re: How to create identifications for successive values at irregular time intervals (Repost)

Registration is open

SAS Training: Just a Click Away