BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Donnie
Calcite | Level 5
Dear SAS Programmers, I need help to write a code that creates indentification for nonsuccessive values. Here is a sample of my dataset:
  Date            time       ex_value   op  TYPE matched 
01/01/2020     8:30          20          C     Q         1
01/01/2020     8:32          20          C     Q         1
01/01/2020     8:34          65          C      T         8
01/01/2020      8:36         65          C      T        12
01/01/2020      8:39         72          C      T        10
01/01/2020      8:59         72          P      Q        1
01/01/2020      9:09         72          P      Q        1
01/01/2020     10:09        72          P      Q       1
01/01/2020     10:19        72          P      Q       1
01/01/2020     10:29        72         P      T       58
01/01/2020     10:29        72          P     T       35
 
* My identifiers are date, ex_value and op. 
* Using the TYPE variable as an identifier will mess up the time chronology (or the time variable)
* Now I want to create an identifier for the matched variable (Let's call this new identifier variable Matched_ID) with successive  values of 1 (also identified by TYPE = "Q") so that the data will look like:
  Date           time    ex_value   op  TYPE matched  Matched_ID
01/01/2020     8:30        20        C        Q         1                 1
01/01/2020      8:32        20       C        Q         1                 1
01/01/2020      8:34        65       C         T         8                 .
01/01/2020      8:36        65       C         T        12                .
01/01/2020      8:39        72       C         T        10                .
01/01/2020      8:59        72       P         Q        1                 2
01/01/2020      9:09        72       P         Q        1                 2
01/01/2020     10:09       72       P         Q        1                 2
01/01/2020     10:19       72       P         Q        1                 2
01/01/2020     10:29       72       P         T       58                 .
01/01/2020     10:29       72       P         T       35                 .
 
Thanks all. 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Perhaps

data want;
   set have;
   by notsorted type;
   retain matchcount;
   if first.type and type='Q' then matchcount+1;
   if type='Q' then matched_id=matchcount;
   else matched_id = .;
   drop matchcount;
run;

View solution in original post

12 REPLIES 12
Kurt_Bremser
Super User

See this:

data have;
input Date :ddmmyy10. time :time5. ex_value op $ TYPE $ matched;
format date ddmmyy10. time time5.;
datalines; 
01/01/2020     8:30          20          C     Q         1
01/01/2020     8:32          20          C     Q         1
01/01/2020     8:34          65          C      T         8
01/01/2020      8:36         65          C      T        12
01/01/2020      8:39         72          C      T        10
01/01/2020      8:59         72          P      Q        1
01/01/2020      9:09         72          P      Q        1
01/01/2020     10:09        72          P      Q       1
01/01/2020     10:19        72          P      Q       1
01/01/2020     10:29        72         P      T       58
01/01/2020     10:29        72          P     T       35
;

data want;
set have;
retain _match;
if matched = 1
then do;
  if lag(matched) ne 1 then _match + 1;
  matched_id = _match;
end;
else matched_id = .;
drop _match;
run;
Donnie
Calcite | Level 5

Thank you @Kurt_Bremser . Your solution is helpful, but it does not create unique identifiers for matched_id based on successive values of the matched  variable (based on the BY Variables date, ex_value,  and op) as I showed in my second dataset sample in the original post. 

Kurt_Bremser
Super User

My code creates your wanted result from the data as stated in this thread. I tested it.

Be diligent in the example data you present. Have it include all the combinations you have in your real data that impact 

Post new example data, and show where my code creates a result that differs from what you expect.

Donnie
Calcite | Level 5

Thank you once again @Kurt_Bremser . I also tested your sample (which is based on the sample that I posted). The matched_id variable only contains 1's. Please check again. 

Kurt_Bremser
Super User

@Donnie wrote:

Thank you once again @Kurt_Bremser . I also tested your sample (which is based on the sample that I posted). The matched_id variable only contains 1's. Please check again. 


Then you have not run my code in its entirety. Proof:

data have;
input Date :ddmmyy10. time :time5. ex_value op $ TYPE $ matched;
format date ddmmyy10. time time5.;
datalines; 
01/01/2020     8:30          20          C     Q         1
01/01/2020     8:32          20          C     Q         1
01/01/2020     8:34          65          C      T         8
01/01/2020      8:36         65          C      T        12
01/01/2020      8:39         72          C      T        10
01/01/2020      8:59         72          P      Q        1
01/01/2020      9:09         72          P      Q        1
01/01/2020     10:09        72          P      Q       1
01/01/2020     10:19        72          P      Q       1
01/01/2020     10:29        72         P      T       58
01/01/2020     10:29        72          P     T       35
;

data want;
set have;
retain _match;
if matched = 1
then do;
  if lag(matched) ne 1 then _match + 1;
  matched_id = _match;
end;
else matched_id = .;
drop _match;
run;

proc print data=want noobs;
run;

Result:

Date	time	ex_value	op	TYPE	matched	matched_id
01/01/2020	8:30	20	C	Q	1	1
01/01/2020	8:32	20	C	Q	1	1
01/01/2020	8:34	65	C	T	8	.
01/01/2020	8:36	65	C	T	12	.
01/01/2020	8:39	72	C	T	10	.
01/01/2020	8:59	72	P	Q	1	1
01/01/2020	9:09	72	P	Q	1	1
01/01/2020	10:09	72	P	Q	1	1
01/01/2020	10:19	72	P	Q	1	1
01/01/2020	10:29	72	P	T	58	.
01/01/2020	10:29	72	P	T	35	.

As you can see, there ARE the missing values you wanted.

Donnie
Calcite | Level 5
May be I mixing up some terminologies. I am sorry if this is the case. But I want the last variable (matched_ID) in your sample to look like 

matched_ID
1
1
1
.
.
.
2
2
2
2
.
.
.
3
3
3 
.and so on.

 I sorted the dataset on DATE, EX_VALUE, and OPvariables. Also  notice that the variable MATCHED is always equal to 1 whenever the variable TYPE = Q. 

Thanks. 
ballardw
Super User

Perhaps

data want;
   set have;
   by notsorted type;
   retain matchcount;
   if first.type and type='Q' then matchcount+1;
   if type='Q' then matched_id=matchcount;
   else matched_id = .;
   drop matchcount;
run;
Donnie
Calcite | Level 5

Thanks @ballardw. This code works. I appreciate your time. 

Kurt_Bremser
Super User

<insert 4-letter word here>.

That's what you get when you start optimizing without thinking.

This is my original code that worked:

data want;
set have;
retain _match;
if matched = 1 and lag(matched) ne 1 then _match + 1;
if matched = 1
then matched_id = _match;
else matched_id = .;
drop _match;
run;

My "optimized" code committed the fatal mistake of using LAG in a condition, which is A BAD THING. And I did not pay attention to the slightly different outcome. My bad.

ballardw
Super User

@Donnie wrote:

Thank you @Kurt_Bremser . Your solution is helpful, but it does not create unique identifiers for matched_id based on successive values of the matched  variable (based on the BY Variables date, ex_value,  and op) as I showed in my second dataset sample in the original post. 


Personally I'm still trying to determine where "unique identifier" comes in at all. You explicitly assign the same value of the identifier to two different records and most of the records have no value at all. So neither "unique" nor much of an identifier.

 

Your first post, and the comments in the code say:

* Using the TYPE variable as an identifier will mess up the time chronology (or the time variable)

without describing at all how time chronology is used. Your description then goes on to state that Type is sort of the determining value for setting your new variable. So I'm confused.

 

I would suggest the terminology "sequential grouping" identifier or similar instead of unique. "Unique" typically refers to one.

Donnie
Calcite | Level 5

Ok. I think I mixed up some terminologies. Do you have any suggestion on how I can create the dataset in my second sample in the original post? 

 

Thanks. 

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2052 views
  • 0 likes
  • 3 in conversation