Dear SAS Programmers, I need help to write a code that creates indentification for nonsuccessive values. Here is a sample of my dataset: Date time ex_value op TYPE matched 01/01/2020 8:30 20 C Q 1 01/01/2020 8:32 20 C Q 1 01/01/2020 8:34 65 C T 8 01/01/2020 8:36 65 C T 12 01/01/2020 8:39 72 C T 10 01/01/2020 8:59 72 P Q 1 01/01/2020 9:09 72 P Q 1 01/01/2020 10:09 72 P Q 1 01/01/2020 10:19 72 P Q 1 01/01/2020 10:29 72 P T 58 01/01/2020 10:29 72 P T 35 * My identifiers are date, ex_value and op. * Using the TYPE variable as an identifier will mess up the time chronology (or the time variable) * Now I want to create an identifier for the matched variable (Let's call this new identifier variable Matched_ID) with successive values of 1 (also identified by TYPE = "Q") so that the data will look like: Date time ex_value op TYPE matched Matched_ID 01/01/2020 8:30 20 C Q 1 1 01/01/2020 8:32 20 C Q 1 1 01/01/2020 8:34 65 C T 8 . 01/01/2020 8:36 65 C T 12 . 01/01/2020 8:39 72 C T 10 . 01/01/2020 8:59 72 P Q 1 2 01/01/2020 9:09 72 P Q 1 2 01/01/2020 10:09 72 P Q 1 2 01/01/2020 10:19 72 P Q 1 2 01/01/2020 10:29 72 P T 58 . 01/01/2020 10:29 72 P T 35 . Thanks all.
Perhaps
data want; set have; by notsorted type; retain matchcount; if first.type and type='Q' then matchcount+1; if type='Q' then matched_id=matchcount; else matched_id = .; drop matchcount; run;
See this:
data have;
input Date :ddmmyy10. time :time5. ex_value op $ TYPE $ matched;
format date ddmmyy10. time time5.;
datalines;
01/01/2020 8:30 20 C Q 1
01/01/2020 8:32 20 C Q 1
01/01/2020 8:34 65 C T 8
01/01/2020 8:36 65 C T 12
01/01/2020 8:39 72 C T 10
01/01/2020 8:59 72 P Q 1
01/01/2020 9:09 72 P Q 1
01/01/2020 10:09 72 P Q 1
01/01/2020 10:19 72 P Q 1
01/01/2020 10:29 72 P T 58
01/01/2020 10:29 72 P T 35
;
data want;
set have;
retain _match;
if matched = 1
then do;
if lag(matched) ne 1 then _match + 1;
matched_id = _match;
end;
else matched_id = .;
drop _match;
run;
Thank you @Kurt_Bremser . Your solution is helpful, but it does not create unique identifiers for matched_id based on successive values of the matched variable (based on the BY Variables date, ex_value, and op) as I showed in my second dataset sample in the original post.
My code creates your wanted result from the data as stated in this thread. I tested it.
Be diligent in the example data you present. Have it include all the combinations you have in your real data that impact
Post new example data, and show where my code creates a result that differs from what you expect.
Thank you once again @Kurt_Bremser . I also tested your sample (which is based on the sample that I posted). The matched_id variable only contains 1's. Please check again.
@Donnie wrote:
Thank you once again @Kurt_Bremser . I also tested your sample (which is based on the sample that I posted). The matched_id variable only contains 1's. Please check again.
Then you have not run my code in its entirety. Proof:
data have;
input Date :ddmmyy10. time :time5. ex_value op $ TYPE $ matched;
format date ddmmyy10. time time5.;
datalines;
01/01/2020 8:30 20 C Q 1
01/01/2020 8:32 20 C Q 1
01/01/2020 8:34 65 C T 8
01/01/2020 8:36 65 C T 12
01/01/2020 8:39 72 C T 10
01/01/2020 8:59 72 P Q 1
01/01/2020 9:09 72 P Q 1
01/01/2020 10:09 72 P Q 1
01/01/2020 10:19 72 P Q 1
01/01/2020 10:29 72 P T 58
01/01/2020 10:29 72 P T 35
;
data want;
set have;
retain _match;
if matched = 1
then do;
if lag(matched) ne 1 then _match + 1;
matched_id = _match;
end;
else matched_id = .;
drop _match;
run;
proc print data=want noobs;
run;
Result:
Date time ex_value op TYPE matched matched_id 01/01/2020 8:30 20 C Q 1 1 01/01/2020 8:32 20 C Q 1 1 01/01/2020 8:34 65 C T 8 . 01/01/2020 8:36 65 C T 12 . 01/01/2020 8:39 72 C T 10 . 01/01/2020 8:59 72 P Q 1 1 01/01/2020 9:09 72 P Q 1 1 01/01/2020 10:09 72 P Q 1 1 01/01/2020 10:19 72 P Q 1 1 01/01/2020 10:29 72 P T 58 . 01/01/2020 10:29 72 P T 35 .
As you can see, there ARE the missing values you wanted.
May be I mixing up some terminologies. I am sorry if this is the case. But I want the last variable (matched_ID) in your sample to look like matched_ID 1 1 1 . . . 2 2 2 2 . . . 3 3 3 .and so on. I sorted the dataset on DATE, EX_VALUE, and OPvariables. Also notice that the variable MATCHED is always equal to 1 whenever the variable TYPE = Q. Thanks.
Perhaps
data want; set have; by notsorted type; retain matchcount; if first.type and type='Q' then matchcount+1; if type='Q' then matched_id=matchcount; else matched_id = .; drop matchcount; run;
Thanks @ballardw. This code works. I appreciate your time.
<insert 4-letter word here>.
That's what you get when you start optimizing without thinking.
This is my original code that worked:
data want;
set have;
retain _match;
if matched = 1 and lag(matched) ne 1 then _match + 1;
if matched = 1
then matched_id = _match;
else matched_id = .;
drop _match;
run;
My "optimized" code committed the fatal mistake of using LAG in a condition, which is A BAD THING. And I did not pay attention to the slightly different outcome. My bad.
Thanks for your time.
@Donnie wrote:
Thank you @Kurt_Bremser . Your solution is helpful, but it does not create unique identifiers for matched_id based on successive values of the matched variable (based on the BY Variables date, ex_value, and op) as I showed in my second dataset sample in the original post.
Personally I'm still trying to determine where "unique identifier" comes in at all. You explicitly assign the same value of the identifier to two different records and most of the records have no value at all. So neither "unique" nor much of an identifier.
Your first post, and the comments in the code say:
* Using the TYPE variable as an identifier will mess up the time chronology (or the time variable)
without describing at all how time chronology is used. Your description then goes on to state that Type is sort of the determining value for setting your new variable. So I'm confused.
I would suggest the terminology "sequential grouping" identifier or similar instead of unique. "Unique" typically refers to one.
Ok. I think I mixed up some terminologies. Do you have any suggestion on how I can create the dataset in my second sample in the original post?
Thanks.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.