BookmarkSubscribeRSS Feed
trevand
Obsidian | Level 7

I would like to create a new identifier (new_id) that identifies a new subgroup whenever the indicator switches from 0 to 1. Has some and idea how to accomplish this? 

 

group_id indicator new_id
1 1 1a
1 0 1a
1 0 1a
1 1 1b
1 0 1b
1 0 1b
1 0 1b
2 1 2a
2 0 2a
4 REPLIES 4
PaigeMiller
Diamond | Level 26

Much much much easier to create sequential numbers.

 

data want;
    set have;
    if indicator=1 then new_id+1;
run;

 

I'm pretty sure anything you want to do with this NEW_ID variable can be done with the sequential numbers.

--
Paige Miller
LinusH
Tourmaline | Level 20

Agree with @PaigeMiller sequences are easier to maintain/use.

If you need the group_id  within the new variable you can try this:

data have;
	length group_id indicator 8;
	input group_id indicator;
	datalines;
1 1
1 0
1 0
1 1
1 0
1 0
1 0
2 1
2 0
;
run;

data want;
	set have;
	by group_id;
	if first.group_id then index = 0;
	if indicator = 1 then index+1;
	new_id = cats(put(group_id,best.),'-',put(index,best.));
run;

But to get the result you need, try this:

data want;
	set have;
	by group_id;
	if first.group_id then index = 0;
	if indicator = 1 then index+1;
	new_id = cats(put(group_id,best.),substr('abcdefghijklmnopqrstuvwxyz',index,1));
run;

It was not clear from your example how many times the indicator can change, but I guess the maximum with this technique  would be 26.

 

 

Data never sleeps
PaigeMiller
Diamond | Level 26

@LinusH wrote:

Agree with @PaigeMiller sequences are easier to maintain/use.

 

It was not clear from your example how many times the indicator can change, but I guess the maximum with this technique  would be 26.


Good point. That's a problem if you need more than 26 letters. Another problem is that if the variable GROUP_ID goes above 9, then the next value of NEW_ID is 10a. Since you also have 1a, these will not sort properly. If you sort on these NEW_ID values, 10a, 10b, ... will be followed by 1a. And if GROUP_ID goes above 9, you will also need a LENGTH statement to make this work at all.

 

Plus I am skeptical that there is a valid use for identifiers like 1a and 10a and so on. Who can remember what they represent when looking at the output? I also worry that the OP has oversimplified a real world problem and any answers we give will not fit the real world problem.

--
Paige Miller
PaigeMiller
Diamond | Level 26

@trevand 

please tell us what you are going to do with variable NEW_ID

--
Paige Miller