
Differentiate between identical variables with sequential IDs

Contributor
Posts: 45

Hello all,

 

I am trying to distinguish the first record in a group from the second. Below is an example of what I want. I have the ID and Type variables, and I need to create the First variable. In my data, CC and RR records come in pairs, and I need to tell the first record of each pair from the second. I looked into FIRST. and LAST. processing, but it did not look like it would fit my needs. I don't care about the r and t types at the moment. The output does not have to match the example exactly, as long as I can tell the first record in each group from the last. Thanks.

 

 

ID  Type  First
1   r
2   r
3   t
4   t
5   cc    1
6   cc    0
7   r
8   t
9   rr    1
10  rr    0
11  t
12  t
13  t
14  cc    1
15  cc    0

 

data temp;
input ID type $;
datalines;
1 r
2 r
3 r
4 t
5 CC
6 CC
7 r
8 r
9 RR
10 RR
11 r
12 t
13 t
14 CC
15 CC
;
run;




All Replies
Super User
Posts: 6,927

Re: Differentiate between identical variables with sequential IDs

This delivers your intended result:

data temp;
input ID type $;
datalines;
1 r
2 r
3 r
4 t
5 CC
6 CC
7 r
8 r
9 RR
10 RR
11 r
12 t
13 t
14 CC
15 CC
;
run;

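data want;
set temp;
by type notsorted; /* a BY group is a run of consecutive records with the same TYPE */
/* first.type is 1 on the first record of a run and 0 otherwise */
if type in ('CC','RR') then first = first.type;
run;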
Contributor
Posts: 45

Re: Differentiate between identical variables with sequential IDs


Thanks a lot, that almost has the desired effect. After scanning the results on my main dataset, I noticed a few things that make the data a little tricky. Records of the same type need to be treated as separate groups when their IDs are not sequential, and I need to flag an error for a record whose type is not grouped with a like type.

 

data temp;
input ID type $;
datalines;
1 r
2 r
3 r
4 t
5 CC
6 CC
10 CC
11 CC
12 r
19 r
20 RR
21 RR
30 r
32 t
39 t
41 CC
50 t
56 r
;
run;

 

 

ID  TYPE  FIRST
1   r
2   r
3   r
4   t
5   CC    1
6   CC    0
10  CC    1
11  CC    0
12  r
19  r
20  RR    1
21  RR    0
30  r
32  t
39  t
41  CC    Need to indicate this is an error
50  t
56  r



 

Super User
Posts: 6,495

Re: Differentiate between identical variables with sequential IDs


I do not see a pattern in the FIRST column. You need to do a better job of explaining what you want.

If you want to treat changes in the value of TYPE as indicating the beginning of a new group then you can use BY processing. So the first three records all have type='r'. Then there is one record with type='t'. etc.

 

The data does not need to be sorted since you can use the NOTSORTED keyword on the BY statement.

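data want;
  set temp;
  by type notsorted;           /* a new BY group starts whenever TYPE changes */
  groupno+first.type;          /* running group counter */
  if first.type then recno=0;  /* restart the counter at the start of each group */
  recno+1;                     /* position of the record within its group */
run;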


 

 

Contributor
Posts: 45

Re: Differentiate between identical variables with sequential IDs


Hopefully this makes more sense. The IDs have to be sequential for records of the same TYPE to be in the same group. So IDs 5 and 6 are grouped together, but they are not in the same group as IDs 10 and 11. I don't really need the GROUP variable, but hopefully it makes things clearer.

 

ID  TYPE  group  FIRST
1   r
2   r
3   r
4   t
5   CC    1      1
6   CC    1      0
10  CC    2      1
11  CC    2      0
12  r
19  r
20  RR    3      1
21  RR    3      0
30  r
32  t
39  t
41  CC    error  Need to indicate this is an error
50  t
56  r

 

Solution
‎07-26-2016 02:51 PM
Super User
Posts: 6,495

Re: Differentiate between identical variables with sequential IDs

I would define the groups first and then you can use BY group processing on the new group variable.

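data groups;
  set temp;
  /* Start a new group when TYPE changes or the ID is not the previous ID + 1. */
  /* The condition evaluates to 1 or 0, so the sum statement counts groups.    */
  group + ((lag(type) ne type) or (lag(id)+1 ne id));
run;
data want ;
  set groups;
  by group;
  if type in ('CC','RR') then do;
     /* a group with a single record has no partner, so flag it */
     if first.group and last.group then first='ERROR';
     else if first.group then first='1';
     else first='0';
  end;
run;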
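
For anyone who wants to try the two-step approach end to end, here is a minimal sketch that rebuilds the second sample dataset from this thread and applies the same logic. It makes one assumption that is not part of the reply above: the intermediate GROUPS step is written as a DATA step view (`/ view=groups`), so no extra dataset is stored. The outer parentheses around the OR condition are added only for readability.

```sas
/* Sample data from the follow-up post in this thread */
data temp;
  input ID type $;
  datalines;
1 r
2 r
3 r
4 t
5 CC
6 CC
10 CC
11 CC
12 r
19 r
20 RR
21 RR
30 r
32 t
39 t
41 CC
50 t
56 r
;
run;

/* Assumption: a DATA step view instead of a stored intermediate dataset. */
/* GROUP increases by 1 when TYPE changes or the ID is not consecutive.   */
data groups / view=groups;
  set temp;
  group + ((lag(type) ne type) or (lag(id) + 1 ne id));
run;

/* Flag the first record of each CC/RR group; a lone record is an error. */
data want;
  set groups;
  by group;
  if type in ('CC','RR') then do;
    if first.group and last.group then first = 'ERROR';
    else if first.group then first = '1';
    else first = '0';
  end;
run;

/* Inspect the result */
proc print data=want noobs;
  var id type group first;
run;
```

On this sample, IDs 5, 10 and 20 should come out with first='1', IDs 6, 11 and 21 with first='0', and ID 41 with first='ERROR'. The r and t rows are left blank.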


Contributor
Posts: 45

Re: Differentiate between identical variables with sequential IDs

Thanks! This got me to a solution.
