BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rj997
Fluorite | Level 6

Hey!

I have a dataset that looks somewhat like this:

 

EMPLID CURRENT_COLLEGE 

1                     ABC

1                     ABC

1                     XYZ

1                     XYZ

2                     CED

2                     VIN

2                     VIN

 

I have to compare the values for each row in current_college and create a separate column called 'Change' that captures if the college for each ID has changed compared to the first observation. The output table should look something like this:

 

EMPLID CURRENT_COLLEGE  CHANGE

1                     ABC                          0

1                     ABC                          0

1                     XYZ                           1

1                     XYZ                           1

2                     CED                          0

2                     VIN                            1

2                     VIN                            1

 

I have been scratching my head over this for hours and I have just started working with SAS! Are there any efficient ways of grouping observations in SAS (like 'group' in dplyr package in R)?

Thanks 🙂

1 ACCEPTED SOLUTION

Accepted Solutions
r_behata
Barite | Level 11
data want;
	set have;
	by EMPLID;
	retain CURRENT_COLLEGE_ ;

	if first.EMPLID then do;
		CURRENT_COLLEGE_=CURRENT_COLLEGE;
		CHANGE=0;
	end;
	else do;
		 CHANGE=(CURRENT_COLLEGE ^ =CURRENT_COLLEGE_);		
	end;

	drop CURRENT_COLLEGE_;
run;

View solution in original post

5 REPLIES 5
novinosrin
Tourmaline | Level 20
data have;
input (EMPLID CURRENT_COLLEGE ) ($);
cards;

1                     ABC

1                     ABC

1                     XYZ

1                     XYZ
1                     kiy

1                     kiy

2                     CED

2                     VIN

2                     VIN
;

data want;
 do _n_=0 by 1 until(last.emplid);
  do until(last.CURRENT_COLLEGE);
   set have;
   by EMPLID CURRENT_COLLEGE;
   if _n_ in (0,1) then change=_n_;
   output;
  end;
 end;
run;

proc print noobs;run;
novinosrin
Tourmaline | Level 20

Hi @rj997  By the way, An ideal design should have incremental value for the change variable each time the change occurs. I have tweaked your sample with another set of group 

1 kiy

1 kiy

The change should increment to 2 and so on and so forth.

 

If that makes sense, the above code is tweak to sum

 

data have;
input (EMPLID CURRENT_COLLEGE ) ($);
cards;

1                     ABC

1                     ABC

1                     XYZ

1                     XYZ
1                     kiy

1                     kiy

2                     CED

2                     VIN

2                     VIN
;

data want;
 do _n_=0 by 1 until(last.emplid);
  do until(last.CURRENT_COLLEGE);
   set have;
   by EMPLID CURRENT_COLLEGE;
   change=_n_;;
   output;
  end;
 end;
run;

 

rj997
Fluorite | Level 6

Thanks for the help!

Although, I am just trying to capture if the college values have changed from the first observation and record that with binary values (of 0 or 1).

r_behata
Barite | Level 11
data want;
	set have;
	by EMPLID;
	retain CURRENT_COLLEGE_ ;

	if first.EMPLID then do;
		CURRENT_COLLEGE_=CURRENT_COLLEGE;
		CHANGE=0;
	end;
	else do;
		 CHANGE=(CURRENT_COLLEGE ^ =CURRENT_COLLEGE_);		
	end;

	drop CURRENT_COLLEGE_;
run;
mkeintz
PROC Star

This is a situation in which you should consider taking advantage of a property of "conditional SET statements" - namely that any variable read by a SET statement is automatically retained until that SET (or an "overlapping" SET) is executed again:

 

data have;
input (EMPLID CURRENT_COLLEGE ) ($);
cards;
1                     ABC
1                     ABC
1                     XYZ
1                     XYZ
2                     CED
2                     VIN
2                     VIN
;

data want;
  set have;
  by emplid;
  if first.emplid then set have (keep=current_college rename=(current_college=college1)) point=_n_;
  change=(current_college^=college1);
run;

Whenever the beginning of an ID is encountered, that observation is re-read via a SET with a "POINT=" option.  Because the current_college variable is renamed to college1, the college1 value is automatically retained until the beginning of the next ID.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2655 views
  • 5 likes
  • 4 in conversation