BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rj997
Fluorite | Level 6

Hey!

I have a dataset that looks somewhat like this:

 

EMPLID CURRENT_COLLEGE 

1                     ABC

1                     ABC

1                     XYZ

1                     XYZ

2                     CED

2                     VIN

2                     VIN

 

I have to compare the values for each row in current_college and create a separate column called 'Change' that captures if the college for each ID has changed compared to the first observation. The output table should look something like this:

 

EMPLID CURRENT_COLLEGE  CHANGE

1                     ABC                          0

1                     ABC                          0

1                     XYZ                           1

1                     XYZ                           1

2                     CED                          0

2                     VIN                            1

2                     VIN                            1

 

I have been scratching my head over this for hours and I have just started working with SAS! Are there any efficient ways of grouping observations in SAS (like 'group' in dplyr package in R)?

Thanks 🙂

1 ACCEPTED SOLUTION

Accepted Solutions
r_behata
Barite | Level 11
data want;
	set have;
	by EMPLID;
	retain CURRENT_COLLEGE_ ;

	if first.EMPLID then do;
		CURRENT_COLLEGE_=CURRENT_COLLEGE;
		CHANGE=0;
	end;
	else do;
		 CHANGE=(CURRENT_COLLEGE ^ =CURRENT_COLLEGE_);		
	end;

	drop CURRENT_COLLEGE_;
run;

View solution in original post

5 REPLIES 5
novinosrin
Tourmaline | Level 20
data have;
input (EMPLID CURRENT_COLLEGE ) ($);
cards;

1                     ABC

1                     ABC

1                     XYZ

1                     XYZ
1                     kiy

1                     kiy

2                     CED

2                     VIN

2                     VIN
;

data want;
 do _n_=0 by 1 until(last.emplid);
  do until(last.CURRENT_COLLEGE);
   set have;
   by EMPLID CURRENT_COLLEGE;
   if _n_ in (0,1) then change=_n_;
   output;
  end;
 end;
run;

proc print noobs;run;
novinosrin
Tourmaline | Level 20

Hi @rj997  By the way, An ideal design should have incremental value for the change variable each time the change occurs. I have tweaked your sample with another set of group 

1 kiy

1 kiy

The change should increment to 2 and so on and so forth.

 

If that makes sense, the above code is tweak to sum

 

data have;
input (EMPLID CURRENT_COLLEGE ) ($);
cards;

1                     ABC

1                     ABC

1                     XYZ

1                     XYZ
1                     kiy

1                     kiy

2                     CED

2                     VIN

2                     VIN
;

data want;
 do _n_=0 by 1 until(last.emplid);
  do until(last.CURRENT_COLLEGE);
   set have;
   by EMPLID CURRENT_COLLEGE;
   change=_n_;;
   output;
  end;
 end;
run;

 

rj997
Fluorite | Level 6

Thanks for the help!

Although, I am just trying to capture if the college values have changed from the first observation and record that with binary values (of 0 or 1).

r_behata
Barite | Level 11
data want;
	set have;
	by EMPLID;
	retain CURRENT_COLLEGE_ ;

	if first.EMPLID then do;
		CURRENT_COLLEGE_=CURRENT_COLLEGE;
		CHANGE=0;
	end;
	else do;
		 CHANGE=(CURRENT_COLLEGE ^ =CURRENT_COLLEGE_);		
	end;

	drop CURRENT_COLLEGE_;
run;
mkeintz
PROC Star

This is a situation in which you should consider taking advantage of a property of "conditional SET statements" - namely that any variable read by a SET statement is automatically retained until that SET (or an "overlapping" SET) is executed again:

 

data have;
input (EMPLID CURRENT_COLLEGE ) ($);
cards;
1                     ABC
1                     ABC
1                     XYZ
1                     XYZ
2                     CED
2                     VIN
2                     VIN
;

data want;
  set have;
  by emplid;
  if first.emplid then set have (keep=current_college rename=(current_college=college1)) point=_n_;
  change=(current_college^=college1);
run;

Whenever the beginning of an ID is encountered, that observation is re-read via a SET with a "POINT=" option.  Because the current_college variable is renamed to college1, the college1 value is automatically retained until the beginning of the next ID.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2723 views
  • 5 likes
  • 4 in conversation