Solved: Comparing Rows Within A Group

rj997 · Posted 03-10-2020 10:47 AM

Hey!

I have a dataset that looks somewhat like this:

EMPLID CURRENT_COLLEGE

1 ABC

1 XYZ

2 CED

2 VIN

I have to compare the values for each row in current_college and create a separate column called 'Change' that captures if the college for each ID has changed compared to the first observation. The output table should look something like this:

EMPLID CURRENT_COLLEGE CHANGE

1 ABC 0

1 XYZ 1

2 CED 0

2 VIN 1

I have been scratching my head over this for hours and I have just started working with SAS! Are there any efficient ways of grouping observations in SAS (like 'group' in dplyr package in R)?

Thanks 🙂

r_behata · Posted 03-10-2020 11:25 AM

data want;
	set have;
	by EMPLID;
	retain CURRENT_COLLEGE_ ;

	if first.EMPLID then do;
		CURRENT_COLLEGE_=CURRENT_COLLEGE;
		CHANGE=0;
	end;
	else do;
		 CHANGE=(CURRENT_COLLEGE ^ =CURRENT_COLLEGE_);		
	end;

	drop CURRENT_COLLEGE_;
run;

View solution in original post

novinosrin · Posted 03-10-2020 11:11 AM

data have;
input (EMPLID CURRENT_COLLEGE ) ($);
cards;

1                     ABC

1                     ABC

1                     XYZ

1                     XYZ
1                     kiy

1                     kiy

2                     CED

2                     VIN

2                     VIN
;

data want;
 do _n_=0 by 1 until(last.emplid);
  do until(last.CURRENT_COLLEGE);
   set have;
   by EMPLID CURRENT_COLLEGE;
   if _n_ in (0,1) then change=_n_;
   output;
  end;
 end;
run;

proc print noobs;run;

novinosrin · Posted 03-10-2020 11:18 AM

Hi @rj997 By the way, An ideal design should have incremental value for the change variable each time the change occurs. I have tweaked your sample with another set of group

1 kiy

The change should increment to 2 and so on and so forth.

If that makes sense, the above code is tweak to sum

data have;
input (EMPLID CURRENT_COLLEGE ) ($);
cards;

1                     ABC

1                     ABC

1                     XYZ

1                     XYZ
1                     kiy

1                     kiy

2                     CED

2                     VIN

2                     VIN
;

data want;
 do _n_=0 by 1 until(last.emplid);
  do until(last.CURRENT_COLLEGE);
   set have;
   by EMPLID CURRENT_COLLEGE;
   change=_n_;;
   output;
  end;
 end;
run;

rj997 · Posted 03-10-2020 11:47 AM

Thanks for the help!

Although, I am just trying to capture if the college values have changed from the first observation and record that with binary values (of 0 or 1).

r_behata · Posted 03-10-2020 11:25 AM

data want;
	set have;
	by EMPLID;
	retain CURRENT_COLLEGE_ ;

	if first.EMPLID then do;
		CURRENT_COLLEGE_=CURRENT_COLLEGE;
		CHANGE=0;
	end;
	else do;
		 CHANGE=(CURRENT_COLLEGE ^ =CURRENT_COLLEGE_);		
	end;

	drop CURRENT_COLLEGE_;
run;

mkeintz · Posted 03-10-2020 11:50 AM

This is a situation in which you should consider taking advantage of a property of "conditional SET statements" - namely that any variable read by a SET statement is automatically retained until that SET (or an "overlapping" SET) is executed again:

data have;
input (EMPLID CURRENT_COLLEGE ) ($);
cards;
1                     ABC
1                     ABC
1                     XYZ
1                     XYZ
2                     CED
2                     VIN
2                     VIN
;

data want;
  set have;
  by emplid;
  if first.emplid then set have (keep=current_college rename=(current_college=college1)) point=_n_;
  change=(current_college^=college1);
run;

Whenever the beginning of an ID is encountered, that observation is re-read via a SET with a "POINT=" option. Because the current_college variable is renamed to college1, the college1 value is automatically retained until the beginning of the next ID.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Comparing Rows Within A Group

Re: Comparing Rows Within A Group

Re: Comparing Rows Within A Group

Re: Comparing Rows Within A Group

Re: Comparing Rows Within A Group

Re: Comparing Rows Within A Group

Re: Comparing Rows Within A Group

Comparing Rows Within A Group

Re: Comparing Rows Within A Group

Re: Comparing Rows Within A Group

Re: Comparing Rows Within A Group

Re: Comparing Rows Within A Group

Re: Comparing Rows Within A Group

Re: Comparing Rows Within A Group

Registration is open

SAS Training: Just a Click Away