Hi...I am trying to change certain Department entries. Each record represents a course and courses may be taken from different departments. In order to identify the program that the student is enrolled in, I need to group the courses for each student to the appropiate Department which will identify the program of study. Because some courses are not counted or used as credit toward the students program, I have to identify which Department entries that are to be changed. From the code below that I have so far, the changes are being made. Any suggestion or revision would be greatly appreciated.Thanks.
data Have;
    length 'Academic Year'n $ 9 StudentUID 8 Department $ 4;
    format 'Academic Year'n $char9. StudentUID best12. Department $char4.;
    informat 'Academic Year'n $char9. StudentUID best12. Department $char4.;
    infile datalines4 dlm='7f'x missover dsd;
    input 'Academic Year'n : $char9. StudentUID : best32. Department : $char4.;
datalines4;
2017-2018 83136 CAR
2017-2018 83136 CAR
2017-2018 83136 CT
2017-2018 83136 YB
2017-2018 83136 YB
2017-2018 83136 YBC
2017-2018 83136 CAR
2017-2018 88069 NCT
2017-2018 88069 NCT
2017-2018 88069 ES
2017-2018 88069 CDCA
2018-2019 83882 MOA
2018-2019 83882 MOA
2018-2019 83882 RH
2018-2019 83882 RH
2018-2019 83882 RH
2018-2019 83882 MOA
2018-2019 83882 MOA
2018-2019 83882 MOA
2018-2019 83882 ES
2018-2019 83882 ES
2019-2020 86518 RH
2019-2020 86518 RH
2019-2020 86518 MAA
2019-2020 86518 MAA
2019-2020 86518 MAA
2019-2020 86518 ES
2019-2020 86518 ES
2019-2020 86519 IB
2019-2020 86519 IB
2019-2020 86519 IB
2019-2020 86519 LM
2019-2020 86519 LM
2019-2020 86519 IB
;;;;
proc sort data=Have;
	by 'Academic Year'n StudentUID;
run;
data Want;
	set Have;
	by 'Academic Year'n StudentUID;
   	length Department1 $ 4.;
	retain Dpt;
		if findw(Department,'YBC')>0 then do;
			Dpt = 'YBC';
			if Department in ('YBC','YB','CT','CAR') then 
					Department1 = DPT;
			else 
				Department1 = Department;
		drop Dpt;
		end;
		if findw(Department,'CDCA')>0 then do;
			Dpt = 'CDCA';
			if Department in ('CDCA','NCT') then 
					Department1 = DPT;
			else 
				Department1 = Department;
		drop Dpt;
		end;
		if findw(Department,'MOA')>0 then do;
			Dpt = 'MOA';
			if Department in ('MOA','RH') then 
					Department1 = DPT;
			else 
				Department1 = Department;
		drop Dpt;
		end;
		if findw(Department,'MAA')>0 then do;
			Dpt = 'MAA';
			if Department in ('MAA','RH') then 
					Department1 = DPT;
			else 
				Department1 = Department;
		drop Dpt;
		end;
		if findw(Department,'IB')>0 then do;
			Dpt = 'IB';
			if Department in ('IB','LM') then 
					Department1 = DPT;
			else 
				Department1 = Department;
		drop Dpt;
		end;
run;
Want:
Academic Year	StudentUID	Department	Department1
2017-2018	83136	CAR	YBC
2017-2018	83136	CAR	YBC
2017-2018	83136	CT	YBC
2017-2018	83136	YB	YBC
2017-2018	83136	YB	YBC
2017-2018	83136	YBC	YBC
2017-2018	83136	CAR	YBC
2017-2018	88069	NCT	CDCA
2017-2018	88069	NCT	CDCA
2017-2018	88069	ES	ES
2017-2018	88069	CDCA	CDCA
2018-2019	83882	MOA	MOA
2018-2019	83882	MOA	MOA
2018-2019	83882	RH	MOA
2018-2019	83882	RH	MOA
2018-2019	83882	RH	MOA
2018-2019	83882	MOA	MOA
2018-2019	83882	MOA	MOA
2018-2019	83882	MOA	MOA
2018-2019	83882	ES	ES
2018-2019	83882	ES	ES
2019-2020	86518	RH	MAA
2019-2020	86518	RH	MAA
2019-2020	86518	MAA	MAA
2019-2020	86518	MAA	MAA
2019-2020	86518	MAA	MAA
2019-2020	86518	ES	ES
2019-2020	86518	ES	ES
2019-2020	86519	IB	IB
2019-2020	86519	IB	IB
2019-2020	86519	IB	IB
2019-2020	86519	LM	IB
2019-2020	86519	LM	IB
2019-2020	86519	IB	IB
					
				
			
			
				
			
			
			
			
			
			
			
		1. The weird formatting of your code is because you use tabs.
I am a firm proponent of using spaces, so that formatting is not broken when code is moved.
2. The line
by AcademicYear StudentUID;
in your last step serves no purpose and slows down the process.
3. Can't this be done in one data step?
It looks like you are just doing a straight recode of department for certain values. If so then this simple code should work.
data want;
   set have;
   length department1 $4;
   select (department);
      when ('YBC','YB','CT','CAR') department1 = 'YBC';
      when ('CDCA','NCT') department1 = 'CDCA';
      when ('MAA','RH') department1 = 'MAA';
      when ('IB','LM') department1 = 'IB';
      otherwise department1 = department;
   end;
run;
Is this just about re-coding? If so then a format could be quite useful.
proc format;
  value $dpt
    'YBC','YB','CT','CAR' = 'YBC'
    'CDCA','NCT'          = 'CDCA'
    'MAA','RH'            = 'MAA'
    'IB','LM'             = 'IB'
  ;
run;
data want;
  set have;
  Department1=put(Department, $dpt.);
run;Hi...This is not straight forward of just re-coding. Some courses recieve credit toward other programs. For example, the course with a Department of RH receives credit toward MAA, MOA as well as RH programs. So as to know which program that the student is receiving credit for the course, I need to first find out which program that the student is enrolled in for that academic year which I am attempting to do with the statement with the findw. Then I know which department codes need to recoded.
@twildone wrote:
The posted Department1 for StudentUID=86518 is MAA. When I ran your code, I am getting both MAA and MOA for this same StudentUID.
What is the meaning of the new variable?
Your words make it sound like you want the student's department to drive the remapping. But your input does not have a student level department variable. The existing data just have the one department variable the observation level. Most of your students have more than one department listed. How do you want to pick the department for the remapping?
Student Academic Obs UID Year dept1 dept2 dept3 dept4 dept5 dept6 dept7 dept8 dept9 dept10 1 83136 2017-2018 CAR CAR CT YB YB YBC CAR 2 83882 2018-2019 MOA MOA RH RH RH MOA MOA MOA ES ES 3 86518 2019-2020 RH RH MAA MAA MAA ES ES 4 86519 2019-2020 IB IB IB LM LM IB 5 88069 2017-2018 NCT NCT ES CDCA
You only one drop statement anywhere in the data step.
Your posted data step is going to create problems. Does your real data have these problems?
1206  data posted;
1207      length 'AcademicYear'n $ 9 StudentUID 8 Department $ 4;
1208      format 'AcademicYear'n $char9. StudentUID best12. Department $char4.;
1209      informat 'AcademicYear'n $char9. StudentUID best12. Department $char4.;
1210      infile datalines4 dlm='7f'x missover dsd;
1211      input 'AcademicYear'n : $char9. StudentUID : best32. Department : $char4.;
1212  datalines4;
NOTE: Invalid data for StudentUID in line 1234 11-19.
RULE:       ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+
1234CHAR    2019-2020. 86518 RH
    ZONE    33332333372333332542222222222222222222222222222222222222222222222222222222222222
    NUMR    2019D2020F0865180280000000000000000000000000000000000000000000000000000000000000
'AcademicYear'n=2019-2020 StudentUID=. Department=  _ERROR_=1 _N_=22
NOTE: Invalid data for StudentUID in line 1237 11-20.
1237CHAR    2019-2020. 86518 MAA
    ZONE    33332333372333332444222222222222222222222222222222222222222222222222222222222222
    NUMR    2019D2020F0865180D11000000000000000000000000000000000000000000000000000000000000
'AcademicYear'n=2019-2020 StudentUID=. Department=  _ERROR_=1 _N_=25
NOTE: The data set WORK.POSTED has 34 observations and 3 variables.
Plus you will create 19 observations where DEPARTMENT starts with a space because you are using $CHAR informat instead of just reading the string normally.
Try using a more normal data step since non of your variables require any special informats or formats and you don't have any missing values to worry about.
data have;
    length AcademicYear $ 9 StudentUID 8 Department $ 4;
    input AcademicYear StudentUID Department;
datalines4;
2017-2018  83136  CAR
2017-2018  83136  CAR
2017-2018  83136  CT
2017-2018  83136  YB
2017-2018  83136  YB
2017-2018  83136  YBC
2017-2018  83136  CAR
2017-2018  88069  NCT
2017-2018  88069  NCT
2017-2018  88069  ES
2017-2018  88069  CDCA
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  RH
2018-2019  83882  RH
2018-2019  83882  RH
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  ES
2018-2019  83882  ES
2019-2020  86518 RH
2019-2020  86518  RH
2019-2020  86518  MAA
2019-2020  86518 MAA
2019-2020  86518  MAA
2019-2020  86518  ES
2019-2020  86518  ES
2019-2020  86519  IB
2019-2020  86519  IB
2019-2020  86519  IB
2019-2020  86519  LM
2019-2020  86519  LM
2019-2020  86519  IB
;;;;
Hi Tom....thanks for your suggestion. I made the changes and have been working on this by trying different things. The code below seems to be getting closer to what I need. I used the max_Dpt approach so I can test to make sure only the student records that need to be changed are changed. By changing the YBC to YB, for example, for StudentUID 83136, then none of the Department codes should change and that is what is happening. The issue that I am still having problems with is dealing with courses that receive credit in more than one program (ie. RH can receive credit in MOA, MAA as well as RH). I thought by using the "by AcademicYear StudentUID" statement that it would take care of that problem as each student would be evaluated separately.
data have;
    length AcademicYear $ 9 StudentUID 8 Department $ 5;
    input AcademicYear StudentUID Department;
datalines4;
2017-2018  83136  CAR
2017-2018  83136  CAR
2017-2018  83136  CT
2017-2018  83136  YB
2017-2018  83136  YB
2017-2018  83136  YBC
2017-2018  83136  CAR
2017-2018  88069  NCT
2017-2018  88069  NCT
2017-2018  88069  ES
2017-2018  88069  CDCA
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  RH
2018-2019  83882  RH
2018-2019  83882  RH
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  ES
2018-2019  83882  ES
2019-2020  86518 RH
2019-2020  86518  RH
2019-2020  86518  MAA
2019-2020  86518 MAA
2019-2020  86518  MAA
2019-2020  86518  ES
2019-2020  86518  ES
2019-2020  86519  IB
2019-2020  86519  IB
2019-2020  86519  IB
2019-2020  86519  LM
2019-2020  86519  LM
2019-2020  86519  IB
;;;;
proc sort data=Have;
	by AcademicYear StudentUID;
run;
data Have;
	set Have;
		if Department in ('YBC','CDCA','MOA','MAA','IB') then 
			Dpt = 1;
run;
data Have(drop=Dpt);  
   max_Dpt=.;
   do until (last.StudentUID);
      set Have;
      by AcademicYear StudentUID;
      if Dpt gt max_Dpt then max_Dpt=Dpt;
   end;
   do until (last.StudentUID);
      set Have;
      by AcademicYear StudentUID;
      output;
   end;
   rename max_Dpt=Dpt1;
run;
data Want(drop=Dpt1);
	set Have;
	by AcademicYear StudentUID;
   	length Department1 $ 5.;
			if Dpt1=1 and Department in ('YBC','YB','CT','CAR') then 
					Department1 = 'YBC';
			else 
			if Dpt1=1 and Department in ('CDCA','NCT') then 
					Department1 = 'CDCA';
			else 
			if Dpt1=1 and Department in ('MOA','RH') then 
					Department1 = 'MOA';
			else 
			if Dpt1=1 and Department in ('MAA','RH') then 
					Department1 = 'MAA';
			else 
			if Dpt1=1 and Department in ('IB','LM') then 
					Department1 = 'IB';
			else 
				Department1 = Department;
run;
					
				
			
			
				
			
			
			
			
			
			
			
		Hi Tom.....I think I might have got this to work. Thanks
data Have;
	set Have;
		if Department in ('YBC') then
			Dpt = 1;
		if Department in ('CDCA') then
			Dpt = 2;
		if Department in ('MOA') then
			Dpt = 3;
		if Department in ('MAA') then
			Dpt = 4;
		if Department in ('IB') then
			Dpt = 5;
run;
data Have(drop=Dpt);  
   max_Dpt=.;
   do until (last.StudentUID);
      set Have;
      by AcademicYear StudentUID;
      if Dpt gt max_Dpt then max_Dpt=Dpt;
   end;
   do until (last.StudentUID);
      set Have;
      by AcademicYear StudentUID;
      output;
   end;
   rename max_Dpt=Dpt1;
run; 
data Want(drop=Dpt1);
	set Have;
	by AcademicYear StudentUID;
   	length Department1 $ 5.;
			if Dpt1=1 and Department in ('YBC','YB','CT','CAR') then 
					Department1 = 'YBC';
			else 
			if Dpt1=2 and Department in ('CDCA','NCT') then 
					Department1 = 'CDCA';
			else 
			if Dpt1=3 and Department in ('MOA','RH') then 
					Department1 = 'MOA';
			else 
			if Dpt1=4 and Department in ('MAA','RH') then 
					Department1 = 'MAA';
			else 
			if Dpt1=5 and Department in ('IB','LM') then 
					Department1 = 'IB';
			else 
				Department1 = Department;
run;
					
				
			
			
				
			
			
			
			
			
			
			
		1. The weird formatting of your code is because you use tabs.
I am a firm proponent of using spaces, so that formatting is not broken when code is moved.
2. The line
by AcademicYear StudentUID;
in your last step serves no purpose and slows down the process.
3. Can't this be done in one data step?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
