BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
twildone
Pyrite | Level 9

Hi...I am trying to change certain Department entries. Each record represents a course and courses may be taken from different departments. In order to identify the program that the student is enrolled in, I need to group the courses for each student to the appropiate Department which will identify the program of study. Because some courses are not counted or used as credit toward the students program, I have to identify which Department entries that are to be changed. From the code below that I have so far, the changes are being made. Any suggestion or revision would be greatly appreciated.Thanks. 

 

data Have;
    length 'Academic Year'n $ 9 StudentUID 8 Department $ 4;
    format 'Academic Year'n $char9. StudentUID best12. Department $char4.;
    informat 'Academic Year'n $char9. StudentUID best12. Department $char4.;
    infile datalines4 dlm='7f'x missover dsd;
    input 'Academic Year'n : $char9. StudentUID : best32. Department : $char4.;
datalines4;
2017-2018 83136 CAR
2017-2018 83136 CAR
2017-2018 83136 CT
2017-2018 83136 YB
2017-2018 83136 YB
2017-2018 83136 YBC
2017-2018 83136 CAR
2017-2018 88069 NCT
2017-2018 88069 NCT
2017-2018 88069 ES
2017-2018 88069 CDCA
2018-2019 83882 MOA
2018-2019 83882 MOA
2018-2019 83882 RH
2018-2019 83882 RH
2018-2019 83882 RH
2018-2019 83882 MOA
2018-2019 83882 MOA
2018-2019 83882 MOA
2018-2019 83882 ES
2018-2019 83882 ES
2019-2020 86518 RH
2019-2020 86518 RH
2019-2020 86518 MAA
2019-2020 86518 MAA
2019-2020 86518 MAA
2019-2020 86518 ES
2019-2020 86518 ES
2019-2020 86519 IB
2019-2020 86519 IB
2019-2020 86519 IB
2019-2020 86519 LM
2019-2020 86519 LM
2019-2020 86519 IB
;;;;

proc sort data=Have;
	by 'Academic Year'n StudentUID;
run;

data Want;
	set Have;
	by 'Academic Year'n StudentUID;
   	length Department1 $ 4.;
	retain Dpt;
		if findw(Department,'YBC')>0 then do;
			Dpt = 'YBC';
			if Department in ('YBC','YB','CT','CAR') then 
					Department1 = DPT;
			else 
				Department1 = Department;
		drop Dpt;
		end;
		if findw(Department,'CDCA')>0 then do;
			Dpt = 'CDCA';
			if Department in ('CDCA','NCT') then 
					Department1 = DPT;
			else 
				Department1 = Department;
		drop Dpt;
		end;
		if findw(Department,'MOA')>0 then do;
			Dpt = 'MOA';
			if Department in ('MOA','RH') then 
					Department1 = DPT;
			else 
				Department1 = Department;
		drop Dpt;
		end;
		if findw(Department,'MAA')>0 then do;
			Dpt = 'MAA';
			if Department in ('MAA','RH') then 
					Department1 = DPT;
			else 
				Department1 = Department;
		drop Dpt;
		end;
		if findw(Department,'IB')>0 then do;
			Dpt = 'IB';
			if Department in ('IB','LM') then 
					Department1 = DPT;
			else 
				Department1 = Department;
		drop Dpt;
		end;
run;

Want:
Academic Year	StudentUID	Department	Department1
2017-2018	83136	CAR	YBC
2017-2018	83136	CAR	YBC
2017-2018	83136	CT	YBC
2017-2018	83136	YB	YBC
2017-2018	83136	YB	YBC
2017-2018	83136	YBC	YBC
2017-2018	83136	CAR	YBC
2017-2018	88069	NCT	CDCA
2017-2018	88069	NCT	CDCA
2017-2018	88069	ES	ES
2017-2018	88069	CDCA	CDCA
2018-2019	83882	MOA	MOA
2018-2019	83882	MOA	MOA
2018-2019	83882	RH	MOA
2018-2019	83882	RH	MOA
2018-2019	83882	RH	MOA
2018-2019	83882	MOA	MOA
2018-2019	83882	MOA	MOA
2018-2019	83882	MOA	MOA
2018-2019	83882	ES	ES
2018-2019	83882	ES	ES
2019-2020	86518	RH	MAA
2019-2020	86518	RH	MAA
2019-2020	86518	MAA	MAA
2019-2020	86518	MAA	MAA
2019-2020	86518	MAA	MAA
2019-2020	86518	ES	ES
2019-2020	86518	ES	ES
2019-2020	86519	IB	IB
2019-2020	86519	IB	IB
2019-2020	86519	IB	IB
2019-2020	86519	LM	IB
2019-2020	86519	LM	IB
2019-2020	86519	IB	IB
1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

1. The weird formatting of your code is because you use tabs.

I am a firm proponent of using spaces, so that formatting is not broken when code is moved.

 

2. The line

	by AcademicYear StudentUID;

in your last step serves no purpose and slows down the process.

 

3. Can't this be done in one data step?

View solution in original post

11 REPLIES 11
qatman28
Obsidian | Level 7

It looks like you are just doing a straight recode of department for certain values.  If so then this simple code should work.

 

data want;
   set have;
   length department1 $4;

   select (department);
      when ('YBC','YB','CT','CAR') department1 = 'YBC';
      when ('CDCA','NCT') department1 = 'CDCA';
      when ('MAA','RH') department1 = 'MAA';
      when ('IB','LM') department1 = 'IB';
      otherwise department1 = department;
   end;

run;

 

Patrick
Opal | Level 21

Is this just about re-coding? If so then a format could be quite useful.

proc format;
  value $dpt
    'YBC','YB','CT','CAR' = 'YBC'
    'CDCA','NCT'          = 'CDCA'
    'MAA','RH'            = 'MAA'
    'IB','LM'             = 'IB'
  ;
run;

data want;
  set have;
  Department1=put(Department, $dpt.);
run;
twildone
Pyrite | Level 9

Hi...This is not straight forward of just re-coding. Some courses recieve credit toward other programs. For example, the course with a Department of RH receives credit toward MAA, MOA as well as RH programs. So as to know which program that the student is receiving credit for the course, I need to first find out which program that the student is enrolled in for that academic year which I am attempting to do with the statement with the findw. Then I know which department codes need to recoded.

qatman28
Obsidian | Level 7
The posted code produces a variable "department1" per your wanted output. If you have a separate question then please ask, otherwise it appears that the code from Patrick or myself works.
twildone
Pyrite | Level 9
The posted Department1 for StudentUID=86518 is MAA. When I ran your code, I am getting both MAA and MOA for this same StudentUID.
Tom
Super User Tom
Super User

@twildone wrote:
The posted Department1 for StudentUID=86518 is MAA. When I ran your code, I am getting both MAA and MOA for this same StudentUID.

What is the meaning of the new variable?  

Your words make it sound like you want the student's department to drive the remapping.  But your input does not have a student level department variable.  The existing data just have the one department variable the observation level.  Most of your students have more than one department listed.  How do you want to pick the department for the remapping?

      Student   Academic
Obs     UID       Year      dept1   dept2   dept3   dept4   dept5   dept6   dept7   dept8   dept9   dept10

 1     83136    2017-2018    CAR     CAR     CT     YB       YB      YBC     CAR
 2     83882    2018-2019    MOA     MOA     RH     RH       RH      MOA     MOA     MOA     ES       ES
 3     86518    2019-2020    RH      RH      MAA    MAA      MAA     ES      ES
 4     86519    2019-2020    IB      IB      IB     LM       LM      IB
 5     88069    2017-2018    NCT     NCT     ES     CDCA

 

ChrisNZ
Tourmaline | Level 20

You only one drop statement anywhere in the data step. 

Tom
Super User Tom
Super User

Your posted data step is going to create problems.  Does your real data have these problems?

1206  data posted;
1207      length 'AcademicYear'n $ 9 StudentUID 8 Department $ 4;
1208      format 'AcademicYear'n $char9. StudentUID best12. Department $char4.;
1209      informat 'AcademicYear'n $char9. StudentUID best12. Department $char4.;
1210      infile datalines4 dlm='7f'x missover dsd;
1211      input 'AcademicYear'n : $char9. StudentUID : best32. Department : $char4.;
1212  datalines4;

NOTE: Invalid data for StudentUID in line 1234 11-19.
RULE:       ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+

1234CHAR    2019-2020. 86518 RH
    ZONE    33332333372333332542222222222222222222222222222222222222222222222222222222222222
    NUMR    2019D2020F0865180280000000000000000000000000000000000000000000000000000000000000
'AcademicYear'n=2019-2020 StudentUID=. Department=  _ERROR_=1 _N_=22
NOTE: Invalid data for StudentUID in line 1237 11-20.

1237CHAR    2019-2020. 86518 MAA
    ZONE    33332333372333332444222222222222222222222222222222222222222222222222222222222222
    NUMR    2019D2020F0865180D11000000000000000000000000000000000000000000000000000000000000
'AcademicYear'n=2019-2020 StudentUID=. Department=  _ERROR_=1 _N_=25
NOTE: The data set WORK.POSTED has 34 observations and 3 variables.

Plus you will create 19 observations where DEPARTMENT starts with a space because you are using $CHAR informat instead of just reading the string normally.

Try using a more normal data step since non of your variables require any special informats or formats and you don't have any missing values to worry about.

data have;
    length AcademicYear $ 9 StudentUID 8 Department $ 4;
    input AcademicYear StudentUID Department;
datalines4;
2017-2018  83136  CAR
2017-2018  83136  CAR
2017-2018  83136  CT
2017-2018  83136  YB
2017-2018  83136  YB
2017-2018  83136  YBC
2017-2018  83136  CAR
2017-2018  88069  NCT
2017-2018  88069  NCT
2017-2018  88069  ES
2017-2018  88069  CDCA
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  RH
2018-2019  83882  RH
2018-2019  83882  RH
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  ES
2018-2019  83882  ES
2019-2020  86518 RH
2019-2020  86518  RH
2019-2020  86518  MAA
2019-2020  86518 MAA
2019-2020  86518  MAA
2019-2020  86518  ES
2019-2020  86518  ES
2019-2020  86519  IB
2019-2020  86519  IB
2019-2020  86519  IB
2019-2020  86519  LM
2019-2020  86519  LM
2019-2020  86519  IB
;;;;

 

twildone
Pyrite | Level 9

Hi Tom....thanks for your suggestion. I made the changes and have been working on this by trying different things. The code below seems to be getting closer to what I need. I used the max_Dpt approach so I can test to make sure only the student records that need to be changed are changed. By changing the YBC to YB, for example, for StudentUID 83136, then none of the Department codes should change and that is what is happening. The issue that I am still having problems with is dealing with courses that receive credit in more than one program (ie. RH can receive credit in MOA, MAA as well as RH). I thought by using the  "by AcademicYear StudentUID" statement that it would take care of that problem as each student would be evaluated separately.

 

data have;
    length AcademicYear $ 9 StudentUID 8 Department $ 5;
    input AcademicYear StudentUID Department;
datalines4;
2017-2018  83136  CAR
2017-2018  83136  CAR
2017-2018  83136  CT
2017-2018  83136  YB
2017-2018  83136  YB
2017-2018  83136  YBC
2017-2018  83136  CAR
2017-2018  88069  NCT
2017-2018  88069  NCT
2017-2018  88069  ES
2017-2018  88069  CDCA
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  RH
2018-2019  83882  RH
2018-2019  83882  RH
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  ES
2018-2019  83882  ES
2019-2020  86518 RH
2019-2020  86518  RH
2019-2020  86518  MAA
2019-2020  86518 MAA
2019-2020  86518  MAA
2019-2020  86518  ES
2019-2020  86518  ES
2019-2020  86519  IB
2019-2020  86519  IB
2019-2020  86519  IB
2019-2020  86519  LM
2019-2020  86519  LM
2019-2020  86519  IB
;;;;

proc sort data=Have;
	by AcademicYear StudentUID;
run;

data Have;
	set Have;
		if Department in ('YBC','CDCA','MOA','MAA','IB') then 
			Dpt = 1;
run;

data Have(drop=Dpt);  
   max_Dpt=.;
   do until (last.StudentUID);
      set Have;
      by AcademicYear StudentUID;
      if Dpt gt max_Dpt then max_Dpt=Dpt;
   end;
   do until (last.StudentUID);
      set Have;
      by AcademicYear StudentUID;
      output;
   end;
   rename max_Dpt=Dpt1;
run;

data Want(drop=Dpt1);
	set Have;
	by AcademicYear StudentUID;
   	length Department1 $ 5.;
			if Dpt1=1 and Department in ('YBC','YB','CT','CAR') then 
					Department1 = 'YBC';
			else 
			if Dpt1=1 and Department in ('CDCA','NCT') then 
					Department1 = 'CDCA';
			else 
			if Dpt1=1 and Department in ('MOA','RH') then 
					Department1 = 'MOA';
			else 
			if Dpt1=1 and Department in ('MAA','RH') then 
					Department1 = 'MAA';
			else 
			if Dpt1=1 and Department in ('IB','LM') then 
					Department1 = 'IB';
			else 
				Department1 = Department;
run;
twildone
Pyrite | Level 9

Hi Tom.....I think I might have got this to work. Thanks

 

data Have;
	set Have;
		if Department in ('YBC') then
			Dpt = 1;
		if Department in ('CDCA') then
			Dpt = 2;
		if Department in ('MOA') then
			Dpt = 3;
		if Department in ('MAA') then
			Dpt = 4;
		if Department in ('IB') then
			Dpt = 5;
run;

data Have(drop=Dpt);  
   max_Dpt=.;
   do until (last.StudentUID);
      set Have;
      by AcademicYear StudentUID;
      if Dpt gt max_Dpt then max_Dpt=Dpt;
   end;
   do until (last.StudentUID);
      set Have;
      by AcademicYear StudentUID;
      output;
   end;
   rename max_Dpt=Dpt1;
run; 

data Want(drop=Dpt1);
	set Have;
	by AcademicYear StudentUID;
   	length Department1 $ 5.;
			if Dpt1=1 and Department in ('YBC','YB','CT','CAR') then 
					Department1 = 'YBC';
			else 
			if Dpt1=2 and Department in ('CDCA','NCT') then 
					Department1 = 'CDCA';
			else 
			if Dpt1=3 and Department in ('MOA','RH') then 
					Department1 = 'MOA';
			else 
			if Dpt1=4 and Department in ('MAA','RH') then 
					Department1 = 'MAA';
			else 
			if Dpt1=5 and Department in ('IB','LM') then 
					Department1 = 'IB';
			else 
				Department1 = Department;
run;
ChrisNZ
Tourmaline | Level 20

1. The weird formatting of your code is because you use tabs.

I am a firm proponent of using spaces, so that formatting is not broken when code is moved.

 

2. The line

	by AcademicYear StudentUID;

in your last step serves no purpose and slows down the process.

 

3. Can't this be done in one data step?

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2888 views
  • 0 likes
  • 5 in conversation