Hi...I am trying to change certain Department entries. Each record represents a course and courses may be taken from different departments. In order to identify the program that the student is enrolled in, I need to group the courses for each student to the appropiate Department which will identify the program of study. Because some courses are not counted or used as credit toward the students program, I have to identify which Department entries that are to be changed. From the code below that I have so far, the changes are being made. Any suggestion or revision would be greatly appreciated.Thanks.
data Have; length 'Academic Year'n $ 9 StudentUID 8 Department $ 4; format 'Academic Year'n $char9. StudentUID best12. Department $char4.; informat 'Academic Year'n $char9. StudentUID best12. Department $char4.; infile datalines4 dlm='7f'x missover dsd; input 'Academic Year'n : $char9. StudentUID : best32. Department : $char4.; datalines4; 2017-2018 83136 CAR 2017-2018 83136 CAR 2017-2018 83136 CT 2017-2018 83136 YB 2017-2018 83136 YB 2017-2018 83136 YBC 2017-2018 83136 CAR 2017-2018 88069 NCT 2017-2018 88069 NCT 2017-2018 88069 ES 2017-2018 88069 CDCA 2018-2019 83882 MOA 2018-2019 83882 MOA 2018-2019 83882 RH 2018-2019 83882 RH 2018-2019 83882 RH 2018-2019 83882 MOA 2018-2019 83882 MOA 2018-2019 83882 MOA 2018-2019 83882 ES 2018-2019 83882 ES 2019-2020 86518 RH 2019-2020 86518 RH 2019-2020 86518 MAA 2019-2020 86518 MAA 2019-2020 86518 MAA 2019-2020 86518 ES 2019-2020 86518 ES 2019-2020 86519 IB 2019-2020 86519 IB 2019-2020 86519 IB 2019-2020 86519 LM 2019-2020 86519 LM 2019-2020 86519 IB ;;;; proc sort data=Have; by 'Academic Year'n StudentUID; run; data Want; set Have; by 'Academic Year'n StudentUID; length Department1 $ 4.; retain Dpt; if findw(Department,'YBC')>0 then do; Dpt = 'YBC'; if Department in ('YBC','YB','CT','CAR') then Department1 = DPT; else Department1 = Department; drop Dpt; end; if findw(Department,'CDCA')>0 then do; Dpt = 'CDCA'; if Department in ('CDCA','NCT') then Department1 = DPT; else Department1 = Department; drop Dpt; end; if findw(Department,'MOA')>0 then do; Dpt = 'MOA'; if Department in ('MOA','RH') then Department1 = DPT; else Department1 = Department; drop Dpt; end; if findw(Department,'MAA')>0 then do; Dpt = 'MAA'; if Department in ('MAA','RH') then Department1 = DPT; else Department1 = Department; drop Dpt; end; if findw(Department,'IB')>0 then do; Dpt = 'IB'; if Department in ('IB','LM') then Department1 = DPT; else Department1 = Department; drop Dpt; end; run; Want: Academic Year StudentUID Department Department1 2017-2018 83136 CAR YBC 2017-2018 83136 CAR YBC 2017-2018 83136 CT YBC 2017-2018 83136 YB YBC 2017-2018 83136 YB YBC 2017-2018 83136 YBC YBC 2017-2018 83136 CAR YBC 2017-2018 88069 NCT CDCA 2017-2018 88069 NCT CDCA 2017-2018 88069 ES ES 2017-2018 88069 CDCA CDCA 2018-2019 83882 MOA MOA 2018-2019 83882 MOA MOA 2018-2019 83882 RH MOA 2018-2019 83882 RH MOA 2018-2019 83882 RH MOA 2018-2019 83882 MOA MOA 2018-2019 83882 MOA MOA 2018-2019 83882 MOA MOA 2018-2019 83882 ES ES 2018-2019 83882 ES ES 2019-2020 86518 RH MAA 2019-2020 86518 RH MAA 2019-2020 86518 MAA MAA 2019-2020 86518 MAA MAA 2019-2020 86518 MAA MAA 2019-2020 86518 ES ES 2019-2020 86518 ES ES 2019-2020 86519 IB IB 2019-2020 86519 IB IB 2019-2020 86519 IB IB 2019-2020 86519 LM IB 2019-2020 86519 LM IB 2019-2020 86519 IB IB
1. The weird formatting of your code is because you use tabs.
I am a firm proponent of using spaces, so that formatting is not broken when code is moved.
2. The line
by AcademicYear StudentUID;
in your last step serves no purpose and slows down the process.
3. Can't this be done in one data step?
It looks like you are just doing a straight recode of department for certain values. If so then this simple code should work.
data want;
set have;
length department1 $4;
select (department);
when ('YBC','YB','CT','CAR') department1 = 'YBC';
when ('CDCA','NCT') department1 = 'CDCA';
when ('MAA','RH') department1 = 'MAA';
when ('IB','LM') department1 = 'IB';
otherwise department1 = department;
end;
run;
Is this just about re-coding? If so then a format could be quite useful.
proc format;
value $dpt
'YBC','YB','CT','CAR' = 'YBC'
'CDCA','NCT' = 'CDCA'
'MAA','RH' = 'MAA'
'IB','LM' = 'IB'
;
run;
data want;
set have;
Department1=put(Department, $dpt.);
run;
Hi...This is not straight forward of just re-coding. Some courses recieve credit toward other programs. For example, the course with a Department of RH receives credit toward MAA, MOA as well as RH programs. So as to know which program that the student is receiving credit for the course, I need to first find out which program that the student is enrolled in for that academic year which I am attempting to do with the statement with the findw. Then I know which department codes need to recoded.
@twildone wrote:
The posted Department1 for StudentUID=86518 is MAA. When I ran your code, I am getting both MAA and MOA for this same StudentUID.
What is the meaning of the new variable?
Your words make it sound like you want the student's department to drive the remapping. But your input does not have a student level department variable. The existing data just have the one department variable the observation level. Most of your students have more than one department listed. How do you want to pick the department for the remapping?
Student Academic Obs UID Year dept1 dept2 dept3 dept4 dept5 dept6 dept7 dept8 dept9 dept10 1 83136 2017-2018 CAR CAR CT YB YB YBC CAR 2 83882 2018-2019 MOA MOA RH RH RH MOA MOA MOA ES ES 3 86518 2019-2020 RH RH MAA MAA MAA ES ES 4 86519 2019-2020 IB IB IB LM LM IB 5 88069 2017-2018 NCT NCT ES CDCA
You only one drop statement anywhere in the data step.
Your posted data step is going to create problems. Does your real data have these problems?
1206 data posted; 1207 length 'AcademicYear'n $ 9 StudentUID 8 Department $ 4; 1208 format 'AcademicYear'n $char9. StudentUID best12. Department $char4.; 1209 informat 'AcademicYear'n $char9. StudentUID best12. Department $char4.; 1210 infile datalines4 dlm='7f'x missover dsd; 1211 input 'AcademicYear'n : $char9. StudentUID : best32. Department : $char4.; 1212 datalines4; NOTE: Invalid data for StudentUID in line 1234 11-19. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+ 1234CHAR 2019-2020. 86518 RH ZONE 33332333372333332542222222222222222222222222222222222222222222222222222222222222 NUMR 2019D2020F0865180280000000000000000000000000000000000000000000000000000000000000 'AcademicYear'n=2019-2020 StudentUID=. Department= _ERROR_=1 _N_=22 NOTE: Invalid data for StudentUID in line 1237 11-20. 1237CHAR 2019-2020. 86518 MAA ZONE 33332333372333332444222222222222222222222222222222222222222222222222222222222222 NUMR 2019D2020F0865180D11000000000000000000000000000000000000000000000000000000000000 'AcademicYear'n=2019-2020 StudentUID=. Department= _ERROR_=1 _N_=25 NOTE: The data set WORK.POSTED has 34 observations and 3 variables.
Plus you will create 19 observations where DEPARTMENT starts with a space because you are using $CHAR informat instead of just reading the string normally.
Try using a more normal data step since non of your variables require any special informats or formats and you don't have any missing values to worry about.
data have;
length AcademicYear $ 9 StudentUID 8 Department $ 4;
input AcademicYear StudentUID Department;
datalines4;
2017-2018 83136 CAR
2017-2018 83136 CAR
2017-2018 83136 CT
2017-2018 83136 YB
2017-2018 83136 YB
2017-2018 83136 YBC
2017-2018 83136 CAR
2017-2018 88069 NCT
2017-2018 88069 NCT
2017-2018 88069 ES
2017-2018 88069 CDCA
2018-2019 83882 MOA
2018-2019 83882 MOA
2018-2019 83882 RH
2018-2019 83882 RH
2018-2019 83882 RH
2018-2019 83882 MOA
2018-2019 83882 MOA
2018-2019 83882 MOA
2018-2019 83882 ES
2018-2019 83882 ES
2019-2020 86518 RH
2019-2020 86518 RH
2019-2020 86518 MAA
2019-2020 86518 MAA
2019-2020 86518 MAA
2019-2020 86518 ES
2019-2020 86518 ES
2019-2020 86519 IB
2019-2020 86519 IB
2019-2020 86519 IB
2019-2020 86519 LM
2019-2020 86519 LM
2019-2020 86519 IB
;;;;
Hi Tom....thanks for your suggestion. I made the changes and have been working on this by trying different things. The code below seems to be getting closer to what I need. I used the max_Dpt approach so I can test to make sure only the student records that need to be changed are changed. By changing the YBC to YB, for example, for StudentUID 83136, then none of the Department codes should change and that is what is happening. The issue that I am still having problems with is dealing with courses that receive credit in more than one program (ie. RH can receive credit in MOA, MAA as well as RH). I thought by using the "by AcademicYear StudentUID" statement that it would take care of that problem as each student would be evaluated separately.
data have; length AcademicYear $ 9 StudentUID 8 Department $ 5; input AcademicYear StudentUID Department; datalines4; 2017-2018 83136 CAR 2017-2018 83136 CAR 2017-2018 83136 CT 2017-2018 83136 YB 2017-2018 83136 YB 2017-2018 83136 YBC 2017-2018 83136 CAR 2017-2018 88069 NCT 2017-2018 88069 NCT 2017-2018 88069 ES 2017-2018 88069 CDCA 2018-2019 83882 MOA 2018-2019 83882 MOA 2018-2019 83882 RH 2018-2019 83882 RH 2018-2019 83882 RH 2018-2019 83882 MOA 2018-2019 83882 MOA 2018-2019 83882 MOA 2018-2019 83882 ES 2018-2019 83882 ES 2019-2020 86518 RH 2019-2020 86518 RH 2019-2020 86518 MAA 2019-2020 86518 MAA 2019-2020 86518 MAA 2019-2020 86518 ES 2019-2020 86518 ES 2019-2020 86519 IB 2019-2020 86519 IB 2019-2020 86519 IB 2019-2020 86519 LM 2019-2020 86519 LM 2019-2020 86519 IB ;;;; proc sort data=Have; by AcademicYear StudentUID; run; data Have; set Have; if Department in ('YBC','CDCA','MOA','MAA','IB') then Dpt = 1; run; data Have(drop=Dpt); max_Dpt=.; do until (last.StudentUID); set Have; by AcademicYear StudentUID; if Dpt gt max_Dpt then max_Dpt=Dpt; end; do until (last.StudentUID); set Have; by AcademicYear StudentUID; output; end; rename max_Dpt=Dpt1; run; data Want(drop=Dpt1); set Have; by AcademicYear StudentUID; length Department1 $ 5.; if Dpt1=1 and Department in ('YBC','YB','CT','CAR') then Department1 = 'YBC'; else if Dpt1=1 and Department in ('CDCA','NCT') then Department1 = 'CDCA'; else if Dpt1=1 and Department in ('MOA','RH') then Department1 = 'MOA'; else if Dpt1=1 and Department in ('MAA','RH') then Department1 = 'MAA'; else if Dpt1=1 and Department in ('IB','LM') then Department1 = 'IB'; else Department1 = Department; run;
Hi Tom.....I think I might have got this to work. Thanks
data Have; set Have; if Department in ('YBC') then Dpt = 1; if Department in ('CDCA') then Dpt = 2; if Department in ('MOA') then Dpt = 3; if Department in ('MAA') then Dpt = 4; if Department in ('IB') then Dpt = 5; run; data Have(drop=Dpt); max_Dpt=.; do until (last.StudentUID); set Have; by AcademicYear StudentUID; if Dpt gt max_Dpt then max_Dpt=Dpt; end; do until (last.StudentUID); set Have; by AcademicYear StudentUID; output; end; rename max_Dpt=Dpt1; run; data Want(drop=Dpt1); set Have; by AcademicYear StudentUID; length Department1 $ 5.; if Dpt1=1 and Department in ('YBC','YB','CT','CAR') then Department1 = 'YBC'; else if Dpt1=2 and Department in ('CDCA','NCT') then Department1 = 'CDCA'; else if Dpt1=3 and Department in ('MOA','RH') then Department1 = 'MOA'; else if Dpt1=4 and Department in ('MAA','RH') then Department1 = 'MAA'; else if Dpt1=5 and Department in ('IB','LM') then Department1 = 'IB'; else Department1 = Department; run;
1. The weird formatting of your code is because you use tabs.
I am a firm proponent of using spaces, so that formatting is not broken when code is moved.
2. The line
by AcademicYear StudentUID;
in your last step serves no purpose and slows down the process.
3. Can't this be done in one data step?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.