Solved: Correct Entries

twildone · Posted 07-25-2020 01:49 PM

Hi...I am trying to change certain Department entries. Each record represents a course and courses may be taken from different departments. In order to identify the program that the student is enrolled in, I need to group the courses for each student to the appropiate Department which will identify the program of study. Because some courses are not counted or used as credit toward the students program, I have to identify which Department entries that are to be changed. From the code below that I have so far, the changes are being made. Any suggestion or revision would be greatly appreciated.Thanks.

data Have;
    length 'Academic Year'n $ 9 StudentUID 8 Department $ 4;
    format 'Academic Year'n $char9. StudentUID best12. Department $char4.;
    informat 'Academic Year'n $char9. StudentUID best12. Department $char4.;
    infile datalines4 dlm='7f'x missover dsd;
    input 'Academic Year'n : $char9. StudentUID : best32. Department : $char4.;
datalines4;
2017-2018 83136 CAR
2017-2018 83136 CAR
2017-2018 83136 CT
2017-2018 83136 YB
2017-2018 83136 YB
2017-2018 83136 YBC
2017-2018 83136 CAR
2017-2018 88069 NCT
2017-2018 88069 NCT
2017-2018 88069 ES
2017-2018 88069 CDCA
2018-2019 83882 MOA
2018-2019 83882 MOA
2018-2019 83882 RH
2018-2019 83882 RH
2018-2019 83882 RH
2018-2019 83882 MOA
2018-2019 83882 MOA
2018-2019 83882 MOA
2018-2019 83882 ES
2018-2019 83882 ES
2019-2020 86518 RH
2019-2020 86518 RH
2019-2020 86518 MAA
2019-2020 86518 MAA
2019-2020 86518 MAA
2019-2020 86518 ES
2019-2020 86518 ES
2019-2020 86519 IB
2019-2020 86519 IB
2019-2020 86519 IB
2019-2020 86519 LM
2019-2020 86519 LM
2019-2020 86519 IB
;;;;

proc sort data=Have;
	by 'Academic Year'n StudentUID;
run;

data Want;
	set Have;
	by 'Academic Year'n StudentUID;
   	length Department1 $ 4.;
	retain Dpt;
		if findw(Department,'YBC')>0 then do;
			Dpt = 'YBC';
			if Department in ('YBC','YB','CT','CAR') then 
					Department1 = DPT;
			else 
				Department1 = Department;
		drop Dpt;
		end;
		if findw(Department,'CDCA')>0 then do;
			Dpt = 'CDCA';
			if Department in ('CDCA','NCT') then 
					Department1 = DPT;
			else 
				Department1 = Department;
		drop Dpt;
		end;
		if findw(Department,'MOA')>0 then do;
			Dpt = 'MOA';
			if Department in ('MOA','RH') then 
					Department1 = DPT;
			else 
				Department1 = Department;
		drop Dpt;
		end;
		if findw(Department,'MAA')>0 then do;
			Dpt = 'MAA';
			if Department in ('MAA','RH') then 
					Department1 = DPT;
			else 
				Department1 = Department;
		drop Dpt;
		end;
		if findw(Department,'IB')>0 then do;
			Dpt = 'IB';
			if Department in ('IB','LM') then 
					Department1 = DPT;
			else 
				Department1 = Department;
		drop Dpt;
		end;
run;

Want:
Academic Year	StudentUID	Department	Department1
2017-2018	83136	CAR	YBC
2017-2018	83136	CAR	YBC
2017-2018	83136	CT	YBC
2017-2018	83136	YB	YBC
2017-2018	83136	YB	YBC
2017-2018	83136	YBC	YBC
2017-2018	83136	CAR	YBC
2017-2018	88069	NCT	CDCA
2017-2018	88069	NCT	CDCA
2017-2018	88069	ES	ES
2017-2018	88069	CDCA	CDCA
2018-2019	83882	MOA	MOA
2018-2019	83882	MOA	MOA
2018-2019	83882	RH	MOA
2018-2019	83882	RH	MOA
2018-2019	83882	RH	MOA
2018-2019	83882	MOA	MOA
2018-2019	83882	MOA	MOA
2018-2019	83882	MOA	MOA
2018-2019	83882	ES	ES
2018-2019	83882	ES	ES
2019-2020	86518	RH	MAA
2019-2020	86518	RH	MAA
2019-2020	86518	MAA	MAA
2019-2020	86518	MAA	MAA
2019-2020	86518	MAA	MAA
2019-2020	86518	ES	ES
2019-2020	86518	ES	ES
2019-2020	86519	IB	IB
2019-2020	86519	IB	IB
2019-2020	86519	IB	IB
2019-2020	86519	LM	IB
2019-2020	86519	LM	IB
2019-2020	86519	IB	IB

ChrisNZ · Posted 07-26-2020 07:11 PM

1. The weird formatting of your code is because you use tabs.

I am a firm proponent of using spaces, so that formatting is not broken when code is moved.

2. The line

	by AcademicYear StudentUID;

in your last step serves no purpose and slows down the process.

3. Can't this be done in one data step?

High-Performance SAS Coding - Third Edition

View solution in original post

qatman28 · Posted 07-25-2020 09:19 PM

It looks like you are just doing a straight recode of department for certain values. If so then this simple code should work.

data want;
   set have;
   length department1 $4;

   select (department);
      when ('YBC','YB','CT','CAR') department1 = 'YBC';
      when ('CDCA','NCT') department1 = 'CDCA';
      when ('MAA','RH') department1 = 'MAA';
      when ('IB','LM') department1 = 'IB';
      otherwise department1 = department;
   end;

run;

Patrick · Posted 07-25-2020 10:43 PM

Is this just about re-coding? If so then a format could be quite useful.

proc format;
  value $dpt
    'YBC','YB','CT','CAR' = 'YBC'
    'CDCA','NCT'          = 'CDCA'
    'MAA','RH'            = 'MAA'
    'IB','LM'             = 'IB'
  ;
run;

data want;
  set have;
  Department1=put(Department, $dpt.);
run;

twildone · Posted 07-26-2020 11:45 AM

Hi...This is not straight forward of just re-coding. Some courses recieve credit toward other programs. For example, the course with a Department of RH receives credit toward MAA, MOA as well as RH programs. So as to know which program that the student is receiving credit for the course, I need to first find out which program that the student is enrolled in for that academic year which I am attempting to do with the statement with the findw. Then I know which department codes need to recoded.

qatman28 · Posted 07-26-2020 12:00 PM

The posted code produces a variable "department1" per your wanted output. If you have a separate question then please ask, otherwise it appears that the code from Patrick or myself works.

twildone · Posted 07-26-2020 12:35 PM

The posted Department1 for StudentUID=86518 is MAA. When I ran your code, I am getting both MAA and MOA for this same StudentUID.

Tom · Posted 07-26-2020 02:45 PM

@twildone wrote:
The posted Department1 for StudentUID=86518 is MAA. When I ran your code, I am getting both MAA and MOA for this same StudentUID.

What is the meaning of the new variable?

Your words make it sound like you want the student's department to drive the remapping. But your input does not have a student level department variable. The existing data just have the one department variable the observation level. Most of your students have more than one department listed. How do you want to pick the department for the remapping?

      Student   Academic
Obs     UID       Year      dept1   dept2   dept3   dept4   dept5   dept6   dept7   dept8   dept9   dept10

 1     83136    2017-2018    CAR     CAR     CT     YB       YB      YBC     CAR
 2     83882    2018-2019    MOA     MOA     RH     RH       RH      MOA     MOA     MOA     ES       ES
 3     86518    2019-2020    RH      RH      MAA    MAA      MAA     ES      ES
 4     86519    2019-2020    IB      IB      IB     LM       LM      IB
 5     88069    2017-2018    NCT     NCT     ES     CDCA

ChrisNZ · Posted 07-26-2020 01:30 AM

You only one drop statement anywhere in the data step.

High-Performance SAS Coding - Third Edition

Tom · Posted 07-26-2020 02:59 PM

Your posted data step is going to create problems. Does your real data have these problems?

1206  data posted;
1207      length 'AcademicYear'n $ 9 StudentUID 8 Department $ 4;
1208      format 'AcademicYear'n $char9. StudentUID best12. Department $char4.;
1209      informat 'AcademicYear'n $char9. StudentUID best12. Department $char4.;
1210      infile datalines4 dlm='7f'x missover dsd;
1211      input 'AcademicYear'n : $char9. StudentUID : best32. Department : $char4.;
1212  datalines4;

NOTE: Invalid data for StudentUID in line 1234 11-19.
RULE:       ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+

1234CHAR    2019-2020. 86518 RH
    ZONE    33332333372333332542222222222222222222222222222222222222222222222222222222222222
    NUMR    2019D2020F0865180280000000000000000000000000000000000000000000000000000000000000
'AcademicYear'n=2019-2020 StudentUID=. Department=  _ERROR_=1 _N_=22
NOTE: Invalid data for StudentUID in line 1237 11-20.

1237CHAR    2019-2020. 86518 MAA
    ZONE    33332333372333332444222222222222222222222222222222222222222222222222222222222222
    NUMR    2019D2020F0865180D11000000000000000000000000000000000000000000000000000000000000
'AcademicYear'n=2019-2020 StudentUID=. Department=  _ERROR_=1 _N_=25
NOTE: The data set WORK.POSTED has 34 observations and 3 variables.

Plus you will create 19 observations where DEPARTMENT starts with a space because you are using $CHAR informat instead of just reading the string normally.

Try using a more normal data step since non of your variables require any special informats or formats and you don't have any missing values to worry about.

data have;
    length AcademicYear $ 9 StudentUID 8 Department $ 4;
    input AcademicYear StudentUID Department;
datalines4;
2017-2018  83136  CAR
2017-2018  83136  CAR
2017-2018  83136  CT
2017-2018  83136  YB
2017-2018  83136  YB
2017-2018  83136  YBC
2017-2018  83136  CAR
2017-2018  88069  NCT
2017-2018  88069  NCT
2017-2018  88069  ES
2017-2018  88069  CDCA
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  RH
2018-2019  83882  RH
2018-2019  83882  RH
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  ES
2018-2019  83882  ES
2019-2020  86518 RH
2019-2020  86518  RH
2019-2020  86518  MAA
2019-2020  86518 MAA
2019-2020  86518  MAA
2019-2020  86518  ES
2019-2020  86518  ES
2019-2020  86519  IB
2019-2020  86519  IB
2019-2020  86519  IB
2019-2020  86519  LM
2019-2020  86519  LM
2019-2020  86519  IB
;;;;

twildone · Posted 07-26-2020 03:55 PM

Hi Tom....thanks for your suggestion. I made the changes and have been working on this by trying different things. The code below seems to be getting closer to what I need. I used the max_Dpt approach so I can test to make sure only the student records that need to be changed are changed. By changing the YBC to YB, for example, for StudentUID 83136, then none of the Department codes should change and that is what is happening. The issue that I am still having problems with is dealing with courses that receive credit in more than one program (ie. RH can receive credit in MOA, MAA as well as RH). I thought by using the "by AcademicYear StudentUID" statement that it would take care of that problem as each student would be evaluated separately.

data have;
    length AcademicYear $ 9 StudentUID 8 Department $ 5;
    input AcademicYear StudentUID Department;
datalines4;
2017-2018  83136  CAR
2017-2018  83136  CAR
2017-2018  83136  CT
2017-2018  83136  YB
2017-2018  83136  YB
2017-2018  83136  YBC
2017-2018  83136  CAR
2017-2018  88069  NCT
2017-2018  88069  NCT
2017-2018  88069  ES
2017-2018  88069  CDCA
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  RH
2018-2019  83882  RH
2018-2019  83882  RH
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  MOA
2018-2019  83882  ES
2018-2019  83882  ES
2019-2020  86518 RH
2019-2020  86518  RH
2019-2020  86518  MAA
2019-2020  86518 MAA
2019-2020  86518  MAA
2019-2020  86518  ES
2019-2020  86518  ES
2019-2020  86519  IB
2019-2020  86519  IB
2019-2020  86519  IB
2019-2020  86519  LM
2019-2020  86519  LM
2019-2020  86519  IB
;;;;

proc sort data=Have;
	by AcademicYear StudentUID;
run;

data Have;
	set Have;
		if Department in ('YBC','CDCA','MOA','MAA','IB') then 
			Dpt = 1;
run;

data Have(drop=Dpt);  
   max_Dpt=.;
   do until (last.StudentUID);
      set Have;
      by AcademicYear StudentUID;
      if Dpt gt max_Dpt then max_Dpt=Dpt;
   end;
   do until (last.StudentUID);
      set Have;
      by AcademicYear StudentUID;
      output;
   end;
   rename max_Dpt=Dpt1;
run;

data Want(drop=Dpt1);
	set Have;
	by AcademicYear StudentUID;
   	length Department1 $ 5.;
			if Dpt1=1 and Department in ('YBC','YB','CT','CAR') then 
					Department1 = 'YBC';
			else 
			if Dpt1=1 and Department in ('CDCA','NCT') then 
					Department1 = 'CDCA';
			else 
			if Dpt1=1 and Department in ('MOA','RH') then 
					Department1 = 'MOA';
			else 
			if Dpt1=1 and Department in ('MAA','RH') then 
					Department1 = 'MAA';
			else 
			if Dpt1=1 and Department in ('IB','LM') then 
					Department1 = 'IB';
			else 
				Department1 = Department;
run;

twildone · Posted 07-26-2020 04:42 PM

Hi Tom.....I think I might have got this to work. Thanks

data Have;
	set Have;
		if Department in ('YBC') then
			Dpt = 1;
		if Department in ('CDCA') then
			Dpt = 2;
		if Department in ('MOA') then
			Dpt = 3;
		if Department in ('MAA') then
			Dpt = 4;
		if Department in ('IB') then
			Dpt = 5;
run;

data Have(drop=Dpt);  
   max_Dpt=.;
   do until (last.StudentUID);
      set Have;
      by AcademicYear StudentUID;
      if Dpt gt max_Dpt then max_Dpt=Dpt;
   end;
   do until (last.StudentUID);
      set Have;
      by AcademicYear StudentUID;
      output;
   end;
   rename max_Dpt=Dpt1;
run; 

data Want(drop=Dpt1);
	set Have;
	by AcademicYear StudentUID;
   	length Department1 $ 5.;
			if Dpt1=1 and Department in ('YBC','YB','CT','CAR') then 
					Department1 = 'YBC';
			else 
			if Dpt1=2 and Department in ('CDCA','NCT') then 
					Department1 = 'CDCA';
			else 
			if Dpt1=3 and Department in ('MOA','RH') then 
					Department1 = 'MOA';
			else 
			if Dpt1=4 and Department in ('MAA','RH') then 
					Department1 = 'MAA';
			else 
			if Dpt1=5 and Department in ('IB','LM') then 
					Department1 = 'IB';
			else 
				Department1 = Department;
run;

ChrisNZ · Posted 07-26-2020 07:11 PM

1. The weird formatting of your code is because you use tabs.

I am a firm proponent of using spaces, so that formatting is not broken when code is moved.

2. The line

	by AcademicYear StudentUID;

in your last step serves no purpose and slows down the process.

3. Can't this be done in one data step?

High-Performance SAS Coding - Third Edition

Correct Entries

Re: Correct Entries

Re: Correct Entries

Re: Correct Entries

Re: Correct Entries

Re: Correct Entries

Re: Correct Entries

Re: Correct Entries

Re: Correct Entries

Re: Correct Entries

Re: Correct Entries

Re: Correct Entries

Re: Correct Entries

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away