I keep staring at this but I just don't see it.
I have a data file that has grade point averages (GPAs). Most are on a scale of 0 to 4. Some are on a scale of 0 to 5.
I need to convert the ones on the 5 scale to an approximate 4-scale measure. Don't sweat the conversion choice in this; I know it isn't perfect but it will do.
I read in the data. Filename changed to protect the guilty. Then:
I get a table that has the old gpa and not the new_gpa.
GPA's Overall (16:47 Thursday, November 6, 2014)
The FREQ Procedure

new_gpa | Frequency | Percent | Cumulative Frequency | Cumulative Percent
0-1.00 | 79 | 0.48 | 79 | 0.48
1.01-2.00 | 343 | 2.06 | 422 | 2.54
2.01-2.50 | 1283 | 7.72 | 1705 | 10.26
2.51-3.00 | 3720 | 22.39 | 5425 | 32.65
3.01-3.50 | 5161 | 31.06 | 10586 | 63.71
3.51-4.00 | 5889 | 35.44 | 16475 | 99.15
4.01-4.50 | 111 | 0.67 | 16586 | 99.82
4.51-5.00 | 26 | 0.16 | 16612 | 99.98
> 5.00 | 4 | 0.02 | 16616 | 100.00

Frequency Missing = 1555
Frustrating and I don't see what I am doing wrong.
Any help appreciated.
You must be doing something in some part of your program which "re-sets" the values. If you're using SAS EG, I'd suggest you run your program step by step and check what you have in table "survey_results_recode_gpa".
The code snippets you've posted should do what you expect them to do.
I just made a new program where the above code snippets are the only actions in the program. Ran it fresh, on a reboot. Same result.
I would do a proc freq on gpa_scale and see what you really have. You have 1555 missing values for new_gpa.
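A minimal sketch of that check, assuming the data set and variable names used elsewhere in this thread (survey_results_recode_gpa, gpa_scale). The MISSING option makes missing values appear as their own row rather than only in the "Frequency Missing" footnote:

```sas
/* One-way frequency of the scale variable, counting missing as a level */
proc freq data = survey_results_recode_gpa;
  tables gpa_scale / missing;
  title "GPA scale as recorded";
run;
title;
```

Anything other than 4 or 5 in that table would be a record the recode conditional never touches.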
I have missing values in gpa, so I have missing values in new_gpa. Some people just didn't answer the question. But I will look at this some more and see if it yields any clues.
Also, I find looking at before and after together is sometimes helpful with recode debugging.
proc freq data = survey_results_recode_gpa;
  tables gpa * new_gpa / missing norow nocol nopercent;
  title "GPA's Overall";
run;
title;
I ran that but I don't know what I am looking at. Here is a partial of the result.
Given that you have some values that were assigned missing values, I'd add a 3rd variable to that proc freq:
proc freq data = survey_results_recode_gpa;
  tables gpa_scale * gpa * new_gpa / missing norow nocol nopercent;
  title "GPA's Overall";
run;
title;
It says that you have a large number of GPA values that are NOT getting recoded to a different NEW_GPA value, and that they come from all ranges of GPA. This is a very strong indicator that you have a conditional that isn't working correctly, although when the reassignment is done it appears to be applied correctly. Since most of the 1.01-2.00 and 2.01-2.50 records are being recoded from two values to one, the gpa_scale=5 branch seems to be working.
But for the ranges that include or exceed 4, none are reassigned, which means a couple of things may be likely:
1) gpa_scale isn't assigned for them, or is a value other than 4 or 5 for some records. Arthur's code will show if this is the case.
2) since you are comparing strings, the values in gpa may not exactly match what you entered between the quotes in your IF gpa='' conditions.
FYI, you have the If / then repeated for 2.01 and 2.51 ranges.
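One way to check point 2 is to print the exact strings stored in gpa. This is only a sketch, and it assumes gpa is a character variable in survey_results_recode_gpa. SAS pads the shorter operand with blanks when comparing strings, so trailing blanks are harmless, but leading blanks, different case, or different internal spacing (for example '>5.00' versus '> 5.00') will make a comparison fail:

```sas
/* Show each distinct gpa value inside quotes so leading blanks
   and odd spacing become visible */
proc freq data = survey_results_recode_gpa;
  tables gpa / missing;
  format gpa $quote20.;
  title "Exact gpa strings";
run;
title;
```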
If you are likely to do much of this type of recoding I suggest also learning the SELECT construct
/* Only records reported on the 5-point scale are recoded here */
if gpa_scale = 5 then do;
  select (gpa);
    when ('0-1.00')                 new_gpa = '0-1.00';
    when ('1.01-2.00', '2.01-2.50') new_gpa = '1.01-2.00';
    when ('2.51-3.00')              new_gpa = '2.01-2.50';
    when ('3.01-3.50')              new_gpa = '2.51-3.00';
    when ('3.51-4.00', '4.01-4.50') new_gpa = '3.01-3.50';
    when ('4.51-5.00')              new_gpa = '3.51-4.00';
    when ('> 5.00')                 new_gpa = '3.51-4.00';
    otherwise                       new_gpa = 'Invalid';  /* catches anything unmatched, including a blank gpa */
  end;
end;
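To try the SELECT construct in isolation, here is a small self-contained sketch. The data set name demo and the handful of test values are made up for illustration, and only some of the WHEN branches are repeated. The LENGTH statement matters if new_gpa is not already defined: without it, new_gpa would take its length (6) from the first literal assigned, '0-1.00', and longer values would be truncated.

```sas
data demo;                       /* hypothetical test data set */
  /* Declare lengths before any assignment to avoid truncation */
  length gpa new_gpa $9;
  gpa_scale = 5;
  do gpa = '0-1.00', '2.01-2.50', '4.51-5.00', '> 5.00', 'junk';
    select (gpa);
      when ('0-1.00')                 new_gpa = '0-1.00';
      when ('1.01-2.00', '2.01-2.50') new_gpa = '1.01-2.00';
      when ('4.51-5.00', '> 5.00')    new_gpa = '3.51-4.00';
      otherwise                       new_gpa = 'Invalid';
    end;
    output;                      /* one row per test value */
  end;
run;

proc print data = demo;
run;
```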
Alright.
I have 18171 records.
16312 with a gpa and a new_gpa and a 4 on gpa_scale.
304 with gpa and new_gpa and on a 5 scale.
That is a 16616 subtotal.
41 with no gpa or new_gpa on a 4 scale.
1 with no gpa or new_gpa on a 5 scale.
That is a 42 subtotal.
1513 with no gpa, no new_gpa, and no scale.
16616 + 42 + 1513 = 18171.
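A tally like the one above can be produced in one step. This is a sketch that assumes gpa and new_gpa are character variables; the format name $present is made up for illustration. PROC FREQ groups by formatted values, so every non-blank value collapses into "Present":

```sas
/* Collapse character values into Missing / Present */
proc format;
  value $present ' '   = 'Missing'
                 other = 'Present';
run;

/* One row per combination of scale and presence of gpa / new_gpa */
proc freq data = survey_results_recode_gpa;
  tables gpa_scale * gpa * new_gpa / list missing;
  format gpa new_gpa $present.;
  title "Records by scale and presence of gpa / new_gpa";
run;
title;
```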
No new_gpa's exceed 4.
Changed my if do-end to address the misreported GPAs.
Ran this:
proc freq data = survey_results_recode_gpa;
tables new_gpa;
where gpa_scale in (4, 5);
title "GPA's Overall";
run;
and voila! Think this may move me where I want to go.
Can't someone have a 5.0 on a 4.0 scale?
No. You can have a 5 on a 5 scale, but the maximum on a 4 scale is a 4.
Edit: You are onto something. Looking at the data I have misreported 4.5 GPA's on a 4 scale. Going to change the conditional to pick those up.
Edit2: Also have 276 records with GPA's and no scale indicated. Damn dirty data.
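One possible way to pick up the misreported records, shown only as a sketch and not necessarily the change that was actually made: derive a working scale variable that treats any GPA band above 4.00 as 5-scale, whatever gpa_scale says. The names survey_results_recode_gpa2 and scale_used are hypothetical.

```sas
data survey_results_recode_gpa2;      /* hypothetical output name */
  set survey_results_recode_gpa;
  /* Assumption: a band above 4.00 can only come from a 5-point scale */
  if gpa in ('4.01-4.50', '4.51-5.00', '> 5.00') then scale_used = 5;
  else scale_used = gpa_scale;        /* stays missing when no scale was reported */
run;
```

The recode conditional would then test scale_used instead of gpa_scale. Records with a GPA at or below 4.00 and no scale remain ambiguous under this rule.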
What is the original variable? If it is a numeric variable with a format defined on ranges of values, then another numeric format should be applied to it instead.
Recoding on displayed values will not work as those are not the internal values.
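A quick way to settle that question, sketched under the assumption that the data set is survey_results_recode_gpa: PROC CONTENTS reports whether gpa is character or numeric and which format is attached, and VVALUE returns the formatted text as displayed. The names check and gpa_text are made up for illustration.

```sas
/* Type (Char/Num) and attached format of every variable */
proc contents data = survey_results_recode_gpa;
run;

/* Copy the displayed (formatted) value of gpa into a character variable */
data check;                           /* hypothetical output name */
  set survey_results_recode_gpa;
  length gpa_text $20;
  gpa_text = vvalue(gpa);
run;
```

If gpa turns out to be character with no format, comparing it directly against string literals is fine.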
Okay.
I am marking dbailey and ballardw as helpful.
You guys put me onto the answer.
Thanks to all responders.