Hello,
I have a character variable that reflect education level. I would like to convert this variable to numeric and then tell SAS I want to highest level of education per person. As of now, all levels of education are showing up per person and are character i.e
ID Education
0000 High School
Associates
Bachelors
0001 High School
Some College ...
What is the best way to go about this? Is there a way to do this via formatting?
Thanks!
First find the maximum per person. Then summarize.
proc summary data=have nway ;
class person;
var ednum;
output out=want max=highest_ednum;
run;
proc freq data=want;
tables highest_ednum;
run;
Best way is to create a custom informat. Make a comprehensive list of values in "Education", bring them into order, and assign numeric values in your proc informat code.
Alternatively, you can create a dataset that can be used as CNTLIN dataset in proc format. Use TYPE "I" for this.
Yes, use an informat and a format. Example:
proc format;
invalue edLevelIn (JUST UPCASE)
"HIGH SCHOOL" = 1
"ASSOCIATES" = 2
"BACHELORS" = 3
other = 0;
value edLevel
1 = "High School"
2 = "Associates"
3 = "Bachelors"
other = "Unknown";
run;
data test;
input ID Education & :edLevelIn.;
format Education edLevel. id z4.0;
datalines;
0000 High School
0000 Associates
0000 Bachelors
0001 High School
0001 Some College
;
proc print data=test noobs; run;
You have a couple of issues. First the character to numeric.
I suspect we did not see all of the character values so this is incomplete but code similar to:
data want; set have; select (education); when ('High School') EdNum=1; when ('Some College') EdNum=2; when ('Associates') EdNum=3; when ('Bachelors') EdNum=4; otherwise put "WARNING: Unexpected value of education: " education; end; run;
Select is an option to writing a whole bunch of If/Then/Else assignments. The comparison value must be exactly the same, case and all, to the values of your variable. The numeric value to assign is up to you.
You can sort the data based on ID and the numeric value to get things in increasing (sort of) education level.
The next part depends on what your actual data looks like and are you looking for a report or a data set or what. Reports are what people read, data sets are for further manipulation. Also, if you any other other variables involved what needs to be done with them will affect the choices at this point.
So if there are other variables involved you need to provide examples and what the final result should be.
If the only thing involved is the ID and education level you can use any of the tools that involve the MAX of the numeric variable:
Such as:
Proc means data=want max; class id; var ednum; run;
but an example of what you expect the output to look like is a good idea.
For now I am just trying to create a data set where I can pull the highest level of education for each individual. This was helpful and quick!
Thanks!
Now that I have ednum the way that I want it, I would like the n and percentage of each level but only using the highest level of education per person. Is there a way to do this in tabulate or freq? I have attached an example of what I would like to do, only this table includes all education levels. Hope this makes sense!
First find the maximum per person. Then summarize.
proc summary data=have nway ;
class person;
var ednum;
output out=want max=highest_ednum;
run;
proc freq data=want;
tables highest_ednum;
run;
This is exactly what I needed! Thank you!
@ballardw's data step in the beginning of this post helped me to convert and order my data and also is a solution to this post.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.