Here's a good write-up on PROC FORMAT and creating formats. It looks like you're creating an informat, since you're trying to convert a character variable to a numeric variable?
How consistent are these phrases? Are there a small number of them associated with this field and do they stay the same (assuming you will read more than one file with the same field)?
Here is an example where I have to deal with inconsistent text to do something similar to what you are doing:
proc format library=work;
   invalue AllRetention
      .,'',' ' = 1
      "Changing Schedule" = 4
      "Child no longer in family custody" = 3
      "Child no longer in family, custody (parental rights terminated)" = 3
      "Child no longer with caregiver" = 3
      "Child no longer with family/caregiver" = 3
      "Child reached 2nd birthday" = 2
      "Client incarcerated" = 3
      "Caregiver incarcerated" = 3
      "Client received what she needs from the program" = 2
      "Family participation goals met" = 2
      "Client returned to work or school" = 4
      "Completed program service cycle" = 2
      "Dissatisfied with program" = 3
      "Drop-out" = 3
      "Excessive missed appointments/attempted visits" = 3
      "Family no longer interested in program" = 3
      "Index child reached maximum age" = 2
      "Maternal death" = 3
      "Miscarried/fetal death/infant death","Miscarried/fetal death" = 4
      "Missed home visits (excessive)" = 3
      "Move" = 3
      "Moved out of service area" = 3
      "Other" = 4
      "Other or Unknown" = 4
      "Pressure from family" = 3
      "Program Completed" = 2
      "Program unable to provide service to client" = 4
      "Refused new nurse" = 3
      "Returned to work or school" = 4
      "Refused participation" = 3
      "Transition to another program" = 4
      "Unable to contact" = 3
      "Unable to contact or locate" = 3
      "Unable to locate" = 3
      Other = _error_
   ;
   value Retention
      ., 1 = 'Currently receiving services'
      2 = 'Completed program'
      3 = 'Stopped services before completion'
      4 = 'Other'
   ;
run;
Note that in the INVALUE part I have Other=_error_.
That means that when I use this informat and encounter a value I wasn't expecting, I get an error message. The log will tell me that there is invalid data, and I can add the text to the definition, assign an appropriate value, and then rerun the code.
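As a rough sketch of how the informat might be applied once the PROC FORMAT step above has run: the dataset names work.have and work.want, the variable names reason and retention_code, and the sample rows are all made up for illustration. The last row is deliberately a phrase that is not in the INVALUE list, so it should come back as missing with an invalid-data note in the log.

```sas
/* Hypothetical input data; names and rows are assumptions for illustration. */
data work.have;
   infile datalines truncover;
   input reason $80.;
   datalines;
Program Completed
Unable to locate
Left the country
;
run;

data work.want;
   set work.have;
   /* Convert the text to a numeric code with the custom informat. */
   retention_code = input(reason, AllRetention.);
   /* Display the numeric code with the matching VALUE format. */
   format retention_code Retention.;
run;
```

After checking the log, any flagged phrase can be added to the INVALUE statement with the code it should map to.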
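The values must match exactly what is used in the INVALUE definition. The UPCASE option may help if you get values like 'Female', 'female' and 'FEMALE' in your data: it converts each incoming value to upper case before comparing it to the definition, so the values in the definition should be written in upper case.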
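A minimal sketch of the UPCASE option, using a made-up informat named sexcode and made-up codes 1 and 2 (neither comes from your data):

```sas
/* Hypothetical informat; the name and the codes are assumptions. */
proc format library=work;
   invalue sexcode (upcase)
      'FEMALE' = 1
      'MALE'   = 2
      other    = _error_
   ;
run;

data _null_;
   /* Three spellings of the same value; UPCASE should make them all match. */
   do text = 'Female', 'female', 'FEMALE';
      code = input(text, sexcode.);
      put text= code=;
   end;
run;
```

Note that the quoted values in the INVALUE statement are written in upper case, since that is what the incoming text is compared against after conversion.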