Hello SAS community,
I'm trying to get label and format values (e.g. 0=No, 1=Yes) from a dataset. Using dictionary tables I've been able to get the labels but the format values are proving more elusive.
Below is my test dataset and the sql code I've used to get the labels.
I could use some help with the format values.
proc format;
value agecat
0-< 10='0 to <10'
10-< 20='10 to <20'
20-< 30='20 to <30'
30-< 40='30 to <40'
40-high='40 and above';
value sexf
1='Male'
2='Female';
value racef
1='White'
2='Black'
3='Asian'
4='Native Hawaiian or Pacific Islander'
5='American Indian or Alaskan Native'
6='Asian Indian or Pacific Islander'
7=Prefer not to answer
8='Other'
9='>1 race';
value maritalf
1='Single'
2='Married'
3='Widowed'
4='Divorced';
value educf
1='High School or Less'
2='2 Yr College'
3='4 Yr College'
4='Graduate Degree';
run;
data have;
input id age gender race marital educ;
format age agecat. gender sexf. race racef. marital maritalf. educ educf.;
label
id='Subject id'
age='Age category'
gender='Reported gender'
race='Reported race'
marital='Marital status'
educ='Highest educational attainment';
cards;
1 9 1 1 1 1
2 25 2 2 2 2
3 35 1 3 2 4
4 27 1 1 1 1
5 18 2 1 3 1
6 41 1 2 4 3
;
run;
desired output
Is all this output possible from dictionary tables (adding format just gives agecat., etc.) or maybe proc contents - I couldn't find any resources that would get both the numeric value and it's meaning (e.g. 1=Male) in the same column
Variable |
Label |
Format Values |
id |
Subject id |
|
age |
Age category |
0-< 10=0 to <10 10-< 20=10 to <20 20-< 30=20 to <30 30-< 40=30 to <40 40-high=40 and above |
gender |
Reported gender |
1=Male 2=Female |
race |
Reported race |
1=White 2=Black 3=Asian 4=Native Hawaiian or Pacific Islander 5=American Indian or Alaskan Native 6=Asian Indian or Pacific Islander 7=Prefer not to answer 8=Other 9=>1 race |
marital |
Marital status |
1=Single 2=Married 3=Widowed 4=Divorced |
edu |
Highest educational attainment |
1=High School or Less 2=2 Yr College 3=4 Yr College 4=Graduate Degree |
Thanks in advance,
Margaret
No need to hard code the values! They are already in the dataset!
For these types of simple formats you could use something like this to populate your FORMAT_VALUE variable.
data want ;
set formats(keep=fmtname start end label);
length format_value $50 ;
if start=end then format_value = catx('=',start,quote(trim(label)));
else format_value=catx('=',catx('-',start,end),quote(trim(label)));
run;
Why not just copy the text from the program that makes the format?
Do you no longer have that code?
If not you can ask PROC FORMAT to convert the format definitions into a dataset.
proc format lib=work.formats out=formats;
run;
You can then use FMTNAME, START, END and LABEL in that dataset to generate the strings you want. Since you are using the HIGH keyword in at least one of those formats you also might want to include the HLO variable.
Obs FMTNAME START END HLO LABEL 1 AGECAT 0 10 0 to <10 2 AGECAT 10 20 10 to <20 3 AGECAT 20 30 20 to <30 4 AGECAT 30 40 30 to <40 5 AGECAT 40 HIGH H 40 and above 6 EDUCF 1 1 High School or Less 7 EDUCF 2 2 2 Yr College 8 EDUCF 3 3 4 Yr College 9 EDUCF 4 4 Graduate Degree 10 MARITALF 1 1 Single 11 MARITALF 2 2 Married 12 MARITALF 3 3 Widowed 13 MARITALF 4 4 Divorced 14 RACEF 1 1 White 15 RACEF 2 2 Black 16 RACEF 3 3 Asian 17 RACEF 4 4 Native Hawaiian or Pacific Islander 18 RACEF 5 5 American Indian or Alaskan Native 19 RACEF 6 6 Asian Indian or Pacific Islander 20 RACEF 7 7 Prefer not to answer 21 RACEF 8 8 Other 22 RACEF 9 9 >1 race 23 SEXF 1 1 Male 24 SEXF 2 2 Female
Thanks Tom - very useful information.
1. I can copy in the format values but with a lot of datasets this is very time consuming and I hoped SAS would have a better solution
2. As you suggested, I converted the format definitions into a dataset
and then created a variable for the format values
data format_value; length format_value $50;
set formats(keep=FMTNAME START END LABEL);
if FMTNAME='SEXF' and start=1 and end=1 and label='Male' then format_value ='1=Male'; else
if FMTNAME='SEXF' and start=2 and end=2 and label='Female' then format_value ='1=Female';
run;
getting for my format_value
1=Male
2=Female
This really works, not to ask too much but is there a more efficient way to do this?
Really appreciate your fast and spot-on response - this problem has stumped me for a long time!
Margaret
No need to hard code the values! They are already in the dataset!
For these types of simple formats you could use something like this to populate your FORMAT_VALUE variable.
data want ;
set formats(keep=fmtname start end label);
length format_value $50 ;
if start=end then format_value = catx('=',start,quote(trim(label)));
else format_value=catx('=',catx('-',start,end),quote(trim(label)));
run;
It looks like you're planning to build a data dictionary / codebook? I would try searching lexjansen.com for papers on that subject.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.