Hello,
So I used formats to create labels for my output (proc freqs). However, it appears only the formats created using characters apply to the proc freq outputs. Formats where I used numbers fail to show up in the output and simply appear as the original value.
Here is the code I used:
data MHS.identification2;
set MHS.identification;
if age < 20 then agegroup = 1;
if age >= 20 and age < 25 then agegroup = 2;
if age >= 25 and age < 30 then agegroup = 3;
if age >= 30 and age < 35 then agegroup = 4;
if age >=35 and age < 40 then agegroup = 5;
if age >=40 then agegroup = 6;
run;
data MHS.identification2;
set MHS.identification;
if income < 10000 then incomegroup =1;
if income >=10000 and income <14999 then incomegroup =2;
if income >=15000 and income <19999 then incomegroup =3;
if income >=20000 and income <24999 then incomegroup =4;
if income >=25000 and income <34999 then incomegroup =5;
if income >=35000 and income <49999 then incomegroup =6;
if income >=50000 and income <74999 then incomegroup =7;
else if income >= 75000 then incomegroup =8;
run;
*formats;
proc format;
value frace
1 = 'Black'
2 = 'White'
3 = 'Asian'
4 = 'NativeAM'
5 = 'HawaiianPI'
6 = 'Multi'
7 = 'Other'
;
value **bleep**egroup
1 = '<20'
2 = '20-24'
3 = '25-29'
4 = '30-34'
5 = '35-39'
6 = '40+'
;
value fethnicity
1 = 'Non-Hisp'
2 = 'Hisp'
;
value fmarital
1 = 'Single'
2 = 'Separated'
3 = 'Divorced'
4 = 'Widowed'
5 = 'Married'
;
value fincomegroup
1 = 'Less than 10000'
2 = 'Between 10000-14999'
3 = 'Between 15000-19999'
4 = 'Between 20000-24999'
5 = 'Between 25000-34999'
6 = 'Between 35000-49999'
7 = 'Between 50000-74999'
8 = '75000 and up'
;
value feducat
0 = 'None'
1 = 'Less than HS'
2 = 'Less than HS'
3 = 'Less than HS'
4 = 'Less than HS'
5 = 'Less than HS'
6 = 'Less than HS'
7 = 'Less than HS'
8 = 'Less than HS'
9 = 'High School'
10 = 'High School'
11 = 'High School'
12 = 'High School'
13 = 'More than HS'
14 = 'More than HS'
15 = 'More than HS'
16 = 'More than HS'
17 = 'More than HS'
18 = 'More than HS'
22 = 'More than HS'
;
run;
*apply formats;
data mhs.identification2;
set mhs.identification2;
format Race frace. Agegroup **bleep**egroup. Ethnicity fEthnicity. Marital fMarital. Income fIncomegroup. Educat fEducat.;
run;
*desciptive analyses;
proc freq data=mph.identificationformats;
tables county2;
run;
proc freq data=mhs.identificationformats;
tables race;
run;
proc freq data=mhs.identification2;
tables agegroup;
run;
proc freq data=mph.identificationformats;
tables ethnicity;
run;
proc freq data=mhs.identificationformats;
tables marital;
run;
proc freq data=mhs.identification2;
tables incomegroup;
run;
proc freq data=mph.identificationformats;
tables educat;
run;
For variables like race, the format applied correctly and appear as 'Black', "White", etc in the results viewer. However, for my agegroup and incomegroup variables, the output simply appears as "1", "2", "3", and so on. Is this because I created those variables as an extension of other variables?
Show the log, specifically for the format code.
1. I suspect your format name for the age groups is not valid which is why that one doesn't work.
value **bleep**egroup 1 = '<20' 2 = '20-24' 3 = '25-29' 4 = '30-34' 5 = '35-39' 6 = '40+' ;
2. You erase your coding for age group when you create identification 2 data set again. Your'e starting from identification which does not have the age groups and it destroys your previous age group.
data MHS.identification2; set MHS.identification; *<- this is your starting data set; if age < 20 then agegroup = 1; if age >= 20 and age < 25 then agegroup = 2; if age >= 25 and age < 30 then agegroup = 3; if age >= 30 and age < 35 then agegroup = 4; if age >=35 and age < 40 then agegroup = 5; if age >=40 then agegroup = 6; run; data MHS.identification2; *<-replaces identification2 dataset above, any work there is lost; set MHS.identification; if income < 10000 then incomegroup =1; if income >=10000 and income <14999 then incomegroup =2; if income >=15000 and income <19999 then incomegroup =3; if income >=20000 and income <24999 then incomegroup =4; if income >=25000 and income <34999 then incomegroup =5; if income >=35000 and income <49999 then incomegroup =6; if income >=50000 and income <74999 then incomegroup =7; else if income >= 75000 then incomegroup =8; run;
3. Avoid coding like this. It's a good way to make it very difficult to debug your code and find weird errors.
data mhs.identification2;
set mhs.identification2;
I suspect the log will answer why the income group doesn't work.
4. Income doesn't work because you're applying it to the income variable, not the incomegroup variable you created. Use the new variable and you should be fine.
@SM8 wrote:
Hello,
So I used formats to create labels for my output (proc freqs). However, it appears only the formats created using characters apply to the proc freq outputs. Formats where I used numbers fail to show up in the output and simply appear as the original value.
Here is the code I used:
data MHS.identification2;
set MHS.identification;
if age < 20 then agegroup = 1;
if age >= 20 and age < 25 then agegroup = 2;
if age >= 25 and age < 30 then agegroup = 3;
if age >= 30 and age < 35 then agegroup = 4;
if age >=35 and age < 40 then agegroup = 5;
if age >=40 then agegroup = 6;
run;
data MHS.identification2;
set MHS.identification;
if income < 10000 then incomegroup =1;
if income >=10000 and income <14999 then incomegroup =2;
if income >=15000 and income <19999 then incomegroup =3;
if income >=20000 and income <24999 then incomegroup =4;
if income >=25000 and income <34999 then incomegroup =5;
if income >=35000 and income <49999 then incomegroup =6;
if income >=50000 and income <74999 then incomegroup =7;
else if income >= 75000 then incomegroup =8;
run;
*formats;
proc format;
value frace
1 = 'Black'
2 = 'White'
3 = 'Asian'
4 = 'NativeAM'
5 = 'HawaiianPI'
6 = 'Multi'
7 = 'Other'
;
value **bleep**egroup
1 = '<20'
2 = '20-24'
3 = '25-29'
4 = '30-34'
5 = '35-39'
6 = '40+'
;
value fethnicity
1 = 'Non-Hisp'
2 = 'Hisp'
;
value fmarital
1 = 'Single'
2 = 'Separated'
3 = 'Divorced'
4 = 'Widowed'
5 = 'Married'
;
value fincomegroup
1 = 'Less than 10000'
2 = 'Between 10000-14999'
3 = 'Between 15000-19999'
4 = 'Between 20000-24999'
5 = 'Between 25000-34999'
6 = 'Between 35000-49999'
7 = 'Between 50000-74999'
8 = '75000 and up'
;
value feducat
0 = 'None'
1 = 'Less than HS'
2 = 'Less than HS'
3 = 'Less than HS'
4 = 'Less than HS'
5 = 'Less than HS'
6 = 'Less than HS'
7 = 'Less than HS'
8 = 'Less than HS'
9 = 'High School'
10 = 'High School'
11 = 'High School'
12 = 'High School'
13 = 'More than HS'
14 = 'More than HS'
15 = 'More than HS'
16 = 'More than HS'
17 = 'More than HS'
18 = 'More than HS'
22 = 'More than HS'
;
run;
*apply formats;
data mhs.identification2;
set mhs.identification2;
format Race frace. Agegroup **bleep**egroup. Ethnicity fEthnicity. Marital fMarital. Income fIncomegroup. Educat fEducat.;
run;
*desciptive analyses;
data tmp1.identificationformats;
set tmp1.identification;
if county='LEXINGTON' then county2='Lexington';
if county='Lexington' then county2='Lexington';
if county='lexington' then county2='Lexington';
if county='RICHLAND' then county2='Richland';
if county='richland' then county2='Richland';
if county='Richland' then county2='Richland';
if county='SUMTER' then county2='Sumter';
if county='Sumter' then county2='Sumter';
if county='sumter' then county2='Sumter';
run;
proc freq data=mph.identificationformats;
tables county2;
run;
proc freq data=mhs.identificationformats;
tables race;
run;
proc freq data=mhs.identification2;
tables agegroup;
run;
proc freq data=mph.identificationformats;
tables ethnicity;
run;
proc freq data=mhs.identificationformats;
tables marital;
run;
proc freq data=mhs.identification2;
tables incomegroup;
run;
proc freq data=mph.identificationformats;
tables educat;
run;
For variables like race, the format applied correctly and appear as 'Black', "White", etc in the results viewer. However, for my agegroup and incomegroup variables, the output simply appears as "1", "2", "3", and so on. Is this because I created those variables as an extension of other variables?
Show the log, specifically for the format code.
1. I suspect your format name for the age groups is not valid which is why that one doesn't work.
value **bleep**egroup 1 = '<20' 2 = '20-24' 3 = '25-29' 4 = '30-34' 5 = '35-39' 6 = '40+' ;
2. You erase your coding for age group when you create identification 2 data set again. Your'e starting from identification which does not have the age groups and it destroys your previous age group.
data MHS.identification2; set MHS.identification; *<- this is your starting data set; if age < 20 then agegroup = 1; if age >= 20 and age < 25 then agegroup = 2; if age >= 25 and age < 30 then agegroup = 3; if age >= 30 and age < 35 then agegroup = 4; if age >=35 and age < 40 then agegroup = 5; if age >=40 then agegroup = 6; run; data MHS.identification2; *<-replaces identification2 dataset above, any work there is lost; set MHS.identification; if income < 10000 then incomegroup =1; if income >=10000 and income <14999 then incomegroup =2; if income >=15000 and income <19999 then incomegroup =3; if income >=20000 and income <24999 then incomegroup =4; if income >=25000 and income <34999 then incomegroup =5; if income >=35000 and income <49999 then incomegroup =6; if income >=50000 and income <74999 then incomegroup =7; else if income >= 75000 then incomegroup =8; run;
3. Avoid coding like this. It's a good way to make it very difficult to debug your code and find weird errors.
data mhs.identification2;
set mhs.identification2;
I suspect the log will answer why the income group doesn't work.
4. Income doesn't work because you're applying it to the income variable, not the incomegroup variable you created. Use the new variable and you should be fine.
@SM8 wrote:
Hello,
So I used formats to create labels for my output (proc freqs). However, it appears only the formats created using characters apply to the proc freq outputs. Formats where I used numbers fail to show up in the output and simply appear as the original value.
Here is the code I used:
data MHS.identification2;
set MHS.identification;
if age < 20 then agegroup = 1;
if age >= 20 and age < 25 then agegroup = 2;
if age >= 25 and age < 30 then agegroup = 3;
if age >= 30 and age < 35 then agegroup = 4;
if age >=35 and age < 40 then agegroup = 5;
if age >=40 then agegroup = 6;
run;
data MHS.identification2;
set MHS.identification;
if income < 10000 then incomegroup =1;
if income >=10000 and income <14999 then incomegroup =2;
if income >=15000 and income <19999 then incomegroup =3;
if income >=20000 and income <24999 then incomegroup =4;
if income >=25000 and income <34999 then incomegroup =5;
if income >=35000 and income <49999 then incomegroup =6;
if income >=50000 and income <74999 then incomegroup =7;
else if income >= 75000 then incomegroup =8;
run;
*formats;
proc format;
value frace
1 = 'Black'
2 = 'White'
3 = 'Asian'
4 = 'NativeAM'
5 = 'HawaiianPI'
6 = 'Multi'
7 = 'Other'
;
value **bleep**egroup
1 = '<20'
2 = '20-24'
3 = '25-29'
4 = '30-34'
5 = '35-39'
6 = '40+'
;
value fethnicity
1 = 'Non-Hisp'
2 = 'Hisp'
;
value fmarital
1 = 'Single'
2 = 'Separated'
3 = 'Divorced'
4 = 'Widowed'
5 = 'Married'
;
value fincomegroup
1 = 'Less than 10000'
2 = 'Between 10000-14999'
3 = 'Between 15000-19999'
4 = 'Between 20000-24999'
5 = 'Between 25000-34999'
6 = 'Between 35000-49999'
7 = 'Between 50000-74999'
8 = '75000 and up'
;
value feducat
0 = 'None'
1 = 'Less than HS'
2 = 'Less than HS'
3 = 'Less than HS'
4 = 'Less than HS'
5 = 'Less than HS'
6 = 'Less than HS'
7 = 'Less than HS'
8 = 'Less than HS'
9 = 'High School'
10 = 'High School'
11 = 'High School'
12 = 'High School'
13 = 'More than HS'
14 = 'More than HS'
15 = 'More than HS'
16 = 'More than HS'
17 = 'More than HS'
18 = 'More than HS'
22 = 'More than HS'
;
run;
*apply formats;
data mhs.identification2;
set mhs.identification2;
format Race frace. Agegroup **bleep**egroup. Ethnicity fEthnicity. Marital fMarital. Income fIncomegroup. Educat fEducat.;
run;
*desciptive analyses;
data tmp1.identificationformats;
set tmp1.identification;
if county='LEXINGTON' then county2='Lexington';
if county='Lexington' then county2='Lexington';
if county='lexington' then county2='Lexington';
if county='RICHLAND' then county2='Richland';
if county='richland' then county2='Richland';
if county='Richland' then county2='Richland';
if county='SUMTER' then county2='Sumter';
if county='Sumter' then county2='Sumter';
if county='sumter' then county2='Sumter';
run;
proc freq data=mph.identificationformats;
tables county2;
run;
proc freq data=mhs.identificationformats;
tables race;
run;
proc freq data=mhs.identification2;
tables agegroup;
run;
proc freq data=mph.identificationformats;
tables ethnicity;
run;
proc freq data=mhs.identificationformats;
tables marital;
run;
proc freq data=mhs.identification2;
tables incomegroup;
run;
proc freq data=mph.identificationformats;
tables educat;
run;
For variables like race, the format applied correctly and appear as 'Black', "White", etc in the results viewer. However, for my agegroup and incomegroup variables, the output simply appears as "1", "2", "3", and so on. Is this because I created those variables as an extension of other variables?
data MHS.identification2;
set MHS.identification;
if age < 20 then agegroup = 1;
if age >= 20 and age < 25 then agegroup = 2;
if age >= 25 and age < 30 then agegroup = 3;
if age >= 30 and age < 35 then agegroup = 4;
if age >=35 and age < 40 then agegroup = 5;
if age >=40 then agegroup = 6;
if income < 10000 then incomegroup =1;
if income >=10000 and income <14999 then incomegroup =2;
if income >=15000 and income <19999 then incomegroup =3;
if income >=20000 and income <24999 then incomegroup =4;
if income >=25000 and income <34999 then incomegroup =5;
if income >=35000 and income <49999 then incomegroup =6;
if income >=50000 and income <74999 then incomegroup =7;
else if income >= 75000 then incomegroup =8;
run;
Make it one data step. Or create an informat and use INPUT() instead which is generally more efficient.
You could also have created your formats directly from your ranges instead of creating groups and then labels for said groups.
proc format;
value fmt_agegroup
low-<20 = '<20'
20-24 = '20-24'
25-29 = '25-29'
30-34 = '30-34'
35-39 = '35-39'
40 - high = '40+'
other= 'CHECKME';
run;
proc freq data=sashelp.class;
table age;
format age fmt_agegroup.;
run;
And I probably would have combined all the PROC FREQ into one or two rather than individual ones.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.