BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SM8
Obsidian | Level 7 SM8
Obsidian | Level 7

Hello,

 

So I used formats to create labels for my output (proc freqs). However, it appears only the formats created using characters apply to the proc freq outputs. Formats where I used numbers fail to show up in the output and simply appear as the original value.

 

Here is the code I used:

 

data MHS.identification2;

set MHS.identification;

if age < 20 then agegroup = 1;

if age >= 20 and age < 25 then agegroup = 2;

if age >= 25 and age < 30 then agegroup = 3;

if age >= 30 and age < 35 then agegroup = 4;

if age >=35 and age < 40 then agegroup = 5;

if age >=40 then agegroup = 6;

run;

 

data MHS.identification2;

set MHS.identification;

if income < 10000 then incomegroup =1;

if income >=10000 and income <14999 then incomegroup =2;

if income >=15000 and income <19999 then incomegroup =3;

if income >=20000 and income <24999 then incomegroup =4;

if income >=25000 and income <34999 then incomegroup =5;

if income >=35000 and income <49999 then incomegroup =6;

if income >=50000 and income <74999 then incomegroup =7;

else if income >= 75000 then incomegroup =8;

run;

 

 

*formats;

proc format;

value frace

1 = 'Black'

2 = 'White'

3 = 'Asian'

4 = 'NativeAM'

5 = 'HawaiianPI'

6 = 'Multi'

7 = 'Other'

 

;

value **bleep**egroup

1 = '<20'

2 = '20-24'

3 = '25-29'

4 = '30-34'

5 = '35-39'

6 = '40+'

 

;

value fethnicity

1 = 'Non-Hisp'

2 = 'Hisp'

 

;

value fmarital

1 = 'Single'

2 = 'Separated'

3 = 'Divorced'

4 = 'Widowed'

5 = 'Married'

 

;

value fincomegroup

1 = 'Less than 10000'

2 = 'Between 10000-14999'

3 = 'Between 15000-19999'

4 = 'Between 20000-24999'

5 = 'Between 25000-34999'

6 = 'Between 35000-49999'

7 = 'Between 50000-74999'

8 = '75000 and up'

 

;

value feducat

0 = 'None'

1 = 'Less than HS'

2 = 'Less than HS'

3 = 'Less than HS'

4 = 'Less than HS'

5 = 'Less than HS'

6 = 'Less than HS'

7 = 'Less than HS'

8 = 'Less than HS'

9 = 'High School'

10 = 'High School'

11 = 'High School'

12 = 'High School'

13 = 'More than HS'

14 = 'More than HS'

15 = 'More than HS'

16 = 'More than HS'

17 = 'More than HS'

18 = 'More than HS'

22 = 'More than HS'

;

run;

 

*apply formats;

data mhs.identification2;

set mhs.identification2;

format Race frace. Agegroup **bleep**egroup. Ethnicity fEthnicity. Marital fMarital. Income fIncomegroup. Educat fEducat.;

run;

 

*desciptive analyses;

 

proc freq data=mph.identificationformats;

tables county2;

run;

proc freq data=mhs.identificationformats;

tables race;

run;

proc freq data=mhs.identification2;

tables agegroup;

run;

proc freq data=mph.identificationformats;

tables ethnicity;

run;

proc freq data=mhs.identificationformats;

tables marital;

run;

proc freq data=mhs.identification2;

tables incomegroup;

run;

proc freq data=mph.identificationformats;

tables educat;

run;

 

 

For variables like race, the format applied correctly and appear as 'Black', "White", etc in the results viewer. However, for my agegroup and incomegroup variables, the output simply appears as "1", "2", "3", and so on. Is this because I created those variables as an extension of other variables?

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Show the log, specifically for the format code.

 

1. I suspect your format name for the age groups is not valid which is why that one doesn't work. 

value **bleep**egroup

1 = '<20'

2 = '20-24'

3 = '25-29'

4 = '30-34'

5 = '35-39'

6 = '40+'

 

;

2. You erase your coding for age group when you create identification 2 data set again. Your'e starting from identification which does not have the age groups and it destroys your previous age group. 

 

data MHS.identification2;

set MHS.identification; *<- this is your starting data set;

if age < 20 then agegroup = 1;

if age >= 20 and age < 25 then agegroup = 2;

if age >= 25 and age < 30 then agegroup = 3;

if age >= 30 and age < 35 then agegroup = 4;

if age >=35 and age < 40 then agegroup = 5;

if age >=40 then agegroup = 6;

run;

 

data MHS.identification2; *<-replaces identification2 dataset above, any work there is lost;

set MHS.identification;

if income < 10000 then incomegroup =1;

if income >=10000 and income <14999 then incomegroup =2;

if income >=15000 and income <19999 then incomegroup =3;

if income >=20000 and income <24999 then incomegroup =4;

if income >=25000 and income <34999 then incomegroup =5;

if income >=35000 and income <49999 then incomegroup =6;

if income >=50000 and income <74999 then incomegroup =7;

else if income >= 75000 then incomegroup =8;

run;

 

3. Avoid coding like this. It's a good way to make it very difficult to debug your code and find weird errors.

data mhs.identification2;

set mhs.identification2;

I suspect the log will answer why the income group doesn't work. 

4. Income doesn't work because you're applying it to the income variable, not the incomegroup variable you created. Use the new variable and you should be fine. 


@SM8 wrote:

Hello,

 

So I used formats to create labels for my output (proc freqs). However, it appears only the formats created using characters apply to the proc freq outputs. Formats where I used numbers fail to show up in the output and simply appear as the original value.

 

Here is the code I used:

 

data MHS.identification2;

set MHS.identification;

if age < 20 then agegroup = 1;

if age >= 20 and age < 25 then agegroup = 2;

if age >= 25 and age < 30 then agegroup = 3;

if age >= 30 and age < 35 then agegroup = 4;

if age >=35 and age < 40 then agegroup = 5;

if age >=40 then agegroup = 6;

run;

 

data MHS.identification2;

set MHS.identification;

if income < 10000 then incomegroup =1;

if income >=10000 and income <14999 then incomegroup =2;

if income >=15000 and income <19999 then incomegroup =3;

if income >=20000 and income <24999 then incomegroup =4;

if income >=25000 and income <34999 then incomegroup =5;

if income >=35000 and income <49999 then incomegroup =6;

if income >=50000 and income <74999 then incomegroup =7;

else if income >= 75000 then incomegroup =8;

run;

 

 

*formats;

proc format;

value frace

1 = 'Black'

2 = 'White'

3 = 'Asian'

4 = 'NativeAM'

5 = 'HawaiianPI'

6 = 'Multi'

7 = 'Other'

 

;

value **bleep**egroup

1 = '<20'

2 = '20-24'

3 = '25-29'

4 = '30-34'

5 = '35-39'

6 = '40+'

 

;

value fethnicity

1 = 'Non-Hisp'

2 = 'Hisp'

 

;

value fmarital

1 = 'Single'

2 = 'Separated'

3 = 'Divorced'

4 = 'Widowed'

5 = 'Married'

 

;

value fincomegroup

1 = 'Less than 10000'

2 = 'Between 10000-14999'

3 = 'Between 15000-19999'

4 = 'Between 20000-24999'

5 = 'Between 25000-34999'

6 = 'Between 35000-49999'

7 = 'Between 50000-74999'

8 = '75000 and up'

 

;

value feducat

0 = 'None'

1 = 'Less than HS'

2 = 'Less than HS'

3 = 'Less than HS'

4 = 'Less than HS'

5 = 'Less than HS'

6 = 'Less than HS'

7 = 'Less than HS'

8 = 'Less than HS'

9 = 'High School'

10 = 'High School'

11 = 'High School'

12 = 'High School'

13 = 'More than HS'

14 = 'More than HS'

15 = 'More than HS'

16 = 'More than HS'

17 = 'More than HS'

18 = 'More than HS'

22 = 'More than HS'

;

run;

 

*apply formats;

data mhs.identification2;

set mhs.identification2;

format Race frace. Agegroup **bleep**egroup. Ethnicity fEthnicity. Marital fMarital. Income fIncomegroup. Educat fEducat.;

run;

 

*desciptive analyses;

data tmp1.identificationformats;

set tmp1.identification;

if county='LEXINGTON' then county2='Lexington';

if county='Lexington' then county2='Lexington';

if county='lexington' then county2='Lexington';

if county='RICHLAND' then county2='Richland';

if county='richland' then county2='Richland';

if county='Richland' then county2='Richland';

if county='SUMTER' then county2='Sumter';

if county='Sumter' then county2='Sumter';

if county='sumter' then county2='Sumter';

run;

proc freq data=mph.identificationformats;

tables county2;

run;

proc freq data=mhs.identificationformats;

tables race;

run;

proc freq data=mhs.identification2;

tables agegroup;

run;

proc freq data=mph.identificationformats;

tables ethnicity;

run;

proc freq data=mhs.identificationformats;

tables marital;

run;

proc freq data=mhs.identification2;

tables incomegroup;

run;

proc freq data=mph.identificationformats;

tables educat;

run;

 

 

For variables like race, the format applied correctly and appear as 'Black', "White", etc in the results viewer. However, for my agegroup and incomegroup variables, the output simply appears as "1", "2", "3", and so on. Is this because I created those variables as an extension of other variables?


 

View solution in original post

3 REPLIES 3
Reeza
Super User

Show the log, specifically for the format code.

 

1. I suspect your format name for the age groups is not valid which is why that one doesn't work. 

value **bleep**egroup

1 = '<20'

2 = '20-24'

3 = '25-29'

4 = '30-34'

5 = '35-39'

6 = '40+'

 

;

2. You erase your coding for age group when you create identification 2 data set again. Your'e starting from identification which does not have the age groups and it destroys your previous age group. 

 

data MHS.identification2;

set MHS.identification; *<- this is your starting data set;

if age < 20 then agegroup = 1;

if age >= 20 and age < 25 then agegroup = 2;

if age >= 25 and age < 30 then agegroup = 3;

if age >= 30 and age < 35 then agegroup = 4;

if age >=35 and age < 40 then agegroup = 5;

if age >=40 then agegroup = 6;

run;

 

data MHS.identification2; *<-replaces identification2 dataset above, any work there is lost;

set MHS.identification;

if income < 10000 then incomegroup =1;

if income >=10000 and income <14999 then incomegroup =2;

if income >=15000 and income <19999 then incomegroup =3;

if income >=20000 and income <24999 then incomegroup =4;

if income >=25000 and income <34999 then incomegroup =5;

if income >=35000 and income <49999 then incomegroup =6;

if income >=50000 and income <74999 then incomegroup =7;

else if income >= 75000 then incomegroup =8;

run;

 

3. Avoid coding like this. It's a good way to make it very difficult to debug your code and find weird errors.

data mhs.identification2;

set mhs.identification2;

I suspect the log will answer why the income group doesn't work. 

4. Income doesn't work because you're applying it to the income variable, not the incomegroup variable you created. Use the new variable and you should be fine. 


@SM8 wrote:

Hello,

 

So I used formats to create labels for my output (proc freqs). However, it appears only the formats created using characters apply to the proc freq outputs. Formats where I used numbers fail to show up in the output and simply appear as the original value.

 

Here is the code I used:

 

data MHS.identification2;

set MHS.identification;

if age < 20 then agegroup = 1;

if age >= 20 and age < 25 then agegroup = 2;

if age >= 25 and age < 30 then agegroup = 3;

if age >= 30 and age < 35 then agegroup = 4;

if age >=35 and age < 40 then agegroup = 5;

if age >=40 then agegroup = 6;

run;

 

data MHS.identification2;

set MHS.identification;

if income < 10000 then incomegroup =1;

if income >=10000 and income <14999 then incomegroup =2;

if income >=15000 and income <19999 then incomegroup =3;

if income >=20000 and income <24999 then incomegroup =4;

if income >=25000 and income <34999 then incomegroup =5;

if income >=35000 and income <49999 then incomegroup =6;

if income >=50000 and income <74999 then incomegroup =7;

else if income >= 75000 then incomegroup =8;

run;

 

 

*formats;

proc format;

value frace

1 = 'Black'

2 = 'White'

3 = 'Asian'

4 = 'NativeAM'

5 = 'HawaiianPI'

6 = 'Multi'

7 = 'Other'

 

;

value **bleep**egroup

1 = '<20'

2 = '20-24'

3 = '25-29'

4 = '30-34'

5 = '35-39'

6 = '40+'

 

;

value fethnicity

1 = 'Non-Hisp'

2 = 'Hisp'

 

;

value fmarital

1 = 'Single'

2 = 'Separated'

3 = 'Divorced'

4 = 'Widowed'

5 = 'Married'

 

;

value fincomegroup

1 = 'Less than 10000'

2 = 'Between 10000-14999'

3 = 'Between 15000-19999'

4 = 'Between 20000-24999'

5 = 'Between 25000-34999'

6 = 'Between 35000-49999'

7 = 'Between 50000-74999'

8 = '75000 and up'

 

;

value feducat

0 = 'None'

1 = 'Less than HS'

2 = 'Less than HS'

3 = 'Less than HS'

4 = 'Less than HS'

5 = 'Less than HS'

6 = 'Less than HS'

7 = 'Less than HS'

8 = 'Less than HS'

9 = 'High School'

10 = 'High School'

11 = 'High School'

12 = 'High School'

13 = 'More than HS'

14 = 'More than HS'

15 = 'More than HS'

16 = 'More than HS'

17 = 'More than HS'

18 = 'More than HS'

22 = 'More than HS'

;

run;

 

*apply formats;

data mhs.identification2;

set mhs.identification2;

format Race frace. Agegroup **bleep**egroup. Ethnicity fEthnicity. Marital fMarital. Income fIncomegroup. Educat fEducat.;

run;

 

*desciptive analyses;

data tmp1.identificationformats;

set tmp1.identification;

if county='LEXINGTON' then county2='Lexington';

if county='Lexington' then county2='Lexington';

if county='lexington' then county2='Lexington';

if county='RICHLAND' then county2='Richland';

if county='richland' then county2='Richland';

if county='Richland' then county2='Richland';

if county='SUMTER' then county2='Sumter';

if county='Sumter' then county2='Sumter';

if county='sumter' then county2='Sumter';

run;

proc freq data=mph.identificationformats;

tables county2;

run;

proc freq data=mhs.identificationformats;

tables race;

run;

proc freq data=mhs.identification2;

tables agegroup;

run;

proc freq data=mph.identificationformats;

tables ethnicity;

run;

proc freq data=mhs.identificationformats;

tables marital;

run;

proc freq data=mhs.identification2;

tables incomegroup;

run;

proc freq data=mph.identificationformats;

tables educat;

run;

 

 

For variables like race, the format applied correctly and appear as 'Black', "White", etc in the results viewer. However, for my agegroup and incomegroup variables, the output simply appears as "1", "2", "3", and so on. Is this because I created those variables as an extension of other variables?


 

SM8
Obsidian | Level 7 SM8
Obsidian | Level 7
Thank you so much! I followed your advice and it worked. However, I still am a little confused about the 2nd point you made. Can you explain the way to do it without destroying the first variable I created? I got around it just now by creating an entire new dataset for agegroup itself, but I'm sure there is a simpler way? Thanks again!
Reeza
Super User
data MHS.identification2;

set MHS.identification;

if age < 20 then agegroup = 1;

if age >= 20 and age < 25 then agegroup = 2;

if age >= 25 and age < 30 then agegroup = 3;

if age >= 30 and age < 35 then agegroup = 4;

if age >=35 and age < 40 then agegroup = 5;

if age >=40 then agegroup = 6;




if income < 10000 then incomegroup =1;

if income >=10000 and income <14999 then incomegroup =2;

if income >=15000 and income <19999 then incomegroup =3;

if income >=20000 and income <24999 then incomegroup =4;

if income >=25000 and income <34999 then incomegroup =5;

if income >=35000 and income <49999 then incomegroup =6;

if income >=50000 and income <74999 then incomegroup =7;

else if income >= 75000 then incomegroup =8;

run;

Make it one data step. Or create an informat and use INPUT() instead which is generally more efficient. 

 

You could also have created your formats directly from your ranges instead of creating groups and then labels for said groups.

 

proc format;
value fmt_agegroup

low-<20 = '<20'

20-24 = '20-24'

25-29 = '25-29'

30-34 = '30-34'

35-39 = '35-39'

40 - high = '40+'
other= 'CHECKME';

run;

proc freq data=sashelp.class;
table age;
format age fmt_agegroup.;
run;


 

And I probably would have combined all the PROC FREQ into one or two rather than individual ones. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 558 views
  • 1 like
  • 2 in conversation