I have a very simple question: how can I label my variable's values, while keeping its values, in proc SQL?
For instance, I have in my data Gender, which is 0 and 1. In my regression output I want to display the meaning of 0 and 1 as Female and Male.
Here is my current code:
proc format;
value $sex
2 ='Female'
1='Male';
value $race
1 = 'White'
2 = 'Black'
3 = 'Other';
value $yes_no
0 = 'No'
1 = 'Yes' ;
run;
PROC SQL ;
CREATE TABLE WANT
AS SELECT distinct *,
input(SEX_CD,F8.) as numeric_sex format=z8. ,
input(RACE_CD,F8.) as numeric_race format=z8. ,
(case
when UPPER(t1.'Description'n) CONTAINS UPPER('INCOME') then 1
else 0
end) AS HAS_INCOME,
case
when 'ID'n = "Yes" then 1
else 0
end) AS HAS_ID
FROM HAVE
;
quit ;
Thanks!
proc format;
value sex
2 ='Female'
1='Male';
value race
1 = 'White'
2 = 'Black'
3 = 'Other';
value yes_no
0 = 'No'
1 = 'Yes' ;
run;
PROC SQL ;
CREATE TABLE WANT
AS SELECT distinct *,
input(SEX_CD,sex.) as numeric_sex format=z8. ,
input(RACE_CD,race.) as numeric_race format=z8. ,
(case
when UPPER(t1.'Description'n) CONTAINS UPPER('INCOME') then 1
else 0
end) AS HAS_INCOME format=yes_no.,
case
when 'ID'n = "Yes" then 1
else 0
end) AS HAS_ID format=yes_no.
FROM HAVE
;
quit ;
Google proc format
You have already created formats, which is a good first step. You need a slight adjustment to the code, however. For a numeric variable such as GENDER taking on values of 0 and 1, the format name should not begin with a dollar sign.
You don't need to label the values when running PROC SQL to create a new data set. You would only need to label the variables later, as part of a procedure that prints a report. That would be the right time to apply a format. If you apply the format as part of PROC SQL, then you need to have the format available any time you try to use the data set. As a general rule, that's a nuisance.
Apply the format to the variables in the Regression procedure.
After fixing the numeric/character confusion you currently have with the format definitions.
proc format;
value sex
2 ='Female'
1='Male';
value race
1 = 'White'
2 = 'Black'
3 = 'Other';
value yes_no
0 = 'No'
1 = 'Yes' ;
run;
PROC SQL ;
CREATE TABLE WANT
AS SELECT distinct *,
input(SEX_CD,sex.) as numeric_sex format=z8. ,
input(RACE_CD,race.) as numeric_race format=z8. ,
(case
when UPPER(t1.'Description'n) CONTAINS UPPER('INCOME') then 1
else 0
end) AS HAS_INCOME format=yes_no.,
case
when 'ID'n = "Yes" then 1
else 0
end) AS HAS_ID format=yes_no.
FROM HAVE
;
quit ;
A trick we used to use was to make two format catalogs. One with formats that displayed just the decodes. And a second catalog with the same formats but that this time the decode includes the raw value. Then depending on which you want to use you can just change the FMTSEARCH option.
proc format lib=mylib.formats ;
value sex
1='Male'
2='Female'
;
run;
proc format lib=mylib.cformats ;
value sex
1='1 Male'
2='2 Female'
;
run;
So when developing your code use the CFORMATS catalog and you can see what values to use in your code to select values.
Then for your final "pretty" report use the regular FORMATS catalog and the number no longer appear.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.