I have a data set below. I want to format all categorical variables that have numbers (variables starting with "i" are all indicators (1="yes" and 0="no")). The picture below only shows six of those variables (idem, ipsychrx, icnsrx, incoven, iatypical and ibenzo). But I have 47 of them in total. How to I format them without necessarily creating proc format for each variable? And how do I create a new four level variable type from the variables 'iconven' and 'iatypical' that have values 0="neither", 1="conventional", 2="atypical only", 3="both"?
Hi @JUMMY,
As has been mentioned, variable lists make it easy to associate a format with several variables. In your case the list i: using the common prefix "i" and the colon operator is ideal. How could there be a shorter way to refer to all variables whose names start with "i"?
So, you define your formats (with the FORMAT procedure) once ...
proc format;
value yesno
0='no'
1='yes'
.=' ';
value type
0='neither'
1='conventional'
2='atypical only'
3='both'
.=' ';
run;
... and then (if desired) associate them with variables using the FORMAT statement:
data want;
set have;
if n(iatypical, iconven)=2 then type=2*iatypical+iconven;
format i: yesno. type type.;
run;
I put the FORMAT statement into a data step only because a new variable, TYPE, had to be created (not to be confused with the format of the same name). Otherwise, I would have used PROC DATASETS (with a MODIFY statement and the above FORMAT statement) to avoid unnecessary reading and writing of data if only changes to a dataset's "header information" (such as formats and labels) had been required.
@JUMMY wrote:
I have a data set below. I want to format all categorical variables that have numbers (variables starting with "i" are all indicators (1="yes" and 0="no")). The picture below only shows six of those variables (idem, ipsychrx, icnsrx, incoven, iatypical and ibenzo). But I have 47 of them in total. How to I format them without necessarily creating proc format for each variable? And how do I create a new four level variable type from the variables 'iconven' and 'iatypical' that have values 0="neither", 1="conventional", 2="atypical only", 3="both"?
Applying a format is different than creating a format. You can create a format once and apply that format to multiple variables in various ways.
To apply the format, you can use a variable list with the format.
Format iFIRSTVAR —iLastVar yesNofmt.;
Different ways of referencing a variable list.
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html
Please post data as text not images, to work with your data we’d have to type it out.
It is NEVER proper to post data as a picture. Pictures do not reveal
and they force anyone who wants to help you to type data off the screenshot, which is
Please post example data as a data step with datalines. My second footnote leads you to a macro which does the conversion from a dataset to a data step automatically. See my third footnote for advice on posting code in a reliable manner.
Hi @JUMMY,
As has been mentioned, variable lists make it easy to associate a format with several variables. In your case the list i: using the common prefix "i" and the colon operator is ideal. How could there be a shorter way to refer to all variables whose names start with "i"?
So, you define your formats (with the FORMAT procedure) once ...
proc format;
value yesno
0='no'
1='yes'
.=' ';
value type
0='neither'
1='conventional'
2='atypical only'
3='both'
.=' ';
run;
... and then (if desired) associate them with variables using the FORMAT statement:
data want;
set have;
if n(iatypical, iconven)=2 then type=2*iatypical+iconven;
format i: yesno. type type.;
run;
I put the FORMAT statement into a data step only because a new variable, TYPE, had to be created (not to be confused with the format of the same name). Otherwise, I would have used PROC DATASETS (with a MODIFY statement and the above FORMAT statement) to avoid unnecessary reading and writing of data if only changes to a dataset's "header information" (such as formats and labels) had been required.
Just insert a few lines into the datastep that I suggested:
data want;
set have;
array a[*] stpamt sfpamt sppamt srpamt;
array l[*] l_stpamt l_sfpamt l_sppamt l_srpamt;
array r[*] r_stpamt r_sfpamt r_sppamt r_srpamt;
do j=1 to dim(a);
if a[j]> 0 then l[j]=log(a[j]);
if a[j]>=0 then r[j]=sqrt(a[j]);
end;
if n(iatypical, iconven)=2 then type=2*iatypical+iconven;
format i: yesno. type type.;
drop j;
run;
This assumes that there are no variables a, j, l or r in dataset HAVE and also none of the eight newly created variables l_stpamt, ..., r_srpamt. If this assumption is violated, please use different names to avoid name conflicts.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.