I have a data set below. I want to format all categorical variables that have numbers (variables starting with "i" are all indicators (1="yes" and 0="no")). The picture below only shows six of those variables (idem, ipsychrx, icnsrx, incoven, iatypical and ibenzo). But I have 47 of them in total. How to I format them without necessarily creating proc format for each variable? And how do I create a new four level variable type from the variables 'iconven' and 'iatypical' that have values 0="neither", 1="conventional", 2="atypical only", 3="both"?
Hi @JUMMY,
As has been mentioned, variable lists make it easy to associate a format with several variables. In your case the list i: using the common prefix "i" and the colon operator is ideal. How could there be a shorter way to refer to all variables whose names start with "i"?
So, you define your formats (with the FORMAT procedure) once ...
proc format;
value yesno
0='no'
1='yes'
.=' ';
value type
0='neither'
1='conventional'
2='atypical only'
3='both'
.=' ';
run;
... and then (if desired) associate them with variables using the FORMAT statement:
data want;
set have;
if n(iatypical, iconven)=2 then type=2*iatypical+iconven;
format i: yesno. type type.;
run;
I put the FORMAT statement into a data step only because a new variable, TYPE, had to be created (not to be confused with the format of the same name). Otherwise, I would have used PROC DATASETS (with a MODIFY statement and the above FORMAT statement) to avoid unnecessary reading and writing of data if only changes to a dataset's "header information" (such as formats and labels) had been required.
@JUMMY wrote:
I have a data set below. I want to format all categorical variables that have numbers (variables starting with "i" are all indicators (1="yes" and 0="no")). The picture below only shows six of those variables (idem, ipsychrx, icnsrx, incoven, iatypical and ibenzo). But I have 47 of them in total. How to I format them without necessarily creating proc format for each variable? And how do I create a new four level variable type from the variables 'iconven' and 'iatypical' that have values 0="neither", 1="conventional", 2="atypical only", 3="both"?
Applying a format is different than creating a format. You can create a format once and apply that format to multiple variables in various ways.
To apply the format, you can use a variable list with the format.
Format iFIRSTVAR —iLastVar yesNofmt.;
Different ways of referencing a variable list.
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html
Please post data as text not images, to work with your data we’d have to type it out.
It is NEVER proper to post data as a picture. Pictures do not reveal
and they force anyone who wants to help you to type data off the screenshot, which is
Please post example data as a data step with datalines. My second footnote leads you to a macro which does the conversion from a dataset to a data step automatically. See my third footnote for advice on posting code in a reliable manner.
Hi @JUMMY,
As has been mentioned, variable lists make it easy to associate a format with several variables. In your case the list i: using the common prefix "i" and the colon operator is ideal. How could there be a shorter way to refer to all variables whose names start with "i"?
So, you define your formats (with the FORMAT procedure) once ...
proc format;
value yesno
0='no'
1='yes'
.=' ';
value type
0='neither'
1='conventional'
2='atypical only'
3='both'
.=' ';
run;
... and then (if desired) associate them with variables using the FORMAT statement:
data want;
set have;
if n(iatypical, iconven)=2 then type=2*iatypical+iconven;
format i: yesno. type type.;
run;
I put the FORMAT statement into a data step only because a new variable, TYPE, had to be created (not to be confused with the format of the same name). Otherwise, I would have used PROC DATASETS (with a MODIFY statement and the above FORMAT statement) to avoid unnecessary reading and writing of data if only changes to a dataset's "header information" (such as formats and labels) had been required.
Just insert a few lines into the datastep that I suggested:
data want;
set have;
array a[*] stpamt sfpamt sppamt srpamt;
array l[*] l_stpamt l_sfpamt l_sppamt l_srpamt;
array r[*] r_stpamt r_sfpamt r_sppamt r_srpamt;
do j=1 to dim(a);
if a[j]> 0 then l[j]=log(a[j]);
if a[j]>=0 then r[j]=sqrt(a[j]);
end;
if n(iatypical, iconven)=2 then type=2*iatypical+iconven;
format i: yesno. type type.;
drop j;
run;
This assumes that there are no variables a, j, l or r in dataset HAVE and also none of the eight newly created variables l_stpamt, ..., r_srpamt. If this assumption is violated, please use different names to avoid name conflicts.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.