Hi everyone,
I have a data set with about 100 variables and each of them has a different format because the codes in the variables are different.
Problem: I don't know in advance how many variables I have to report on and each variable needs its related format. So if I have to query 25 variables, I have to apply 25 formats in my final output.
I have a solution that works and I reproduced it partially below
Let's assume that ALLSTATES is my large file and and I have to work on the subset called myfile.
I wrote a macro based on if statements to assign each formats but this solution has a cost: an extra file is created at each if statement.So if have 60 variables, I will get 60 extra files.
Is there a better way to do this and to avoid to get so many files?
DATA ALLSTATES;
CANADIAN="ON"; NORTH_E="NJ"; WEST_C="WA"; output;
CANADIAN="QC"; NORTH_E="NY"; WEST_C="CA"; output;
CANADIAN="MB"; NORTH_E="VE"; WEST_C="NE"; output; run;
proc format ;
VALUE $CAN (NOTSORTED)
"ON"="ONTARIO"
"QC"="QUEBEC"
"MB"="MANITOBA"; run;
proc format ;
VALUE $NORTHE (NOTSORTED)
"NJ"="NEWJERSEY"
"NY"="NEWYORK"
"VE"="VERMONT"; run;
proc format ;
VALUE $WEST (NOTSORTED)
"WA"="WASHINGTON"
"CA"="CALIFORNIA"
"NE"="NEVADA"; run;
DATA myfile ; set ALLSTATES;
drop west_c; run;
%macro formatmyvars(ds);
%local dsid ;
%let dsid = %sysfunc(open(&ds));
%if %sysfunc(varnum(&dsid, CANADIAN)) > 0 %then %do;
data myfile2; set myfile;
newv_CANADIAN=put(CANADIAN,$CAN.);
run;
%end;
%else %do; data myfile2; set myfile;run; %end;
%if %sysfunc(varnum(&dsid, NORTH_E)) > 0 %then %do;
data myfile3; set myfile2;
newv_NORTH_E=put(NORTH_E,$NORTHE.);
run;
%end;
%else %do;
data myfile3; set myfile2;run;
%end;
/*and so on*/
%mend formatmyvars;
%formatmyvars(myfile );
@Noomen wrote:
Hi everyone,
I have a data set with about 100 variables and each of them has a different format because the codes in the variables are different.
Problem: I don't know in advance how many variables I have to report on and each variable needs its related format. So if I have to query 25 variables, I have to apply 25 formats in my final output.
I have a solution that works and I reproduced it partially below
Let's assume that ALLSTATES is my large file and and I have to work on the subset called myfile.
I wrote a macro based on if statements to assign each formats but this solution has a cost: an extra file is created at each if statement.So if have 60 variables, I will get 60 extra files.
Is there a better way to do this and to avoid to get so many files?
DATA ALLSTATES;
CANADIAN="ON"; NORTH_E="NJ"; WEST_C="WA"; output;
CANADIAN="QC"; NORTH_E="NY"; WEST_C="CA"; output;
CANADIAN="MB"; NORTH_E="VE"; WEST_C="NE"; output; run;
proc format ;
VALUE $CAN (NOTSORTED)
"ON"="ONTARIO"
"QC"="QUEBEC"
"MB"="MANITOBA"; run;proc format ;
VALUE $NORTHE (NOTSORTED)
"NJ"="NEWJERSEY"
"NY"="NEWYORK"
"VE"="VERMONT"; run;proc format ;
VALUE $WEST (NOTSORTED)
"WA"="WASHINGTON"
"CA"="CALIFORNIA"
"NE"="NEVADA"; run;
DATA myfile ; set ALLSTATES;
drop west_c; run;
%macro formatmyvars(ds);
%local dsid ;
%let dsid = %sysfunc(open(&ds));
%if %sysfunc(varnum(&dsid, CANADIAN)) > 0 %then %do;
data myfile2; set myfile;
newv_CANADIAN=put(CANADIAN,$CAN.);
run;
%end;
%else %do; data myfile2; set myfile;run; %end;%if %sysfunc(varnum(&dsid, NORTH_E)) > 0 %then %do;
data myfile3; set myfile2;
newv_NORTH_E=put(NORTH_E,$NORTHE.);
run;
%end;
%else %do;
data myfile3; set myfile2;run;
%end;/*and so on*/
%mend formatmyvars;%formatmyvars(myfile );
Any reason to not format all the variables at the start and then it should carry through?
the reason is I cannot work with formatted variables. I work with codes and I apply a format at the end.
For example, for a variable called client_nat, 1 means Foreign Client, 2 means US client, etc.
For reporting purposes, I have to apply the formats at the last stage so people can understand the report.
Do you have a lookup table that maps each variable to it's format? If not, you may need to create that to drive the rest of this.
No I don't. I've never used one.
I don't understand what the issue is. Just attach all of the formats when you define the dataset.
Then when you go use it the they are already attached.
Because the formats are not stored in the data that means you also have to have the formats defined already and include the catalog where they live in the FMTSEARCH option path.
So run this code once to make the dataset and format catalog.
libname mylib 'myfolder';
proc format cat=mylib.formats ;
....
run;
data mylib.mydataset ;
....
format var1 fmta. var2 fmtb. .... ;
run;
Then when you want to use it:
libname mylib 'myfolder';
options append=(fmtsearch=(mylib.formats));
proc means data=mylib.mydataset;
var var1;
run;
the problem is exactly here:
if I have a request at 9:30 am for 35 variables, then I have to apply 35 formats here:
data mylib.mydataset ;
....
format var1 fmta. var2 fmtb. .... var35 fmtxyz; <-I am trying to automate this task.
and if I have a request for at 11:30 am for 25 variables, then I have to apply 25 formats here:
data mylib.mydataset ;
....
format var1 fmta. var2 fmtb. .... var25 fmty; <-I am trying to automate this task.
and so on.
I don't understand. If it is the same dataset why are you having to tell it over and over again what formats to use with each variable?
If the issue is that you are reading different files then just keep the same line of code every time. Put it in a file somewhere so you can include it or copy and paste it. If that causes extra variables to add just keep the ones you want.
data eleven_am;
infile 'eleven_am.txt' ;
input a b g h ;
format a a. b b. ...... z z.;
keep a--h ;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.