BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Noomen
Fluorite | Level 6

Hi everyone,

I have a data set with about 100 variables and each of them has a different format because the codes in the variables are different.

Problem: I don't know in advance how many variables I have to report on and each variable needs its related format. So if I have to query 25 variables, I have to apply 25 formats in my final output.

 

I have a solution that works and I reproduced it partially below

Let's assume that ALLSTATES is my large file and and I have to work on the subset called myfile.

 

I wrote a macro based on if statements to assign each formats but this solution has a cost: an extra file is created at each if statement.So if have 60 variables, I will get 60 extra files.

 

Is there a better way to do this and to avoid to get so many files?

 

DATA ALLSTATES;
CANADIAN="ON"; NORTH_E="NJ"; WEST_C="WA"; output;
CANADIAN="QC"; NORTH_E="NY"; WEST_C="CA"; output;
CANADIAN="MB"; NORTH_E="VE"; WEST_C="NE"; output; run;

 

proc format ;
VALUE $CAN (NOTSORTED)
"ON"="ONTARIO"
"QC"="QUEBEC"
"MB"="MANITOBA"; run;

proc format ;
VALUE $NORTHE (NOTSORTED)
"NJ"="NEWJERSEY"
"NY"="NEWYORK"
"VE"="VERMONT"; run;

proc format ;
VALUE $WEST (NOTSORTED)
"WA"="WASHINGTON"
"CA"="CALIFORNIA"
"NE"="NEVADA"; run;

 


DATA myfile ; set ALLSTATES;
drop west_c; run;

 

%macro formatmyvars(ds);
%local dsid ;
%let dsid = %sysfunc(open(&ds));
%if %sysfunc(varnum(&dsid, CANADIAN)) > 0 %then %do;
data myfile2; set myfile;
newv_CANADIAN=put(CANADIAN,$CAN.);
run;
%end;
%else %do; data myfile2; set myfile;run; %end;

%if %sysfunc(varnum(&dsid, NORTH_E)) > 0 %then %do;
data myfile3; set myfile2;
newv_NORTH_E=put(NORTH_E,$NORTHE.);
run;
%end;
%else %do;
data myfile3; set myfile2;run;
%end;

/*and so on*/
%mend formatmyvars;

%formatmyvars(myfile );

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
If your input data set is formatted, the formats carry through to the end.

So format your main data set with all the variables FIRST and it'll carry through to the end.

ie

data main;
set original;
format var1 fVar1. var2 fVar2. ......;
run;

Now you select the 35 you want:

data subset;
set main;
keep <list of variables to keep>;
run;

The formats will flow through and you don't have to selectively apply it.

If that won't work for some reason, then you can also create a master table with a list of the variables and formats.

Variable Format
Var1 fVar1
Var2 fVar2
....


Then query it to build your format statement dynamically.

proc sql noprint;
select cats(" ", variable, catt(format, ".")) into :format_list separated by " "
from master_list;
quit;

data final_data;
set ....;
format &format_list;
run;

View solution in original post

6 REPLIES 6
Reeza
Super User
  • Any reason to not format all the variables at the start and then it should carry through?
  • Do you actually need to use a PUT() conversion though, or is applying the format enough, because those are two very different problems? If the end goal is a formatted export file, using a format would be fine for example. 
  • Do you have a lookup table that maps each variable to it's format? If not, you may need to create that to drive the rest of this.

@Noomen wrote:

Hi everyone,

I have a data set with about 100 variables and each of them has a different format because the codes in the variables are different.

Problem: I don't know in advance how many variables I have to report on and each variable needs its related format. So if I have to query 25 variables, I have to apply 25 formats in my final output.

 

I have a solution that works and I reproduced it partially below

Let's assume that ALLSTATES is my large file and and I have to work on the subset called myfile.

 

I wrote a macro based on if statements to assign each formats but this solution has a cost: an extra file is created at each if statement.So if have 60 variables, I will get 60 extra files.

 

Is there a better way to do this and to avoid to get so many files?

 

DATA ALLSTATES;
CANADIAN="ON"; NORTH_E="NJ"; WEST_C="WA"; output;
CANADIAN="QC"; NORTH_E="NY"; WEST_C="CA"; output;
CANADIAN="MB"; NORTH_E="VE"; WEST_C="NE"; output; run;

 

proc format ;
VALUE $CAN (NOTSORTED)
"ON"="ONTARIO"
"QC"="QUEBEC"
"MB"="MANITOBA"; run;

proc format ;
VALUE $NORTHE (NOTSORTED)
"NJ"="NEWJERSEY"
"NY"="NEWYORK"
"VE"="VERMONT"; run;

proc format ;
VALUE $WEST (NOTSORTED)
"WA"="WASHINGTON"
"CA"="CALIFORNIA"
"NE"="NEVADA"; run;

 


DATA myfile ; set ALLSTATES;
drop west_c; run;

 

%macro formatmyvars(ds);
%local dsid ;
%let dsid = %sysfunc(open(&ds));
%if %sysfunc(varnum(&dsid, CANADIAN)) > 0 %then %do;
data myfile2; set myfile;
newv_CANADIAN=put(CANADIAN,$CAN.);
run;
%end;
%else %do; data myfile2; set myfile;run; %end;

%if %sysfunc(varnum(&dsid, NORTH_E)) > 0 %then %do;
data myfile3; set myfile2;
newv_NORTH_E=put(NORTH_E,$NORTHE.);
run;
%end;
%else %do;
data myfile3; set myfile2;run;
%end;

/*and so on*/
%mend formatmyvars;

%formatmyvars(myfile );


 

Noomen
Fluorite | Level 6

Any reason to not format all the variables at the start and then it should carry through?

the reason is I cannot work with formatted variables. I work with codes and I apply a format at the end.

For example, for a variable called client_nat, 1 means Foreign Client, 2 means US client, etc.

For reporting purposes, I have to apply the formats at the last stage so people can understand the report.

 

Do you have a lookup table that maps each variable to it's format? If not, you may need to create that to drive the rest of this.

No I don't. I've never used one.

Tom
Super User Tom
Super User

I don't understand what the issue is.  Just attach all of the formats when you define the dataset.

Then when you go use it the they are already attached.

 

Because the formats are not stored in the data that means you also have to have the formats defined already and include the catalog where they live in the FMTSEARCH option path.

So run this code once to make the dataset and format catalog.

libname mylib 'myfolder';
proc format cat=mylib.formats ;
  ....
run;
data mylib.mydataset ;
  ....
  format var1 fmta. var2 fmtb. .... ;
run;

Then when you want to use it:

libname mylib 'myfolder';
options append=(fmtsearch=(mylib.formats));
proc means data=mylib.mydataset;
   var var1;
run;
Noomen
Fluorite | Level 6

the problem is exactly here:

if I have a request at 9:30 am for 35 variables, then I have to apply 35 formats here:

data mylib.mydataset ;
....
format var1 fmta. var2 fmtb. .... var35 fmtxyz;      <-I am trying to automate this task.

 

and if I have a request for at 11:30 am for 25 variables, then I have to apply 25 formats here:

data mylib.mydataset ;
....
format var1 fmta. var2 fmtb. .... var25 fmty; <-I am trying to automate this task.

and so on.

 

Tom
Super User Tom
Super User

I don't understand.  If it is the same dataset why are you having to tell it over and over again what formats to use with each variable?

 

If the issue is that you are reading different files then just keep the same line of code every time.  Put it in a file somewhere so you can include it or copy and paste it.  If that causes extra variables to add just keep the ones you want.

data eleven_am;
  infile 'eleven_am.txt' ;
  input a b g h ;
  format a a. b b. ...... z z.;
  keep a--h ;
run;
Reeza
Super User
If your input data set is formatted, the formats carry through to the end.

So format your main data set with all the variables FIRST and it'll carry through to the end.

ie

data main;
set original;
format var1 fVar1. var2 fVar2. ......;
run;

Now you select the 35 you want:

data subset;
set main;
keep <list of variables to keep>;
run;

The formats will flow through and you don't have to selectively apply it.

If that won't work for some reason, then you can also create a master table with a list of the variables and formats.

Variable Format
Var1 fVar1
Var2 fVar2
....


Then query it to build your format statement dynamically.

proc sql noprint;
select cats(" ", variable, catt(format, ".")) into :format_list separated by " "
from master_list;
quit;

data final_data;
set ....;
format &format_list;
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1063 views
  • 0 likes
  • 3 in conversation