Hi guys,
I have 22 datasets with different char variables and numeric variables.
I have to set formats to char variable according to CDISC standards. Each char variable of each dataset has its own format name if required. For example: "prcat", "method", etc.
Is there a way to let the format take directly the char variable values without specifying one by one?
This to avoid specify each variable value because there are many char variables and many possible values but also because the option "other" will take values I do not know depending on what the clinician will write.
Moreover the format the variables will take is exactly their value like for example proc format ... "Chemotherapy" = "Chemotherapy" "Transplant" = "Transplant", ....
Can anyone help me please?
I am not sure what you mean. At first it sounded as if you wanted a way to generate a FORMAT statement where the NAME of the variable was the same as the NAME of the format.
format prcat prcat. method method. ;
That should be easy to do from the information available from PROC CONTENTS.
Example:
proc copy inlib=sashelp outlib=work;
select class cars;
run;
proc contents data=work._all_ noprint out=contents;
run;
filename code temp;
data _null_;
set contents end=eof;
where type=2 and memname in ('CARS' 'CLASS');
by memname;
file code;
if _n_=1 then put 'proc datasets nolist lib=' libname ';';
if first.memname then put ' modify ' memname ';' / ' format ';
put ' ' name '$' name +(-1) '.' ;
if last.memname then put ' ;' / ' run;';
if eof then put 'quit;';
run;
%include code / source2;
93 %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is file /saswork/SAS_workE26C0001B716_odaws02-usw2.oda.sas.com/#LN00128.
94 +proc datasets nolist lib=WORK ;
95 + modify CARS ;
96 + format
97 + DriveTrain $DriveTrain.
98 + Make $Make.
99 + Model $Model.
100 + Origin $Origin.
101 + Type $Type.
102 + ;
103 + run;
NOTE: MODIFY was successful for WORK.CARS.DATA.
104 + modify CLASS ;
105 + format
106 + Name $Name.
107 + Sex $Sex.
108 + ;
109 + run;
NOTE: MODIFY was successful for WORK.CLASS.DATA.
110 +quit;
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
But then it shifted to looking like the request was to CREATE a format definition based on the VALUES of the variable. But I could not figure out how you could distinguish the random text entered when the value was "OTHER" from the fixed strings that were provided as possible answer to the person that collected the original data. But ignoring that then you should be able to do that by selecting the distinct values of the variables and use it to either generate PROC FORMAT code or a dataset in the right form so that if could be used with the CNTLIN= option of PROC FORMAT.
Can you show a small example dataset and the results you would want for that dataset?
I am not sure what you mean. At first it sounded as if you wanted a way to generate a FORMAT statement where the NAME of the variable was the same as the NAME of the format.
format prcat prcat. method method. ;
That should be easy to do from the information available from PROC CONTENTS.
Example:
proc copy inlib=sashelp outlib=work;
select class cars;
run;
proc contents data=work._all_ noprint out=contents;
run;
filename code temp;
data _null_;
set contents end=eof;
where type=2 and memname in ('CARS' 'CLASS');
by memname;
file code;
if _n_=1 then put 'proc datasets nolist lib=' libname ';';
if first.memname then put ' modify ' memname ';' / ' format ';
put ' ' name '$' name +(-1) '.' ;
if last.memname then put ' ;' / ' run;';
if eof then put 'quit;';
run;
%include code / source2;
93 %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is file /saswork/SAS_workE26C0001B716_odaws02-usw2.oda.sas.com/#LN00128.
94 +proc datasets nolist lib=WORK ;
95 + modify CARS ;
96 + format
97 + DriveTrain $DriveTrain.
98 + Make $Make.
99 + Model $Model.
100 + Origin $Origin.
101 + Type $Type.
102 + ;
103 + run;
NOTE: MODIFY was successful for WORK.CARS.DATA.
104 + modify CLASS ;
105 + format
106 + Name $Name.
107 + Sex $Sex.
108 + ;
109 + run;
NOTE: MODIFY was successful for WORK.CLASS.DATA.
110 +quit;
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
But then it shifted to looking like the request was to CREATE a format definition based on the VALUES of the variable. But I could not figure out how you could distinguish the random text entered when the value was "OTHER" from the fixed strings that were provided as possible answer to the person that collected the original data. But ignoring that then you should be able to do that by selecting the distinct values of the variables and use it to either generate PROC FORMAT code or a dataset in the right form so that if could be used with the CNTLIN= option of PROC FORMAT.
Can you show a small example dataset and the results you would want for that dataset?
I agree with @Tom that several details are missing.
Your bit here:
@NewUsrStat wrote:
...
Moreover the format the variables will take is exactly their value like for example proc format ... "Chemotherapy" = "Chemotherapy" "Transplant" = "Transplant", ....
If you have a data set of values that you expect to encounter you can use that data to make a CNTLIN data set for Proc format. Here is an example using the SASHELP.CLASS data set that you should have.
/* select all the males from the data set and create
a format that will show that text as the label and "Other"
for the females
*/
data example_cntlin;
/* the END = option is to allow conditional creation
of the OTHER statement options as the last entry of the format*/
set sashelp.class (where=(sex='M')) end=lastone;
/* key variables that a Proc Format CNTLIN requires are
Fmtname start (end if there is range of values) label
type, C for character here, and since we want to use the
OTHER option the HLO variable which may contain other things
*/
fmtname='F_to_other';
start=name;
label=name;
type='C';
output;
if lastone then do;
/* that is a capital O not a zero*/
hlo='O';
/* special text for the Start/end values with Other*/
start='**OTHER**';
label='Other';
output;
end;
drop name sex age height weight;
run;
/* use the data set to create the format*/
proc format cntlin=example_cntlin;
run;
/* demonstrate how to use the format created*/
proc print data=sashelp.class noobs;
var name;
format name $F_to_other.;
run;
If you have any questions about what special values in which special variables used by Proc Format for a CNTLIN data set it is pretty easy if you know the options you need. Write a very small Proc format step with the options you need be they ranges, invalues (informats) or whatever. Add on the Proc Format statement the option CNTLOUT=somedataset. The procedure will create a data set with the variables needed and the options.
Caveat: If you have multiple variables in in a single data set that you want to work with you must make sure that the final data set is sorted by the FMTNAME variable as otherwise Proc Format will create and overwrite a bunch of basically one value versions of the formats. It might be best to create one CNTLIN data set per set of values/variable for simplicity sake. Keep the sets in a permanent library. Suggest also writing the Formats to a permanent library using the LIBRARY= option and then adding that library to the FMTSEARCH path so you don't have to recreate the formats each session.
If your clinicians are anything like the ones I worked with you may need to make sure that the spelling is consistent because a value of "Xyz" the format is looking for will not match "xyz". I often fixed that by creating custom informats that used the UPCASE option that would treat all the text as uppercase before comparing to value on the informat and assigning the proper expected values.
Something like this:
Proc format; invalue $sometext (upcase) 'ABC'='Abc' 'PDQ'='Pdq' 'JACK DANIELS' = 'Jack Daniels' other=_same_; run; data example; input text $sometext.; datalines; abc abC ABc pdq somethiing else ;
The CNTLOUT option would show where the UPCASE gets set in the control data set.
Note the use of _SAME_ to copy any unexpected text.
The CNTLIN data set would want the START value to be all uppercase.
If you have known supposedly fixed set of values, such as Clinic names, you could use the _ERROR_ option in the INVALUE to throw an error when an unexpected value gets entered into a data set so the LOG would throw invalid data messages. If you find that one of your clinicians abbreviates or misspells a value consistently you could add that to the Invalue as format has not problems taking multiple input values to the same output.
I used this frequently because one of our contractors would "update" their data system so that things that had been established to report a value as M or F for female would do things such as spell the word out or add a / and the Spanish equivalent. I could modify my informats to accommodate such and keep the original correct values in our data.
I always thought Proc Format was one the hidden gems of data management as long a single variables were involved.
Dive into keynotes, announcements and breakthroughs on demand.
Explore Now →SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.