BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Aim:

  1. Create user-defined formats for 'Questionnaire' domain using specifications already on a Microsoft Excel Sheet.
  2. Apply my formats to values on unrefined data to create industry-compliant values in the final dataset.

 

My Code Explained:

0. This is a program that I recycled with reference to 1 macro variable SDTM.

0A. Used %LET statement to define macro variable SDTM as 'QS'.

 

1. Imported metadata on the industry-compliant / 'Controlled Terminology' via Excel DBMS and PROC IMPORT.

1A. I renamed variables so that the 'unrefined' values ("start" / "end" in PROC FORMAT) are either 'sourcevar1' or 'sourcevar2'.       

1B.'Codelist' is just a reference to help me create super specific format names later in the program.

1C. 'SDTMCode' is eventually going to be the label / final value I want my user-defined formats to print out.

 

2. Imported metadata on the list of variables, length, variable type, variable 'label', and variable name needed in my final dataset 'QS' questionnaire via PROC IMPORT.

2A. Used 'Codelist' again to point to reference for the variables that will eventually need my user-defined formats for modification.

2B. This 'Codelist' corresponds to the metadata in Step 1's Codelist. Eventually intention is to merge these 2 datasets together based on mutual 'Codelist'.

 

3. Cleaned my 'imported' datasets from step 1 and step 2. Sorted these datasets by CodeList to make merging later easier.

 

4. Merged dataset containing variable metadata and dataset containing format metadata into 1 big data using DATA step / MERGE statement.

4A. Intent is to use this 'merged dataset' to create a dataset that I can use to create my user-defined formats via PROC FORMAT and CNTLIN.

4B. Merged dataset has these parts: start, end, 'label', variable name, variable length, variable type, CodeList (variable used to match both datasets).

 

5. I cleaned up the newly merged dataset so that PROC FORMAT and CNTLIN eventually have an easier time reading my user definitions into correctly defined formats.

5A. Format name was made by via compress(CodeList||"_"||Variable) because I need super-specific formats for my data.

5B. Either defined sourcevar2 as 'start' and 'end' or 'sourcevar1' as 'start' and 'end' values because my metadata was not as clean as I hoped.

5C. For 5B, yes the start and end values are equal because *most* of my data are character values that need a format applied to create either a 1) a *new* character value 2) a numeric value.

5D. Left-aligned 'SDTMCode' to create all the corresponding format labels.

5E. Defined type by either saying a variable with 'CHAR' type is 'C' or with 'NUM' type is 'N'. 

5F. 5A-5E. was all done to create clean user-defined formats.

 

6. Used PROC FORMAT and CNTLIN to take the dataset yielded in part 5 to create my *own* user-defined formats in an existing library 'LOOKUP.'

6A. Used PROC CATALOG C to look at my formats up close - seems fine.

6B. Used PROC FORMAT + CNTLOUT to look at *details* like MAX/MIN/FUZZ for my formats. *All* of my MAX value is 40.

 

7. Got this error when I ran the code: " For format VISITNUMPRJ5457_VISITNUM, this range is repeated, or values overlap: .-.."

7A. This still makes NO sense to me because for format VISITNUMPRJ5457_VISITNUM, the start and end values are *literally* the same and are *character* values so how could these values possibly overlap?

7B. When I omit the format from 7A, I notice several problems: 1) My start and end values get 'truncated' to 40 characters when I look at the FMTLIB output. 2) My 'label' also gets several truncated to 40 characters. This is *despite* the fact that I tried to override these lengths with maxselen=160 maxlablen=300 in my PROC FORMAT procedure.

7c. When I omit VISITNUMPRJ5457_VISITNUM format altogether, I get a BRAND NEW error that never popped up before when the user-defined formats were being made which is odd because this format *worked perfectly fine* when the VISITNUMPRJ5457_VISITNUM was in the picture:

ERROR: For format $ECOGORRES_QSSTRESC, this range is repeated, or values overlap: -.

 

Here's my code:

libname sdtmdata '/home/dimplepatelch0/5457c/sdtmdata';
filename reffile '/home/dimplepatelch0/5457c/sdtmdata/DimpleCode/RawData/PRJ5457C_SDTM_Domain_Specs.xls';

options validvarname=v7;
%let sdtm=QS;

proc import datafile=reffile
dbms=xls
out=work.fmtdata(Rename=(CodeListName=codelist Code__Uppercased_CRF_or_original=sourcevar1 Mixed_Case_Decode_or_value_for_f=sourcevar2 Decode__SDTM_Variable_Value_or_T=SDTMcode))
replace;
sheet='CT';
getnames=yes;
run;

proc import datafile=reffile
dbms=xls
out=work.fmtdata&sdtm(Rename=(Variable_Name=Variable Variable_Label=Label Controlled_Terms__Codelist_Or_Fo=CodeList))
replace;
sheet="&sdtm";
namerow=8;
datarow=9;
endrow=25;
run;

data fmtdata&sdtm.1;
	set work.fmtdata&sdtm;
	keep Variable Label Type Length CodeList;
run;

proc sort data=fmtdata&sdtm.1 out=fmtdata&sdtm.2;
	by CodeList;
run;

data fmtdataclean;
	set work.fmtdata;
	keep CodeList SourceVar1 SourceVar2 SDTMCode;
run;

proc sort data=fmtdataclean out=fmtdataclean2;
	by CodeList;
run;

data fmt&sdtm.clean;
	length CodeList $200;
	merge fmtdata&sdtm.2(in=x) fmtdataclean2(in=y);
	by CodeList;
	if x=1 and y=1;
run;

data work.fmtlib&sdtm.3;
	length FmtName $64 Start $160 End $160 Type $16;
	set fmt&sdtm.clean(drop=Length);
	fmtname=compress(CodeList||"_"||Variable);
	if CodeList in ("DOMAIN", "VISITPRJ5457", "YNULL", "QSTEST", "QSTESTCD")
	then do;
	start=sourcevar2;
	end=sourcevar2;
	end;
	else do;
	start=sourcevar1;
	end=sourcevar1;
	end;
	label=left(SDTMcode);
	if upcase(type) in ("CHAR") then
	type="C";
	else if upcase(type) in ("NUM") then
	type="N";
run;

proc sort data=work.fmtlib&sdtm.3 out=work.fmtlib&sdtm.4;
by FmtName;
	

proc format cntlin=work.fmtlib&sdtm.4 library=lookup maxselen=160 maxlablen=300;
run;

proc catalog c = lookup.formats;
contents;
run;

proc format library = lookup.formats
cntlout = cntlout;
run;

**Attached the dataset with ALL of the user-defined format definitions in 'SDTM.FMTQSCLEAN'; the other dataset ECOG is what I need to transform. 

 

 

 

*Now* onto the SECOND major problem which occurs when I try to apply my user-defined formats when the PROC FORMAT LIBRARY stuff sorta gets eked out!

 

data ecog5(drop=PatNum);
	length STUDYID $200 DOMAIN $2 USUBJID $200
	QSCAT $200 QSORRES $200 QSORRESU $200 QSSTRESC $200 
	QSSTRESN 8 QSSTRESU $200 QSSTAT $8 QSBLFL $2
	VISIT $200 QSDTC $19;
	set ecog1;
	STUDYID='PRJ5457C';
	DOMAIN='QS';
	USUBJID=PatNum;
	QSTESTCD=put(ECQSTESTCD1,$QSTESTCD_QSTESTCD.);
	QSTEST=put(ECQSTEST1,$QSTEST_QSTEST.);
	QSCAT=upcase("SCALES");
	QSORRES=ECOGGR;
	QSORRESU=" ";
	QSSTRESC=put(ECOGGR,$ECOGORRES_QSSTRESC.);
	QSSTRESN=input(QSORRES,3.);
	QSSTRESU=" ";
	if EGREC in ("No") and QSORRES eq " " then
	QSSTAT='NOT DONE';
	if upcase(FOLDERL) in ("SCREENING") then
	QSBLFL="Y";
	VISIT=upcase(FOLDERL);
/* 	VISITNUM=input(put(VISIT,$VISITNPRJ5457_VISITNUM.),3.); */
	if FOLDERDT ne " " then
	QSDTC=put(input(FOLDERDT,date10.),E8601DA10.);
	else do;
	QSDTC=put(input(ECOGDR,date10.),E8601DA10.);
	end;
run;

 

8. Look at the $QSTESTCD_QSTESTCD format: it absolutely does not work when I apply it to the variable ECQSTESTCD1 variable. The variable has the value "Eastern Cooperative Oncology Group Performance Status." When I look at $QSTESTCD_QSTESTCD via CNTLOUT, everything looks *perfect* for start, end, label (which in this case is "ECOG"), type, and length (4).

 

 However, when I *apply* this format using the PUT statement onto ECGSTESTCD1, I get this value instead: "East" , which means the label was entirely ignored, SAS is only reading 4 characters in my input variable, and SAS is outputting the wrong 4 characters, etc.

 

9. $VISITNPRJ5457_VISITNUM. is a format that I use to make numeric formats based on character 'start' and 'end' values. For instance, if my VISIT variable has the value 'SCREENING', then my format would create the numeric value 1. This works perfectly up until a row with the value 'UNSCHEDULED VISIT' hits. My format's start and end value is 'UNSCHEDULED VISIT, UNSCHEDULED LAB' that yields a numeric format 999; however, instead of 999, the output UNS is shown. Shouldn't the format yield 1) a NUMERIC value 2) read the start/end value as a *match* because the first 2 words, UNSCHEDULED VISIT, are equal?

Any and all suggestions are welcome! I'm trying so hard to understand the SAS theory / knowledge with program data vectors / compilation / execution and using SAS documentation, but I'm still stuck.

 

 

Please help!!!! A 1000 thanks to the AMAZING SAS community!!!

1 ACCEPTED SOLUTION

Accepted Solutions
blueskyxyz
Lapis Lazuli | Level 10
************ques1 20200122**************;
libname mysas "D:\W\Code_lib";
libname lookup "D:\W\Code_lib";

%let sdtm=QS;

data work.fmtlib&sdtm.3;
    length FmtName $64 Start $160 End $160 Type $16;
    set mysas.fmt&sdtm.clean(drop=Length);
    fmtname=compress(CodeList||"_"||Variable);
    if CodeList in ("DOMAIN", "VISITPRJ5457", "YNULL", "QSTEST", "QSTESTCD") then do;
	    start=sourcevar2;
	    end=sourcevar2;
    end;

    else do;
	    start=sourcevar1;
	    end=sourcevar1;
    end;

    label=left(SDTMcode);

    if upcase(type) in ("CHAR") then type="C";
    else if upcase(type) in ("NUM") then do;
		 if ifc(prxmatch('/^-?\d*\.?\d+\s*$/',sourcevar2),'Num','Char')='Num'  then  type="N";
		 else type="I";
	end;
run;

proc sort data=work.fmtlib&sdtm.3 out=work.fmtlib&sdtm.4;
by FmtName;
run;    

proc format cntlin=work.fmtlib&sdtm.4 library=lookup maxselen=160 maxlablen=300;
run;

proc catalog c = lookup.formats;
contents;
run;

proc format library = lookup.formats
cntlout = cntlout;
run;

Hi dimpz429 , attached the code, you can refer to.

 

you forget another type='I', please see the example as below, pay attention to the type of sex, which is same as VISITNUMPRJ5457

  CodeList

 VISITNUMPRJ5457

 

proc format;
    value $month
        "Jan"="01"
        "Feb"="02"
        "Mar"="03"
        "Apr"="04"
        "May"="05"
        "Jun"="06"
        "Jul"="07"
        "Aug"="08"
        "Sep"="09"
        "Oct"="10"
        "Nov"="11"
        "Dec"="12" 
        ;

    invalue sex
        "M"=1
        "F"=2
        ;

    value ny_a  /* type='I'-- type of start is character, type of start label is numeric*/
        1="Y"
        2="N"
        ;
run;

proc format library=work cntlout=fmt;
run;

123.JPG

View solution in original post

7 REPLIES 7
blueskyxyz
Lapis Lazuli | Level 10
could you offer part of dataset to let others to debug your code easily?
KachiM
Rhodochrosite | Level 12

Hi @anonymous_user,

 

From the ERROR message, you said there were no overlapping. It then leads to duplicate SRART values. I suggest that  between steps 5 and 6 add a Proc Sort procedure. The Intermediate data set created in Step 5, be sorted with the BY Value of START. You will know whether there is duplicate values of START. If there are duplicates, the same Sort Procedure can be use to get a new data set with NODUP option. The new data set can then be used for creating user-format.

 

Best wishes for success.

 

 

ballardw
Super User

When I look at your posted FMTQSCLEAN there are  NO format definitions. There are some things that might go into a format definition but no definitions.

 

I suggest posting one of the CNTLIN data sets actually used. Pick one that has one of the formats causing the error.

blueskyxyz
Lapis Lazuli | Level 10
************ques1 20200122**************;
libname mysas "D:\W\Code_lib";
libname lookup "D:\W\Code_lib";

%let sdtm=QS;

data work.fmtlib&sdtm.3;
    length FmtName $64 Start $160 End $160 Type $16;
    set mysas.fmt&sdtm.clean(drop=Length);
    fmtname=compress(CodeList||"_"||Variable);
    if CodeList in ("DOMAIN", "VISITPRJ5457", "YNULL", "QSTEST", "QSTESTCD") then do;
	    start=sourcevar2;
	    end=sourcevar2;
    end;

    else do;
	    start=sourcevar1;
	    end=sourcevar1;
    end;

    label=left(SDTMcode);

    if upcase(type) in ("CHAR") then type="C";
    else if upcase(type) in ("NUM") then do;
		 if ifc(prxmatch('/^-?\d*\.?\d+\s*$/',sourcevar2),'Num','Char')='Num'  then  type="N";
		 else type="I";
	end;
run;

proc sort data=work.fmtlib&sdtm.3 out=work.fmtlib&sdtm.4;
by FmtName;
run;    

proc format cntlin=work.fmtlib&sdtm.4 library=lookup maxselen=160 maxlablen=300;
run;

proc catalog c = lookup.formats;
contents;
run;

proc format library = lookup.formats
cntlout = cntlout;
run;

Hi dimpz429 , attached the code, you can refer to.

 

you forget another type='I', please see the example as below, pay attention to the type of sex, which is same as VISITNUMPRJ5457

  CodeList

 VISITNUMPRJ5457

 

proc format;
    value $month
        "Jan"="01"
        "Feb"="02"
        "Mar"="03"
        "Apr"="04"
        "May"="05"
        "Jun"="06"
        "Jul"="07"
        "Aug"="08"
        "Sep"="09"
        "Oct"="10"
        "Nov"="11"
        "Dec"="12" 
        ;

    invalue sex
        "M"=1
        "F"=2
        ;

    value ny_a  /* type='I'-- type of start is character, type of start label is numeric*/
        1="Y"
        2="N"
        ;
run;

proc format library=work cntlout=fmt;
run;

123.JPG

blueskyxyz
Lapis Lazuli | Level 10
*********ques 2************;

/*added by w*/
proc format library=work cntlin=fmtlib&sdtm.4;
run;


data ecog5(drop=PatNum);
	length STUDYID $200 DOMAIN $2 USUBJID $200
	QSCAT $200 QSORRES $200 QSORRESU $200 QSSTRESC $200 
	QSSTRESN 8 QSSTRESU $200 QSSTAT $8 QSBLFL $2
	VISIT $200 QSDTC $19;
	set mysas.ecog;
	STUDYID='PRJ5457C';
	DOMAIN='QS';
	USUBJID=PatNum;
	QSTESTCD=put(ECQSTESTCD1,$QSTESTCD_QSTESTCD.);
	QSTEST=put(ECQSTEST1,$QSTEST_QSTEST.);
	QSCAT=upcase("SCALES");
	QSORRES=ECOGGR;
	QSORRESU=" ";
	QSSTRESC=put(ECOGGR,ECOGORRES_QSSTRESC.);
	QSSTRESN=input(QSORRES,3.);
	QSSTRESU=" ";
	if EGREC in ("No") and QSORRES eq " " then
	QSSTAT='NOT DONE';
	if upcase(FOLDERL) in ("SCREENING") then
	QSBLFL="Y";
	VISIT=upcase(FOLDERL);
/* 	VISITNUM=input(put(upcase(VISIT),VISITNPRJ5457_VISITNUM.),3.);*/
 	if find(VISIT,'unsch','i') =0 then VISITNUM=input(upcase(VISIT),VISITNUMPRJ5457_VISITNUM.);   /*added by w*/
 
	if FOLDERDT ne " " then
	QSDTC=put(input(FOLDERDT,date10.),E8601DA10.);
	else do;
	QSDTC=put(input(ECOGDR,date10.),E8601DA10.);
	end;
run;
blueskyxyz
Lapis Lazuli | Level 10
*********ques 2************;
proc format library = lookup.formats
	cntlout = cntlout;
run; 

OPTIONS FMTSEARCH=(lookup); /*added by w*/

data ecog5(drop=PatNum);
	length STUDYID $200 DOMAIN $2 USUBJID $200
	QSCAT $200 QSORRES $200 QSORRESU $200 QSSTRESC $200 
	QSSTRESN 8 QSSTRESU $200 QSSTAT $8 QSBLFL $2
	VISIT $200 QSDTC $19;
	set mysas.ecog;
	STUDYID='PRJ5457C';
	DOMAIN='QS';
	USUBJID=PatNum;
	QSTESTCD=put(ECQSTESTCD1,$QSTESTCD_QSTESTCD.);
	QSTEST=put(ECQSTEST1,$QSTEST_QSTEST.);
	QSCAT=upcase("SCALES");
	QSORRES=ECOGGR;
	QSORRESU=" ";
	QSSTRESC=put(ECOGGR,ECOGORRES_QSSTRESC.);
	QSSTRESN=input(QSORRES,3.);
	QSSTRESU=" ";
	if EGREC in ("No") and QSORRES eq " " then
	QSSTAT='NOT DONE';
	if upcase(FOLDERL) in ("SCREENING") then
	QSBLFL="Y";
	VISIT=upcase(FOLDERL);
/* 	VISITNUM=input(put(upcase(VISIT),VISITNPRJ5457_VISITNUM.),3.);*/
 	if find(VISIT,'unsch','i') =0 then VISITNUM=input(upcase(VISIT),VISITNUMPRJ5457_VISITNUM.);   /*added by w*/
 
	if FOLDERDT ne " " then
	QSDTC=put(input(FOLDERDT,date10.),E8601DA10.);
	else do;
	QSDTC=put(input(ECOGDR,date10.),E8601DA10.);
	end;
run;

 

added:

OPTIONS FMTSEARCH=(lookup); /*added by w*/

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 3726 views
  • 1 like
  • 4 in conversation