I am trying to compare two datasets. The PROC COMPARE result is:
The COMPARE Procedure
Comparison of WORK.PRODCM1 with WORK.ALLSET1
(Method=EXACT)

Value Comparison Results for Variables
__________________________________________________________
           ||  Other Reason, Specify
           ||  Base Value            Compare Value
       Obs ||  CMDISCOT              cmdiscot
 ________  ||  ___________________+  ___________________+
           ||
        32 ||  UNKNOWN               UNKNOWN
        42 ||  UNKNOWN               UNKNOWN
       126 ||  UNKNOWN               UNKNOWN
       926 ||  PTWASINREFRACTORYDIS  PTWASINREFRACTORYDI
       939 ||  PTWASINREFRACTORYDIS  PTWASINREFRACTORYDI
I took care in my program to use this code when pulling data from the raw dataset:
"if strip(upcase(mydcotsp)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdiscot="U";
else cmdiscot=strip(upcase(MYDCOTSP));"
When I saw the compare issue, I added this to both the production and QC datasets before comparing: "cmdiscot=strip(compress(cmdiscot));"
The issue is that the raw data has the value as " PT WAS IN REFRACTORY DISEASE STATE". That is what is actually stored: when I click on the cell, there is a space before the actual data starts.
How can I get an exact match for this programmatically? No hard-coding.
Any help or suggestion is greatly appreciated.
Thanks,
Vinny
Please post your full code: both the steps where you create the CMDISCOT variable and your compare code.
Also post sample data for observations 32, 42, 126, etc., so we can check what value was in the source.
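One way to check what is really stored in the source value is to print it in hexadecimal. An ordinary blank shows up as 20; a leading byte such as 09 (tab) or A0 (non-breaking space in a single-byte encoding) would look like a space on screen but would not be removed by STRIP or by COMPRESS without modifiers. The following is only a sketch: it assumes the raw dataset is amyth, that it contains subject and mydcotsp as in the question, and the names check_raw and raw_hex are made up for illustration.

```sas
/* Hypothetical diagnostic step: show the first bytes of the raw value in hex. */
data check_raw;
    set amyth(keep=subject mydcotsp);
    where mydcotsp ne "";
    length raw_hex $40;
    /* $HEX40. writes the first 20 bytes of the value as 40 hex digits */
    raw_hex = put(mydcotsp, $hex40.);
run;

proc print data=check_raw(obs=20);
    var subject mydcotsp raw_hex;
run;
```

If raw_hex starts with anything other than the codes of the visible first letter, the extra leading byte is the likely reason the two values do not line up in the compare.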
Hi Shmuel, please find my response below.
***Excerpts of the program (pulling data from raw datasets)****;
data amyth1;
length cmtrt cmdecod cmcat cmscat cmbest cmdisc cmdiscot cmconf cmconfot $200;
set amyth;
if mytrt^="" or mytrt_product^="";
cmdecod=strip(mytrt_product);
cmtrt=strip(upcase(mytrt));
cmcat="PRIOR ANTI-MYELOMA REGIMEN";
if strip(upcase(mytype))="OTHER" and mytypot^="" then cmscat="OTHER" || " - " ||strip(upcase(mytypot));
else cmscat= strip(upcase(mytype));
cmlnkid=myregnum_raw;
if strip(upcase(myresp)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmbest="U";
else cmbest=strip(upcase(myresp));
if strip(upcase(mydcreas)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdisc="U";
else cmdisc=strip(upcase(mydcreas));
if strip(upcase(mydcotsp)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdiscot="U";
else cmdiscot=strip(upcase(MYDCOTSP));
if strip(upcase(myconfirm)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmconf="U";
else cmCONF=strip(upcase(myconfirm));
if strip(upcase(myconfot)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmconfot="U";
else cmconfot=strip(upcase(myconfot));
run;
data postammr1;
length cmtrt cmdecod cmcat cmscat cmbest cmdisc cmdiscot $200;
set postammr;
if mytrt_product^="" or mytrt^="";
cmdecod=strip(mytrt_product);
cmtrt=strip(upcase(mytrt));
cmcat="POST DISEASE PROGRESSION ANTI-MM REGIMENS";
if strip(upcase(mytype))="OTHER" and mytypot^="" then cmscat="OTHER" || " - " ||strip(upcase(mytypot));
else cmscat= strip(upcase(mytype));
if strip(upcase(myresp)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmbest="U";
else cmbest=strip(upcase(myresp));
if strip(upcase(mydcreas)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdisc="U";
else cmdisc=strip(upcase(mydcreas));
if strip(upcase(mydcotsp)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdiscot="U";
else cmdiscot=strip(upcase(MYDCOTSP));
run;
*****Setting all data from raw datasets*********;
data allset;
set amyth1(keep=subject cmstdtc cmendtc cmdecod cmtrt cmpgdtc cmcat cmscat cmlnkid cmbest cmdisc cmdiscot cmconf cmconfot)
    postammr1(keep=subject cmstdtc cmendtc cmdecod cmtrt cmpgdtc cmcat cmscat cmbest cmdisc cmdiscot)
    cmnd1(keep=subject cmstdtc cmendtc cmtrt cmcat)
    cm1(keep=subject cmstdtc cmendtc cmdecod cmtrt cmcat cmdose)
    lymph1(keep=subject cmstdtc cmendtc cmtrt cmcat cmpresp cmoccur cmdose)
    leuk1(keep=subject cmstdtc cmendtc cmtrt cmcat cmpresp cmoccur cmdose);
usubjid="CRB-402-" || strip(substr(subject,4,3)) || "-" || strip(substr(subject,7));
cmdiscot=strip(compress(cmdiscot));
run;
data cmprod;
merge prodcm(in=x1) supptr(in=x2);
by usubjid cmseq;
cmdiscot=strip(compress(cmdiscot));
run;
**********Preparing for COMPARE***********;
proc sort data=cmprod out=prodcm1(keep=usubjid cmdecod cmstdtc cmendtc cmbest cmdisc cmdiscot cmconf cmconfot cmlnkid );
by usubjid cmdecod cmstdtc cmendtc descending cmlnkid cmbest cmdisc cmdiscot cmconf cmconfot ;
where cmdecod^="";
run;
proc sort data=allset out=allset1/*(keep=usubjid cmdecod cmstdtc cmendtc cmbest cmdisc cmdiscot cmconf cmconfot cmlnkid )*/;
by usubjid cmdecod cmstdtc cmendtc descending cmlnkid cmbest cmdisc cmdiscot cmconf cmconfot ;
where cmdecod^="";
run;
proc compare base=prodcm1 compare=allset1;
run;
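A note on the cleanup step used above: COMPRESS with only one argument removes ordinary blanks and nothing else, so the surrounding STRIP adds nothing, and a tab or non-breaking space would survive it. A possible alternative, sketched here on a made-up test value rather than the real data, is the 's' modifier, which adds the space characters (blank, tab, carriage return, line feed and so on) to the list of characters to remove. Passing 'A0'x as the second argument also removes a non-breaking space, but that is only safe under the assumption of a single-byte session encoding such as wlatin1. The dataset demo_clean and its variables are hypothetical.

```sas
/* Sketch on an invented value: a leading tab followed by the text. */
data demo_clean;
    length raw default_clean ws_clean $60;
    raw = '09'x || "PT WAS IN REFRACTORY DISEASE STATE";

    /* one-argument COMPRESS: removes blanks only, the tab stays */
    default_clean = compress(raw);

    /* 's' modifier: removes blanks, tabs and other space characters;
       'A0'x additionally removes NBSP (single-byte encodings only) */
    ws_clean = compress(raw, 'A0'x, 's');

    /* 1 = matches the expected compressed text, 0 = does not */
    same_default = (default_clean = "PTWASINREFRACTORYDISEASESTATE");
    same_ws      = (ws_clean      = "PTWASINREFRACTORYDISEASESTATE");
    put same_default= same_ws=;
run;
```

If this turns out to be the cause, the same call, cmdiscot = compress(cmdiscot, 'A0'x, 's');, would need to be applied in both the production and the QC data step so that both sides are cleaned identically.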
Does the CMDISCOT variable already exist in the raw data used to create ALLSET1 (amyth, postammr)?
If so, can it be "UNKNOWN"?
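A quick way to check this without opening the datasets is to query DICTIONARY.COLUMNS. This is a minimal sketch that assumes amyth and postammr live in the WORK library; change the libname if they are stored elsewhere.

```sas
/* List CMDISCOT if it already exists in either raw dataset. */
proc sql;
    select libname, memname, name, type, length
    from dictionary.columns
    where libname = 'WORK'
      and memname in ('AMYTH', 'POSTAMMR')
      and upcase(name) = 'CMDISCOT';
quit;
```

If the query returns no rows, the variable is created only in your own data steps, and any "UNKNOWN" value must be coming through the else branch from mydcotsp.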