VinnyR
Calcite | Level 5

I am trying to compare two datasets. The PROC COMPARE result is:

 

                                                             The COMPARE Procedure                                                             
                                                 Comparison of WORK.PRODCM1 with WORK.ALLSET1                                                  
                                                                (Method=EXACT)                                                                 
                                                                                                                                               
                                                    Value Comparison Results for Variables                                                     
                                                                                                                                               
                                          __________________________________________________________                                           
                                                     ||  Other Reason, Specify                                                                 
                                                     ||  Base Value           Compare Value                                                    
                                                 Obs ||  CMDISCOT              cmdiscot                                                        
                                           ________  ||  ___________________+  ___________________+                                            
                                                     ||                                                                                        
                                                 32  ||  UNKNOWN               UNKNOWN
                                                       
                                                 42  ||  UNKNOWN               UNKNOWN
                                                       
                                                126  ||  UNKNOWN               UNKNOWN
                                                       
                                                926  ||  PTWASINREFRACTORYDIS  	PTWASINREFRACTORYDI                                            
                                                939  ||  PTWASINREFRACTORYDIS  	PTWASINREFRACTORYDI

 

 

I was careful in my program to use this code when reading the data from the raw dataset:

if strip(upcase(mydcotsp)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdiscot="U";
else cmdiscot=strip(upcase(MYDCOTSP));

 

When I saw the COMPARE issue, I also added this statement to both the production and QC datasets before comparing:

cmdiscot=strip(compress(cmdiscot));

 

The issue is that the raw data holds the value as "                    PT WAS IN REFRACTORY DISEASE STATE". That is really how it looks: when I click on the cell, there is space before the actual text starts.
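One way to see what those leading characters really are is to print the value with a $HEX. format, which shows every byte. The sketch below builds a one-row test dataset (rawtest is a hypothetical name); the leading tab ('09'x) is only an assumption, suggested by the one-column shift of the compare value in observations 926 and 939 of the COMPARE output. To check the real bytes, point the second step at the raw dataset (amyth) instead. A blank shows as 20, a tab as 09, a carriage return as 0D, and a non-breaking space as A0 in a single-byte encoding such as WLATIN1 (C2A0 in UTF-8).

```sas
/* Hypothetical one-row test data: the leading tab ('09'x) is an assumption, */
/* standing in for whatever precedes the text in the raw MYDCOTSP value.      */
data rawtest;
  length mydcotsp $60;
  mydcotsp = '09'x || "PT WAS IN REFRACTORY DISEASE STATE";
run;

/* Print the first 10 bytes in hex so invisible characters become visible, */
/* plus the code of the first character (32 = blank, 9 = tab).             */
data _null_;
  set rawtest;
  length first10 $10;
  first10 = substr(mydcotsp, 1, 10);
  first_code = rank(char(mydcotsp, 1));
  put first10= $hex20. first_code=;
run;
```

If the first byte is anything other than 20, STRIP and a one-argument COMPRESS will not remove it.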

 

How can I get an exact match for this programmatically, without hard coding?
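A likely explanation, though it depends on what the raw bytes actually are: COMPRESS with a single argument removes only blanks (hex 20), and STRIP likewise trims only blanks, so a tab, carriage return, or non-breaking space survives both, and it can also keep the IN test from matching "UNKNOWN". The sketch below cleans the value before the IN test. The 's' modifier adds whitespace characters (blank, horizontal tab, vertical tab, carriage return, line feed, form feed) to the removal list, and 'A0'x is listed explicitly on the assumption that the session uses a single-byte encoding such as WLATIN1, where A0 is the non-breaking space. The dataset names rawvals and cleaned and the temporary variable _clean are hypothetical. Like the posted strip(compress(cmdiscot)), this also removes internal blanks.

```sas
/* Hypothetical test values: a tab-prefixed text and an "UNKNOWN" with a */
/* trailing carriage return ('0D'x), both assumptions for illustration.  */
data rawvals;
  length mydcotsp $60;
  mydcotsp = '09'x || "PT WAS IN REFRACTORY DISEASE STATE"; output;
  mydcotsp = "UNKNOWN" || '0D'x;                              output;
run;

data cleaned;
  length cmdiscot _clean $200;
  set rawvals;
  /* Remove blanks plus all whitespace ('s' modifier) and 'A0'x (assumed to  */
  /* be a non-breaking space, which holds only in single-byte encodings).    */
  _clean = compress(upcase(mydcotsp), 'A0'x, 's');
  /* Test the cleaned value, so hidden characters cannot block the match. */
  if _clean in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdiscot = "U";
  else cmdiscot = _clean;
  drop _clean;
run;
```

Using the same expression in both the production and QC programs, at the point where CMDISCOT is first derived, means both sides of PROC COMPARE see identically cleaned values.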

 

Any help / suggestion is greatly appreciated.

 

Thanks,

Vinny


 

 

 

Shmuel
Garnet | Level 18

Please post your full code: both steps where you create the CMDISCOT variable, and your compare code.

 

Please also post sample data for observations 32, 42, 126, etc., so we can check what the original values were.

VinnyR
Calcite | Level 5

Hi Shmuel, please find my response below.

 

***Excerpts of the program (pulling data from raw datasets)***

 

data amyth1;
length cmtrt cmdecod cmcat cmscat cmbest cmdisc cmdiscot cmconf cmconfot $200;
set amyth;
if mytrt^="" or mytrt_product^="";
cmdecod=strip(mytrt_product);
cmtrt=strip(upcase(mytrt));
cmcat="PRIOR ANTI-MYELOMA REGIMEN";
if strip(upcase(mytype))="OTHER" and mytypot^="" then cmscat="OTHER" || " - " ||strip(upcase(mytypot));
else cmscat= strip(upcase(mytype));
cmlnkid=myregnum_raw;
if strip(upcase(myresp)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmbest="U";
else cmbest=strip(upcase(myresp));
if strip(upcase(mydcreas)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdisc="U";
else cmdisc=strip(upcase(mydcreas));
if strip(upcase(mydcotsp)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdiscot="U";
else cmdiscot=strip(upcase(MYDCOTSP));
if strip(upcase(myconfirm)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmconf="U";
else cmCONF=strip(upcase(myconfirm));
if strip(upcase(myconfot)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmconfot="U";
else cmconfot=strip(upcase(myconfot));
run;

 

data postammr1;
length cmtrt cmdecod cmcat cmscat cmbest cmdisc cmdiscot $200;
set postammr;
if mytrt_product^="" or mytrt^="";
cmdecod=strip(mytrt_product);
cmtrt=strip(upcase(mytrt));
cmcat="POST DISEASE PROGRESSION ANTI-MM REGIMENS";
if strip(upcase(mytype))="OTHER" and mytypot^="" then cmscat="OTHER" || " - " ||strip(upcase(mytypot));
else cmscat= strip(upcase(mytype));
if strip(upcase(myresp)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmbest="U";
else cmbest=strip(upcase(myresp));
if strip(upcase(mydcreas)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdisc="U";
else cmdisc=strip(upcase(mydcreas));
if strip(upcase(mydcotsp)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdiscot="U";
else cmdiscot=strip(upcase(MYDCOTSP));
run;
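The same unknown-value mapping is repeated for several variables across these two steps. Below is a minimal sketch of factoring it into a macro; %norm_unk and the temporary variable _clean are hypothetical names, not existing SAS features. Unlike the COMPRESS approach, it keeps internal blanks: TRANSLATE first turns tab, carriage return, line feed, and 'A0'x (assumed to be a non-breaking space, which holds only in a single-byte encoding) into blanks, and STRIP then trims the ends. Each DATA step that calls it must give _clean a length and drop it.

```sas
/* Hypothetical helper: derive TGT from SRC, mapping unknown spellings to "U". */
/* Tab, CR, LF and 'A0'x are turned into blanks first so STRIP can remove them */
/* ('A0'x as non-breaking space assumes a single-byte session encoding).       */
%macro norm_unk(src, tgt);
  _clean = strip(translate(upcase(&src), '20202020'x, '090D0AA0'x));
  if _clean in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then &tgt = "U";
  else &tgt = _clean;
%mend norm_unk;

/* Demo with two hypothetical raw values. */
data postammr_demo;
  length mydcotsp $60 cmdiscot _clean $200;
  drop _clean;
  mydcotsp = '09'x || "PT WAS IN REFRACTORY DISEASE STATE";
  %norm_unk(mydcotsp, cmdiscot)
  output;
  mydcotsp = "UNKNOWN" || '0D'x;
  %norm_unk(mydcotsp, cmdiscot)
  output;
run;
```

In the posted steps, each IF/ELSE pair would then become a single call such as %norm_unk(mydcreas, cmdisc).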

 

*****Setting all data from raw datasets*********;

data allset;
set amyth1(keep=subject cmstdtc cmendtc cmdecod cmtrt cmpgdtc cmcat cmscat cmlnkid cmbest cmdisc cmdiscot cmconf cmconfot)
postammr1(keep=subject cmstdtc cmendtc cmdecod cmtrt cmpgdtc cmcat cmscat cmbest cmdisc cmdiscot)
cmnd1(keep=subject cmstdtc cmendtc cmtrt cmcat) cm1(keep=subject cmstdtc cmendtc cmdecod cmtrt cmcat cmdose)
lymph1(keep=subject cmstdtc cmendtc
cmtrt cmcat cmpresp cmoccur cmdose) leuk1(keep=subject cmstdtc cmendtc cmtrt cmcat cmpresp cmoccur cmdose);
usubjid="CRB-402-" || strip(substr(subject,4,3)) || "-" || strip(substr(subject,7));
cmdiscot=strip(compress(cmdiscot));
run;

 

data cmprod;
merge prodcm(in=x1) supptr(in=x2);
by usubjid cmseq;
cmdiscot=strip(compress(cmdiscot));
run;

 

 

**********Preparing for COMPARE***********;

proc sort data=cmprod out=prodcm1(keep=usubjid cmdecod cmstdtc cmendtc cmbest cmdisc cmdiscot cmconf cmconfot cmlnkid );
by usubjid cmdecod cmstdtc cmendtc descending cmlnkid cmbest cmdisc cmdiscot cmconf cmconfot ;
where cmdecod^="";
run;

 

proc sort data=allset out=allset1/*(keep=usubjid cmdecod cmstdtc cmendtc cmbest cmdisc cmdiscot cmconf cmconfot cmlnkid )*/;
by usubjid cmdecod cmstdtc cmendtc descending cmlnkid cmbest cmdisc cmdiscot cmconf cmconfot ;
where cmdecod^="";
run;

 

proc compare base=prodcm1 compare=allset1;
run;
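To look only at the CMDISCOT mismatches and see their exact bytes, PROC COMPARE can write the unequal rows to a dataset (OUT= with OUTBASE, OUTCOMP, and OUTNOEQUAL), which can then be printed with a $HEX. format. The two demo datasets below are hypothetical stand-ins for prodcm1 and allset1, with an assumed leading tab in the compare value; as in the posted step, there is no ID statement, so observations are matched by position.

```sas
/* Hypothetical stand-ins for PRODCM1 (base) and ALLSET1 (compare). */
data base_demo;
  length cmdiscot $200;
  cmdiscot = "PTWASINREFRACTORYDISEASESTATE"; output;
  cmdiscot = "U";                             output;
run;

data comp_demo;
  length cmdiscot $200;
  cmdiscot = '09'x || "PTWASINREFRACTORYDISEASESTATE"; output;
  cmdiscot = "U";                                      output;
run;

/* Keep only unequal observations, with both BASE and COMPARE versions. */
proc compare base=base_demo compare=comp_demo
             out=cmdiscot_diff outbase outcomp outnoequal noprint;
  var cmdiscot;
run;

/* Show the first 20 bytes of each mismatching value in hex. */
proc print data=cmdiscot_diff;
  var _type_ _obs_ cmdiscot;
  format cmdiscot $hex40.;
run;
```

In the printed rows, a leading 09 on the COMPARE value and 50 (the letter P) on the BASE value would confirm that a tab, not a blank, is causing the mismatch.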

Shmuel
Garnet | Level 18

Does the CMDISCOT variable already exist in the raw data used to create ALLSET1 (amyth, postammr)?

If so, can it contain "UNKNOWN"?

