VinnyR
Calcite | Level 5

I am trying to compare two datasets. The PROC COMPARE result is:

 

                The COMPARE Procedure
     Comparison of WORK.PRODCM1 with WORK.ALLSET1
                    (Method=EXACT)

        Value Comparison Results for Variables

 __________________________________________________________
            ||  Other Reason, Specify
            ||  Base Value           Compare Value
        Obs ||  CMDISCOT              cmdiscot
  ________  ||  ___________________+  ___________________+
            ||
        32  ||  UNKNOWN               UNKNOWN
        42  ||  UNKNOWN               UNKNOWN
       126  ||  UNKNOWN               UNKNOWN
       926  ||  PTWASINREFRACTORYDIS  	PTWASINREFRACTORYDI
       939  ||  PTWASINREFRACTORYDIS  	PTWASINREFRACTORYDI

 

 

When writing my program, I took care to use this code when deriving the variable from the raw dataset:

"if strip(upcase(mydcotsp)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdiscot="U";
else cmdiscot=strip(upcase(MYDCOTSP));"

 

When I saw the compare issue, I applied this to both the production and QC datasets before comparing: "cmdiscot=strip(compress(cmdiscot));"
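A note on that statement: COMPRESS with no second argument removes blanks only, including the blanks between words, and STRIP removes only leading and trailing blanks. Neither touches other white-space characters. The sketch below illustrates this on a made-up value. It assumes an ASCII session and assumes, purely for illustration, that the unwanted leading character is a horizontal tab ('09'x); the variable names raw and cleaned are hypothetical.

```sas
data _null_;
  length raw cleaned $40;
  /* Hypothetical value: a tab ('09'x) followed by text with ordinary blanks */
  raw = '09'x || 'PT WAS IN REFRACTORY';
  /* COMPRESS with no second argument removes blanks only */
  cleaned = strip(compress(raw));
  len = lengthn(cleaned);
  /* $HEX writes the stored bytes, including any that display as white space */
  put cleaned= $hex40. len=;
run;
```

If the hex output starts with 09, the tab has survived both functions, while the internal blanks (20 in ASCII) have been removed.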

 

The issue is that the raw data holds the value as "                    PT WAS IN REFRACTORY DISEASE STATE". That is accurate: when I click on the cell, there is white space before the actual data starts.

 

How can I get an exact match for this programmatically, without hard coding?

 

Any help / suggestion is greatly appreciated.

 

Thanks,

Vinny


 

 

 

3 REPLIES
Shmuel
Garnet | Level 18

Please post your full code: both the steps where you create the CMDISCOT variable and your compare code.

 

Please also post sample data for observations 32, 42, 126, etc., so we can check what the original values were.

VinnyR
Calcite | Level 5

Hi Shmuel, please find my response below.

 

***Excerpts of the program (pulling data from raw datasets)****

 

data amyth1;
length cmtrt cmdecod cmcat cmscat cmbest cmdisc cmdiscot cmconf cmconfot $200;
set amyth;
if mytrt^="" or mytrt_product^="";
cmdecod=strip(mytrt_product);
cmtrt=strip(upcase(mytrt));
cmcat="PRIOR ANTI-MYELOMA REGIMEN";
if strip(upcase(mytype))="OTHER" and mytypot^="" then cmscat="OTHER" || " - " ||strip(upcase(mytypot));
else cmscat= strip(upcase(mytype));
cmlnkid=myregnum_raw;
if strip(upcase(myresp)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmbest="U";
else cmbest=strip(upcase(myresp));
if strip(upcase(mydcreas)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdisc="U";
else cmdisc=strip(upcase(mydcreas));
if strip(upcase(mydcotsp)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdiscot="U";
else cmdiscot=strip(upcase(MYDCOTSP));
if strip(upcase(myconfirm)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmconf="U";
else cmCONF=strip(upcase(myconfirm));
if strip(upcase(myconfot)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmconfot="U";
else cmconfot=strip(upcase(myconfot));
run;

 

data postammr1;
length cmtrt cmdecod cmcat cmscat cmbest cmdisc cmdiscot $200;
set postammr;
if mytrt_product^="" or mytrt^="";
cmdecod=strip(mytrt_product);
cmtrt=strip(upcase(mytrt));
cmcat="POST DISEASE PROGRESSION ANTI-MM REGIMENS";
if strip(upcase(mytype))="OTHER" and mytypot^="" then cmscat="OTHER" || " - " ||strip(upcase(mytypot));
else cmscat= strip(upcase(mytype));
if strip(upcase(myresp)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmbest="U";
else cmbest=strip(upcase(myresp));
if strip(upcase(mydcreas)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdisc="U";
else cmdisc=strip(upcase(mydcreas));
if strip(upcase(mydcotsp)) in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdiscot="U";
else cmdiscot=strip(upcase(MYDCOTSP));
run;

 

*****Setting all data from raw datasets*********;

data allset;
set amyth1(keep=subject cmstdtc cmendtc cmdecod cmtrt cmpgdtc cmcat cmscat cmlnkid cmbest cmdisc cmdiscot cmconf cmconfot)
postammr1(keep=subject cmstdtc cmendtc cmdecod cmtrt cmpgdtc cmcat cmscat cmbest cmdisc cmdiscot)
cmnd1(keep=subject cmstdtc cmendtc cmtrt cmcat) cm1(keep=subject cmstdtc cmendtc cmdecod cmtrt cmcat cmdose)
lymph1(keep=subject cmstdtc cmendtc
cmtrt cmcat cmpresp cmoccur cmdose) leuk1(keep=subject cmstdtc cmendtc cmtrt cmcat cmpresp cmoccur cmdose);
usubjid="CRB-402-" || strip(substr(subject,4,3)) || "-" || strip(substr(subject,7));
cmdiscot=strip(compress(cmdiscot));
run;

 

data cmprod;
merge prodcm(in=x1) supptr(in=x2);
by usubjid cmseq;
cmdiscot=strip(compress(cmdiscot));
run;

 

 

**********Preparing for COMPARE***********;

proc sort data=cmprod out=prodcm1(keep=usubjid cmdecod cmstdtc cmendtc cmbest cmdisc cmdiscot cmconf cmconfot cmlnkid );
by usubjid cmdecod cmstdtc cmendtc descending cmlnkid cmbest cmdisc cmdiscot cmconf cmconfot ;
where cmdecod^="";
run;

 

proc sort data=allset out=allset1/*(keep=usubjid cmdecod cmstdtc cmendtc cmbest cmdisc cmdiscot cmconf cmconfot cmlnkid )*/;
by usubjid cmdecod cmstdtc cmendtc descending cmlnkid cmbest cmdisc cmdiscot cmconf cmconfot ;
where cmdecod^="";
run;

 

proc compare base=prodcm1 compare=allset1;
run;
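
Because the values that PROC COMPARE flags look identical in the listing, it may help to look at their lengths and stored bytes. The following is a minimal sketch, not part of the original program: it pairs the rows of PRODCM1 and ALLSET1 by position and prints the hex representation of CMDISCOT wherever the two differ. The names cmdiscot_diff, base_val, comp_val, base_hex, and comp_hex are hypothetical, and only the first 40 bytes of each value are shown.

```sas
/* One-to-one merge (no BY statement) pairs rows by position,
   as PROC COMPARE does when no ID statement is given */
data cmdiscot_diff;
  merge prodcm1(keep=cmdiscot rename=(cmdiscot=base_val))
        allset1(keep=cmdiscot rename=(cmdiscot=comp_val));
  length base_hex comp_hex $80;
  obs = _n_;                          /* row position in both datasets */
  if base_val ne comp_val;            /* keep only mismatching rows */
  base_len = lengthn(base_val);
  comp_len = lengthn(comp_val);
  base_hex = put(base_val, $hex80.);  /* first 40 bytes as hex */
  comp_hex = put(comp_val, $hex80.);
run;

proc print data=cmdiscot_diff;
  var obs base_len comp_len base_hex comp_hex;
run;
```

In an ASCII session, a byte such as 09 (tab), 0D (carriage return), or A0 (non-breaking space in many single-byte encodings) in one hex string but not the other would explain a mismatch that is invisible in the listing.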

Shmuel
Garnet | Level 18

Does the CMDISCOT variable already exist in the raw data used to create ALLSET1 (amyth, postammr)?

If so, can it be "UNKNOWN"?
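
For the leading white space described in the question, one possible approach is to turn non-blank white-space characters into ordinary blanks before applying STRIP and the "UNKNOWN" check. The sketch below is only an illustration under assumptions: the session uses ASCII and a single-byte encoding such as latin1 (in a UTF-8 session, 'A0'x can be part of a multi-byte character and should not be translated this way), and the offending characters are a tab, line feed, carriage return, or non-breaking space. The sample value is made up.

```sas
data _null_;
  length mydcotsp cmdiscot $200;
  /* Hypothetical raw value: leading tab, trailing carriage return */
  mydcotsp = '09'x || 'pt was in refractory disease state' || '0D'x;
  /* Replace tab, line feed, carriage return and NBSP with blanks */
  cmdiscot = translate(mydcotsp, '    ', '090A0DA0'x);
  /* Upcase, collapse repeated blanks, drop leading and trailing blanks */
  cmdiscot = strip(compbl(upcase(cmdiscot)));
  if cmdiscot in ("UNKNOWN" "UN" "UNKNOW" "UKNOWN") then cmdiscot = "U";
  put cmdiscot=;
run;
```

Unlike COMPRESS with no second argument, this keeps the single blanks between words. The same cleaning would have to be applied on both the production and QC sides for the values to match.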
