BookmarkSubscribeRSS Feed
Ronein
Onyx | Level 15

Hello

When I run following query the merge perform 100%

proc sql;
create table t5_Long_c as
select a.*,b.Theur_VAR_NAME,b.Theur_CATEGORY,b.Regression_Coefficient
from t5_Long_b as a
left join r_r.Derug_AP_Decoding_tbl  as b
on compress(upcase(a.Var_name))=compress(upcase(b.Var_name))  and a.Category=b.Category
;
quit;

However, when I run this code it doesn't success find matching rows.

It probably happend because in proc sql I used compress and upcase (for matching char var) and in data set (merge) I couldnt do it.

My question-

Is there a way to add compress and upcase  to merge via data set (merge)?

I know I can do it before (In each data set define a new var using compress and upcase )

But my question if it can be done directly in merge ??

proc sort data=r_r.Derug_AP_Decoding_tbl;
by Var_name CATEGORY;
Run;

proc sort data=t5_Long_b;
by Var_name CATEGORY;
Run;

data t5_Long_c;
merge t5_Long_b(in=a)  r_r.Derug_AP_Decoding_tbl;
by Var_name CATEGORY;
if a;
Run;
9 REPLIES 9
Kurt_Bremser
Super User
First of all, you should do such data cleaning immediately when the data is read into the SAS environment in the first place.

You can define DATA step or SQL views (DATA preferred) that do the conversion, and use these in the MERGE.
Quentin
Super User

No, you cannot do it in the MERGE step.  

 

The MERGE statement reads the key values from each dataset, then decides which record(s) to read into the PDV.   You cannot use an expression on the BY statement.

 

Edit: 

I was going to write that it's not possible to modify the key value before it is read by the  BY statement, because the data would need to be read into the PDV first.   But the WHERE statement is happy to modify key values before they are read into the PDV, so you can do stuff like: where upcase(category)='FOO'.  So theoretically the BY statement could allow you to manipulate the key values before they are compared.  But I don't think we're likely to see that added as a new feature.

yabwon
Amethyst | Level 16

Just for fun. In some cases you can 😉

 


PROC FCMP outlib=WORK.F.P;
  function UPCASECOMPRESS(s$)$32767;
    return(UPCASE(COMPRESS(s)));
  endfunc;
QUIT;

options append=(cmplib=WORK.F);

proc format;
value $ UPCASECOMPRESS
other=[UPCASECOMPRESS()]
;
run;

data have1;
  infile cards dlm=",";
  input Var_name $ CATEGORY $;
  data1+1;
cards;
 A b ,x
BC ,Y
 ab, X
aB ,X
 bc, y
b C, y
;
run;
title "vanila have1";
proc print data=have1;
run;

data have2;
  infile cards dlm=",";
  input Var_name $ CATEGORY $;
  data2+10;
cards;
ab, x
 Bc, Y
C D, Z
;
run;
title "vanila have2";
proc print data=have2;
run;


proc datasets nolist lib=work;
modify have1;
  format Var_name CATEGORY $UPCASECOMPRESS.;
run;
modify have2;
  format Var_name CATEGORY $UPCASECOMPRESS.;
run;
quit;

proc sort data=have1 sortseq=linguistic(locale=PL_PL); /* sorts: AaBbCcDd... */
by Var_name CATEGORY;
run;
title "sorted and formatted have1";
proc print data=have1;
run;

proc sort data=have2 sortseq=linguistic(locale=PL_PL);
by Var_name CATEGORY;
run;
title "sorted and formatted have2";
proc print data=have2;
run;

data WANT;
merge have1 have2;
by Var_name CATEGORY GROUPFORMAT;
run;
title "want";
proc print data=want;
run;


Output:

yabwon_0-1768408931905.png

 

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Quentin
Super User

Amazing @yabwon , so creative, as always!

yabwon
Amethyst | Level 16

But as I wrote, Just for fun.

 

In fact the sorting is very reason why implementing request of adding functions to merging is not "doable" in general case. Sorted data (ready for MERGE) after applying a function can be not MERGEble anymore.

 

In fact my example is breaks for that reason,

It's data dependent, so when you change data to:

data have1;
  infile cards dlm=",";
  input Var_name $ CATEGORY $;
  data1+1;
cards;
A b ,x
 BC ,Y
ab, X
aB ,X
 bc, x
b C, y
 D e, X
D e, x
;
run;
title "vanila have1";
proc print data=have1;
run;

data have2;
  infile cards dlm=",";
  input Var_name $ CATEGORY $;
  data2+10;
cards;
ab, x
 Bc, Y
C D, Z
;
run;
title "vanila have2";
proc print data=have2;
run;

And you run the code (from previous post), result is:

yabwon_0-1768419761920.png

While it should be:

yabwon_1-1768419783909.png

 

I think, it can't be safely used for more than one merging variable.

%let by=Var_name;
 
PROC FCMP outlib=WORK.F.P;
  function UPCASECOMPRESS(s$)$32767;
    return(UPCASE(COMPRESS(s)));
  endfunc;
QUIT;

options append=(cmplib=WORK.F);

proc format;
value $ UPCASECOMPRESS
other=[UPCASECOMPRESS()]
;
run;

data have1;
  infile cards dlm=",";
  input Var_name $ CATEGORY $;
  data1+1;
cards;
A b ,x
 BC ,Y
ab, X
aB ,X
 bc, x
b C, y
 D e, X
D e, x
;
run;
title "vanila have1";
proc print data=have1;
run;

data have2;
  infile cards dlm=",";
  input Var_name $ CATEGORY $;
  data2+10;
cards;
ab, x
 Bc, Y
C D, Z
;
run;
title "vanila have2";
proc print data=have2;
run;




proc sort data=have1 sortseq=linguistic(locale=PL_PL);
by &by.;
run;
title "sorted and formatted have1";
proc print data=have1;
run;

proc sort data=have2 sortseq=linguistic(locale=PL_PL);
by &by.;
run;
title "sorted and formatted have2";
proc print data=have2;
run;



proc datasets nolist lib=work;
modify have1;
  format Var_name CATEGORY $UPCASECOMPRESS.;
run;
modify have2;
  format Var_name CATEGORY $UPCASECOMPRESS.;
run;
quit;





data WANT;
merge have1(keep=&by. data:) have2(keep=&by. data:);
by &by.  GROUPFORMAT;
run;
title "want1";
proc print data=want;
run;

data have1b;
set have1;
format Var_name CATEGORY;
Var_name=UPCASE(COMPRESS(Var_name));
CATEGORY = UPCASE(COMPRESS(CATEGORY));
run;

data have2b;
set have2;
format Var_name CATEGORY;
Var_name=UPCASE(COMPRESS(Var_name));
CATEGORY = UPCASE(COMPRESS(CATEGORY));
run;

proc sort data=have1b sortseq=linguistic(locale=PL_PL);
by &by.;
run;
proc sort data=have2b sortseq=linguistic(locale=PL_PL);
by &by.;
run;

data WANT2;
merge have1b(keep=&by. data:) have2b(keep=&by. data:);
by &by.;
run;
title "want expected";
proc print data=want2;
run;

yabwon_2-1768420265448.png

 

Now I'm looking how to break it even for 1 variable.

 

Bart

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Amethyst | Level 16

OK, so it's even more limited.

I didn't notice that, because I was reading dataliens and values were adjusted left.

 

If there are leading spaces the order is different, e.g. "<space>B" is sorted before "A". 

 

Further more, "A<space>C" is sorted before "AB" and "AC". So ordering of "A<space>C", "AB", and "AC" won't give proper result since upcase+compressed is: "AB", "AC", "AC".

 

So in fact if there are spaces inside they break it too.

 

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Quentin
Super User

@yabwon wrote:

In fact the sorting is very reason why implementing request of adding functions to merging is not "doable" in general case. Sorted data (ready for MERGE) after applying a function can be not MERGEble anymore.

Ahh, that makes sense.  Definitely not implementable in MERGE. Thanks again.

Tom
Super User Tom
Super User

Looks like you are trying to standardize the variable names so they will match.

You might want to add a new variable to each dataset with the standardized variable name.

uname=upcase(compress(var_name));

Then you can sort and/or merge by that new variable but still have the mixed case value for making pretty printouts.

 

ballardw
Super User

You do no mention how many unique combinations of the variables Var_name and Category might have in the two data sets. MERGE behaves quite differently than SQL in the presence of more than one observation with the same values.

 

In general I refrain from calling any SQL join a "merge" operation because I learned SAS data step code before any SQL and the data step Merge often behaves differently than any of the joins.

sas-innovate-2026-white.png



April 27 – 30 | Gaylord Texan | Grapevine, Texas

Registration is open

Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and save with the early bird rate—just $795!

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 327 views
  • 2 likes
  • 6 in conversation