Niugg2010
Obsidian | Level 7

I have two gene sequences

(1)GAGCAAGCGCCATACTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA

(2)GAGCAAGCGCCATAGTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA

 

I want to mark their differences. 

Currently I mark the differences by converting the differing characters to lowercase (see my code below). My question is how I can mark the differences in red instead.

By the way, I would appreciate it if someone could optimize my code.

 

Thanks.

 

 

 

***Code Start*******************************************

data a;
length f1 $ 200;
input f1;
datalines;
GAGCAAGCGCCATACTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA
GAGCAAGCGCCATAGTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA
;

data b;
set a;
retain base len_1;
if _n_=1 then
do;
base=f1;
len_1=length(base);
end;
f2=f1;
len_2=length(f2);
x=min(len_1, len_2);
do i=1 to x;
substr_1=substr(base,i,1);
substr_2=substr(f2,i,1);
if substr_1 ^=substr_2 then substr(f2,i,1)=lowcase(substr_2);
else;
end;
run;

 

proc print data=b ;
var f1 f2;
run;

 

***Code end***********************************

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
KachiM
Rhodochrosite | Level 12

The COMPARE() function compares two strings. It returns the position of the leftmost character that does not match, and 0 (zero) when the two strings are the same. The sign of the result tells you which string sorts first, so take the absolute value if you only need the position. Your two strings differ at position 15, so I have added one more string (identical to the first) to show that COMPARE() returns 0.

 

data a;
length f1 $ 66;
input f1;
datalines;
GAGCAAGCGCCATACTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA
GAGCAAGCGCCATACTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA
GAGCAAGCGCCATAGTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA
;
run;

data _null_;
   retain old;

   set a;
   if _n_ = 1 then old = f1;   /* first sequence is the reference */
   else do;
      /* COMPARE returns a signed position, so take the absolute value */
      dif = abs(compare(f1, old));
      put dif = ;
      if dif = 0 then put 'No Difference';
      else substr(f1, dif, 1) = lowcase(substr(f1, dif, 1));
      put f1 = ;
   end;
run;

 


12 REPLIES
ballardw
Super User

Color implies an output destination that supports it. So the question for you is which output format you want, because addressing individual characters will likely require a different method for each. Do you want HTML, RTF, PDF, or something else?

Niugg2010
Obsidian | Level 7

RTF or PDF are both fine for me.

 

Do you mean using PROC TEMPLATE to control the output? As far as I know, PROC TEMPLATE can only control style at the cell-value level, not for individual characters within a cell.

 

 

 

ballardw
Super User

That's why the target destination is important. The only approach I see as likely to work is to build a string with embedded markup codes. In pseudocode, you end up with a string that looks something like the following, where {font color: color value} is replaced by the raw codes of the output destination.

{font color: default}ABCABCABC{font color:red}abc{font color:default}BDABDABDA

I intentionally used letters that do not resemble your data.

 

ESCAPECHAR and the RAW function will let you insert the control strings once the values needed are determined.

 

I would recommend hard-coding a couple of examples to get the feel before trying to code conditionally based on the case of the letters. The latter shouldn't be too difficult once the correct codes are determined.

 

Here's a very brief example of inserting codes to print. Change the RTF file path to something you can use:

ods escapechar="^";
data junk;

x = 'Example of ^{raw \cf12 RAW} function';
y ="Example ^{style [foreground=red] of Super, Alpha ^{super ^{unicode ALPHA}
       ^{style [foreground=green] Nested}} Formatting} and Scoping";
run;

ods rtf file='D:\data\junk.rtf' style=meadow;
proc print data=junk;
run;
ods rtf close;
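
A minimal sketch of how the conditional version might look, assuming the first sequence is the reference and every later sequence is compared to it character by character. The data set name marked, the variable names base, marked and ch, and the output file name marked.rtf are all made up for illustration. The inline style is written without a space after the closing bracket; if your destination renders the spacing differently, adjust that literal.

```sas
ods escapechar="^";

data a;
   length f1 $ 200;
   input f1;
   datalines;
GAGCAAGCGCCATACTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA
GAGCAAGCGCCATAGTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA
;

data marked;
   set a;
   /* marked must be long enough to hold the added style codes */
   length base $ 200 marked $ 6000 ch $ 1;
   retain base;
   if _n_ = 1 then base = f1;          /* first sequence is the reference */
   marked = '';
   do i = 1 to length(f1);
      ch = char(f1, i);
      /* wrap a mismatching character in an inline style, copy the rest as is */
      if ch ne char(base, i) then
         marked = cats(marked, '^{style [foreground=red]', ch, '}');
      else marked = cats(marked, ch);
   end;
   keep f1 marked;
run;

ods rtf file='marked.rtf';             /* hypothetical output path */
proc print data=marked noobs;
   var marked;
run;
ods rtf close;
```

Because the loop runs once per observation, the same step should handle 50 sequences as readily as two, as long as they are all compared against the first one.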
Niugg2010
Obsidian | Level 7

Thanks. I tried. It is powerful. However, I just listed two sequences above. Actually I have over 50 sequences to mark. Do you have any method to add conditions to deal with the data? Thanks

Ksharp
Super User
This would be very convenient to do in SAS/IML, if you could post the output you want.
Or post it in the IML forum.

Ksharp
Super User
data a;
length f1 $ 200;
input f1;
datalines;
GAGCAAGCGCCATACTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA
GAGCAAGCGCCATAGTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA
;
run;
proc format;
value fmt
 1='red';
run;

proc iml;
use a nobs nobs;
read all var {f1};
close;
n=length(f1)[1];
temp=j(nobs,n,' ');
do i=1 to nobs;
 temp[i,]=substr(f1[i],1:n,1);
end;
want=(countunique(temp,'col')>1);
create want from want;
append from want;
close;
run;
proc report data=want nowd;
define col:/style={backgroundcolor=fmt.};
run;
Niugg2010
Obsidian | Level 7

Cool. Thanks. I learned a lot. I have never used PROC IML.

KachiM
Rhodochrosite | Level 12

The character-by-character use of SUBSTR() can be replaced by the COMPARE() function. It compares the two strings and returns the leftmost position where they differ (0 if they are identical; the sign of the result indicates which string sorts first). If you need to find more than one differing position, you can call COMPARE() again on the part of the strings to the right of the position returned. The benefit is that you can skip strings that are the same.
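
A rough sketch of that repeated-COMPARE() idea, not a tested solution. It assumes the first sequence is the reference and that all sequences are stored in a $ 66 variable. The names old, start, dif and pos are illustrative. The test data uses two of the sequences posted in this thread that differ in more than one position.

```sas
data a;
   length f1 $ 66;
   input f1;
   datalines;
GAGCAAGCGCCATACTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA
AAGCAAGCGCCATAGTCCTGTGGAGSAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA
;

data _null_;
   length old $ 66;
   retain old;
   set a;
   if _n_ = 1 then old = f1;            /* first sequence is the reference */
   else do;
      start = 1;
      do while (start <= length(f1));
         /* compare only the part from START onward; ABS() is needed because
            COMPARE returns a negative position when its first argument
            sorts before the second */
         dif = abs(compare(substr(f1, start), substr(old, start)));
         if dif = 0 then leave;         /* the remainders are identical */
         pos = start + dif - 1;         /* position within the full string */
         substr(f1, pos, 1) = lowcase(substr(f1, pos, 1));
         start = pos + 1;               /* continue to the right of the hit */
      end;
      put f1=;
   end;
run;
```

Identical strings leave the loop after a single COMPARE() call, which is where the saving over a full character-by-character scan comes from.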

Niugg2010
Obsidian | Level 7

I am not familiar with the COMPARE() function. Can you help me optimize my code with COMPARE()? Thanks.

Ksharp
Super User

OK. If you really want a DATA step solution:

 

data a;
length f1 $ 200;
input f1;
datalines;
GAGCAAGCGCCATACTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA
GAGCAAGCGCCATAGTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA
GAGCAAGCGCCATAGTCCTGTGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAAGTGAACGTGGA
AAGCAAGCGCCATAGTCCTGTGGAGSAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGA
;
run;
proc format;
value fmt
 1='red';
run;
data _null_;
 set a;
 call symputx('n',length(f1));
 stop;
run;
data temp;
 set a;
 array x{&n} $ 1;
 do i=1 to &n;
  x{i}=char(f1,i);
 end;
 keep x:;
run;
proc transpose data=temp(obs=0) out=vnames;
var _all_;
run;
data _null_;
 set vnames end=last;
 if _n_=1 then call execute('proc sql;create table flag as select ');
 call execute(cat('count(distinct ',_name_,') as ',_name_));
 if last then call execute ('from temp;quit;');
  else call execute(',');
run;
proc transpose data=flag out=diff_temp;
var _all_;
run;
data diff_vname;
 set diff_temp;
 if col1 ne 1;
run;
data want;
if _n_=1 then do;
 if 0 then set diff_vname;
 declare hash h(dataset:'diff_vname');
 h.definekey('_name_');
 h.definedata('col1');
 h.definedone();
end;
call missing(of _all_);
 set vnames;
 rc=h.find();
run;
data _null_;
 set want end=last;
 if _n_=1 then call execute('proc report data=temp nowd;');
 call execute(cat('define ',_name_,'/display'));
 if not missing(col1) then call execute(' style={backgroundcolor=red}');
 call execute(';');
 if last then call execute('run;');
run;
Niugg2010
Obsidian | Level 7

Thanks. I got it.
