BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kmardinian
Quartz | Level 8

Hi, I am currently working with a dataset that isn't the most user friendly and I am stuck on this one issue.

 

My dataset looks like the following:

 

ID#       1stMutation   2ndMutation

2345      KRAS             TP53

2345      EGFR             KRAS

2345      NRAS            TSC1

4455     TSC2              TP53

4455     EGFR              KRAS

4455     KRAS              ARID1A

4455     ATC                 KRAS

 

I'd like to find a way to find out how many patients have both a KRAS mutation in the 1stMutation and the 2ndMutation. I'd like a way where my results would show that 2 patients have matching KRAS mutations. Part 2 would be to then have results that show there are 3 total matching KRAS mutations.

 

Is that possible in SAS? Thank you so much for the help!

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Once you subset the observations, the problem becomes easier.  For example, assuming your data set is sorted by ID:

 

data want;

do until (last.ID);

   set have;

   by ID;

   if mutation1='KRAS' then flag1=1;

   if mutation2='KRAS' then flag2=1;

end;

do until (last.ID);

   set have;

   by ID;

   if flag1=1 and flag2=1 and (mutation1='KRAS' or mutation2='KRAS') then output;

end;

drop flag1 flag2;

run;

 

That gives you the observations of interest, for further processing.

 

Optionally, you may wish to remove duplicate mutations for the same patient:

 

proc sort data=want NODUPKEY;

  by ID mutation1 mutation2;

run;

 

Finally, get the answers:

 

proc sql data=want;

   select count(distinct ID) from want;

quit;

proc freq data=want;

   tables mutation1 * mutation2 / missing list;

run;

View solution in original post

11 REPLIES 11
kiranv_
Rhodochrosite | Level 12

how does your final output look like.

kmardinian
Quartz | Level 8

What do you mean by final output? I'm new to SAS, so slowly trying to gain a grasp on it, but at the moment I have only been pulling up frequency tables to understand the data

novinosrin
Tourmaline | Level 20

would this help? @kiranv_ is right. We wanna know your result structure

 

data have;
input (ID IstMutation IIndMutation) ($);
datalines;
2345 KRAS TP53
2345 EGFR KRAS
2345 NRAS TSC1
4455 TSC2 TP53
4455 EGFR KRAS
4455 KRAS ARID1A
4455 ATC KRAS
;

data want;
set have;
if _n_=1 then do;
if 0 then set have;
declare hash h(dataset:'have', multidata:'y' );
h.defineKey('id','IIndMutation');
h.defineData(all:'y');
h.defineDone();
end;

set have;
by id;
if h.check(key:id, key:IstMutation)=0 then count+1;
if last.id then do;tot_count=count+1;output;end;
run;

kmardinian
Quartz | Level 8

Would it be possible for my results to just list the ID numbers that have matched KRAS mutations?

 

Result:

ID      

2345 
4455

novinosrin
Tourmaline | Level 20

Forget the hash, do a simple sql self join or a subquery and apply where condition equality operator. Tons of samples in google. Sorry, i am too lazy. My apologies

Astounding
PROC Star

Questions about Part 2 ... 

 

Are the counts limited to patients with KRAS in both Mutation 1 and Mutation 2, or should the counts be expanded to include patients that have just one KRAS instance?

 

If the exact same mutation occurs twice for the same patient, should it be counted once or twice?

 

You can easily get a report with the program below.  I just want to make sure it's the right report (hence the questions):

 

proc freq data=have;

tables mutation1 * mutation2 / missing list;

where mutation1='KRAS' or mutation2='KRAS';

run;

kmardinian
Quartz | Level 8

I'd like the counts to be limited to only patients who have both KRAS mutations in both Mutation 1 and Mutation 2

 

I tried this, but I keep getting the following error:

Thank you for all the help!

 

proc sql;
select a.*,
from KRAS a, KRAS b
where a.mutation1=b.mutation2 and
a.mutation2=b.mutation1;
quit;

 

LOG:

 
67
68 proc sql;
69 select a.*,
70 from KRAS a, KRAS b
____
79
ERROR 79-322: Expecting a FROM.
 
71 where a.mutation1=b.mutation2 and
72 a.mutation2=b.mutation1;
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
73 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
 
74
75
76 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
89

 

kiranv_
Rhodochrosite | Level 12

remove comma 

proc sql;
select a.*,
from KRAS a, KRAS b

 

and change it to 

 

proc sql;
select a.*
from KRAS a, KRAS b

 

this proc sql link may be helpful to you

 

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000146906.htm

 

Astounding
PROC Star

Once you subset the observations, the problem becomes easier.  For example, assuming your data set is sorted by ID:

 

data want;

do until (last.ID);

   set have;

   by ID;

   if mutation1='KRAS' then flag1=1;

   if mutation2='KRAS' then flag2=1;

end;

do until (last.ID);

   set have;

   by ID;

   if flag1=1 and flag2=1 and (mutation1='KRAS' or mutation2='KRAS') then output;

end;

drop flag1 flag2;

run;

 

That gives you the observations of interest, for further processing.

 

Optionally, you may wish to remove duplicate mutations for the same patient:

 

proc sort data=want NODUPKEY;

  by ID mutation1 mutation2;

run;

 

Finally, get the answers:

 

proc sql data=want;

   select count(distinct ID) from want;

quit;

proc freq data=want;

   tables mutation1 * mutation2 / missing list;

run;

kmardinian
Quartz | Level 8

Thank you! I will try and use the do-loop, for some reason the proc sql didn't work and gave me some of the duplicates matched up on the same line, but also got rid of some of the observations.

 

I will go ahead and play around with the do-loop and see if that works. Thank you for all the help!

kmardinian
Quartz | Level 8

The do-loop worked so well, thank you so much! I appreciate all the help!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 4337 views
  • 0 likes
  • 4 in conversation