SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Finding matching observations among a variable

Accepted Solution Solved
Reply
Contributor
Posts: 48
Accepted Solution

Finding matching observations among a variable

Hi, I am currently working with a dataset that isn't the most user friendly and I am stuck on this one issue.

 

My dataset looks like the following:

 

ID#       1stMutation   2ndMutation

2345      KRAS             TP53

2345      EGFR             KRAS

2345      NRAS            TSC1

4455     TSC2              TP53

4455     EGFR              KRAS

4455     KRAS              ARID1A

4455     ATC                 KRAS

 

I'd like to find a way to find out how many patients have both a KRAS mutation in the 1stMutation and the 2ndMutation. I'd like a way where my results would show that 2 patients have matching KRAS mutations. Part 2 would be to then have results that show there are 3 total matching KRAS mutations.

 

Is that possible in SAS? Thank you so much for the help!


Accepted Solutions
Solution
‎01-08-2018 03:29 PM
Super User
Posts: 6,622

Re: Finding matching observations among a variable

Posted in reply to kmardinian

Once you subset the observations, the problem becomes easier.  For example, assuming your data set is sorted by ID:

 

data want;

do until (last.ID);

   set have;

   by ID;

   if mutation1='KRAS' then flag1=1;

   if mutation2='KRAS' then flag2=1;

end;

do until (last.ID);

   set have;

   by ID;

   if flag1=1 and flag2=1 and (mutation1='KRAS' or mutation2='KRAS') then output;

end;

drop flag1 flag2;

run;

 

That gives you the observations of interest, for further processing.

 

Optionally, you may wish to remove duplicate mutations for the same patient:

 

proc sort data=want NODUPKEY;

  by ID mutation1 mutation2;

run;

 

Finally, get the answers:

 

proc sql data=want;

   select count(distinct ID) from want;

quit;

proc freq data=want;

   tables mutation1 * mutation2 / missing list;

run;

View solution in original post


All Replies
PROC Star
Posts: 503

Re: Finding matching observations among a variable

Posted in reply to kmardinian

how does your final output look like.

Contributor
Posts: 48

Re: Finding matching observations among a variable

What do you mean by final output? I'm new to SAS, so slowly trying to gain a grasp on it, but at the moment I have only been pulling up frequency tables to understand the data

PROC Star
Posts: 1,547

Re: Finding matching observations among a variable

Posted in reply to kmardinian

would this help? @kiranv_ is right. We wanna know your result structure

 

data have;
input (ID IstMutation IIndMutation) ($);
datalines;
2345 KRAS TP53
2345 EGFR KRAS
2345 NRAS TSC1
4455 TSC2 TP53
4455 EGFR KRAS
4455 KRAS ARID1A
4455 ATC KRAS
;

data want;
set have;
if _n_=1 then do;
if 0 then set have;
declare hash h(dataset:'have', multidata:'y' );
h.defineKey('id','IIndMutation');
h.defineData(all:'y');
h.defineDone();
end;

set have;
by id;
if h.check(key:id, key:IstMutation)=0 then count+1;
if last.id then do;tot_count=count+1;output;end;
run;

Contributor
Posts: 48

Re: Finding matching observations among a variable

Posted in reply to novinosrin

Would it be possible for my results to just list the ID numbers that have matched KRAS mutations?

 

Result:

ID      

2345 
4455

PROC Star
Posts: 1,547

Re: Finding matching observations among a variable

[ Edited ]
Posted in reply to kmardinian

Forget the hash, do a simple sql self join or a subquery and apply where condition equality operator. Tons of samples in google. Sorry, i am too lazy. My apologies

Super User
Posts: 6,622

Re: Finding matching observations among a variable

Posted in reply to kmardinian

Questions about Part 2 ... 

 

Are the counts limited to patients with KRAS in both Mutation 1 and Mutation 2, or should the counts be expanded to include patients that have just one KRAS instance?

 

If the exact same mutation occurs twice for the same patient, should it be counted once or twice?

 

You can easily get a report with the program below.  I just want to make sure it's the right report (hence the questions):

 

proc freq data=have;

tables mutation1 * mutation2 / missing list;

where mutation1='KRAS' or mutation2='KRAS';

run;

Contributor
Posts: 48

Re: Finding matching observations among a variable

Posted in reply to Astounding

I'd like the counts to be limited to only patients who have both KRAS mutations in both Mutation 1 and Mutation 2

 

I tried this, but I keep getting the following error:

Thank you for all the help!

 

proc sql;
select a.*,
from KRAS a, KRAS b
where a.mutation1=b.mutation2 and
a.mutation2=b.mutation1;
quit;

 

LOG:

 
67
68 proc sql;
69 select a.*,
70 from KRAS a, KRAS b
____
79
ERROR 79-322: Expecting a FROM.
 
71 where a.mutation1=b.mutation2 and
72 a.mutation2=b.mutation1;
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
73 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
 
74
75
76 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
89

 

PROC Star
Posts: 503

Re: Finding matching observations among a variable

Posted in reply to kmardinian

remove comma 

proc sql;
select a.*,
from KRAS a, KRAS b

 

and change it to 

 

proc sql;
select a.*
from KRAS a, KRAS b

 

this proc sql link may be helpful to you

 

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000146906.htm

 

Solution
‎01-08-2018 03:29 PM
Super User
Posts: 6,622

Re: Finding matching observations among a variable

Posted in reply to kmardinian

Once you subset the observations, the problem becomes easier.  For example, assuming your data set is sorted by ID:

 

data want;

do until (last.ID);

   set have;

   by ID;

   if mutation1='KRAS' then flag1=1;

   if mutation2='KRAS' then flag2=1;

end;

do until (last.ID);

   set have;

   by ID;

   if flag1=1 and flag2=1 and (mutation1='KRAS' or mutation2='KRAS') then output;

end;

drop flag1 flag2;

run;

 

That gives you the observations of interest, for further processing.

 

Optionally, you may wish to remove duplicate mutations for the same patient:

 

proc sort data=want NODUPKEY;

  by ID mutation1 mutation2;

run;

 

Finally, get the answers:

 

proc sql data=want;

   select count(distinct ID) from want;

quit;

proc freq data=want;

   tables mutation1 * mutation2 / missing list;

run;

Contributor
Posts: 48

Re: Finding matching observations among a variable

Posted in reply to Astounding

Thank you! I will try and use the do-loop, for some reason the proc sql didn't work and gave me some of the duplicates matched up on the same line, but also got rid of some of the observations.

 

I will go ahead and play around with the do-loop and see if that works. Thank you for all the help!

Contributor
Posts: 48

Re: Finding matching observations among a variable

Posted in reply to Astounding

The do-loop worked so well, thank you so much! I appreciate all the help!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 315 views
  • 0 likes
  • 4 in conversation