Hello,
I am using SAS version: 9.04.01M5P091317
I am writing a program to do a (coarse) grid search over a simulated data set to recover the underlying value of the recombination fraction. The data contain 600 families, each with two parents and four kids. For each family, I want to compute the LOD score and then sum across families.
Variables:
pedid = family number
personno = person number within family (1 = father, 2 = mother, 3-6 = kids)
parent, kid = dichotomous indicators for whether a person is a parent or a kid
tot_recomb = number of recombinants within a family
non_recomb = number of non-recombinants within a family
The first error-type message in the log is:
NOTE: Invalid argument to function LOG(0) at line 2 column 56. (x 100)
...
NOTE: Over 100 NOTES, additional NOTES suppressed.
NOTE: Invalid argument to function LOG(0) at line 2 column 56.
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
164610646 at 2:54 164610646 at 2:65
NOTE: Mathematical operations could not be performed at the following places. The results of the
operations have been set to missing values.
Each place is given by: (Number of times) at (Line):(Column).
164610646 at 2:56
It gets stuck in this so I force quit the program. I think I've gotten something repeating too many times but I can't quite figure out where the error is. I'd really appreciate any insight!
code:
%MACRO gridLOD(r= , rstep= , data= );
data start;
    set &data;
    keep pedid personno kid parent meioses tot_recomb non_recomb;
    by pedid personno ;
    /* define families for LOD calc */
    /* 600 families*/;
    array LOD {600} LOD1-LOD600;
    DO pedid = 1 to 600;
        DO WHILE (kid = 1);
            /* r = recombination fraction starting value */;
            DO r=0.00 TO 0.50 BY &rstep;
                DO i=1 to 600;
                    LOD(i) = tot_recomb*(log(&r)) + non_recomb*(log(1-&r));
                    IF LOD(i) = . THEN LOD(i) = 0;
                END;
            END;
            LODtot=0;
            LODtot=sum(of LOD1-LOD600);
            OUTPUT;
        END;
    END;
RUN;
PROC SORT data=start;
    BY LODtot descending;
RUN;
DATA MLE;
    SET start;
    keep r LODtot;
RUN;
PROC PRINT DATA=MLE (obs=10);
RUN;
%MEND gridLOD;
%gridLOD (r=0.00, rstep=0.10, data=pwd.recode);
run;
A couple of things jump out here.
When &R is 0.00, can you actually compute the log of zero?
Your DO WHILE loop appears to be infinite. How would running the loop change the value of KID?
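To make both points concrete, here is a minimal sketch, not the original poster's program. The data set name demo, the variable loglik, and the counts (1 recombinant, 7 non-recombinants) are made up for illustration. It evaluates LOG only when r is strictly between 0 and 1, and it drives the grid with an integer counter so the loop has a fixed number of iterations and cannot run forever.

```sas
/* Hypothetical one-family example; the counts below are invented. */
data demo;
    tot_recomb = 1;
    non_recomb = 7;
    /* An integer counter avoids floating-point drift in the BY step. */
    do k = 0 to 5;
        r = k / 10;
        /* LOG(0) is undefined, so evaluate only when 0 < r < 1. */
        if 0 < r < 1 then
            loglik = tot_recomb*log(r) + non_recomb*log(1 - r);
        else loglik = .;
        output;
    end;
    drop k;
run;

proc print data=demo;
run;
```

Note that LOG in SAS is the natural logarithm; LOG10 is the base-10 version if that scale is what you want for a LOD score.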
Thank you both for your replies.
I updated the equation so that LOG is no longer called with an argument of 0.
I was trying to use DO WHILE to restrict the calculations so that only the kids' values are summed. I see now that it wasn't accomplishing what I meant it to. I think this is the correct way to do that:
DO pedid = 1 to 600;
if kid = 1 then do i = 1 to 600;
@art297 I'm thinking that the first do i = 1 to 600 is the statement I need to keep. I was struggling to figure out where it was repeating unintentionally, and I think that's it.
In the initial array statement, I haven't quite figured out if I need the array size to be 600 (one per row) or 1 (one variable)? Or is the array length the number of steps I include in the macro? I want the output from the macro to be one value per family per step of the macro.
Thanks again to you both!
Still not sure I understand what you're trying to do. With the array, you're outputting an extra 600 variables. If you only need one variable, then don't use an array. If you need 600 iterations to obtain the value you want, then move your output statement to after the end of your do loop.
And yes, from what you said, you don't need both DO loops.
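If the goal is one value per family per grid step, a sketch along these lines might be closer to what you want. It assumes the input already has one row per family with the summed counts. The data set names fam_totals and grid, the variable loglik, and the second family's counts are hypothetical. A single variable replaces the array, and OUTPUT sits inside the loop over r so that each grid value writes its own row.

```sas
/* Hypothetical input: one row per family with summed counts. */
data fam_totals;
    input pedid tot_recomb non_recomb;
    datalines;
1 0 8
2 1 7
;
run;

/* One output row per family per grid value of r; no array needed. */
data grid;
    set fam_totals;
    do k = 1 to 5;              /* r = 0.1, 0.2, ..., 0.5; r = 0 is skipped */
        r = k / 10;
        loglik = tot_recomb*log(r) + non_recomb*log(1 - r);
        output;                 /* inside the loop: one row per r */
    end;
    drop k;
run;
```

With 600 families and 5 grid values this would produce 3000 rows, which can then be summed by r in a separate step.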
Art, CEO, AnalystFinder.com
In addition to what @Astounding has already mentioned:
1. Did you really intend to run the loops 600*600 times?
2. All of the new output records for a given record are going to have the same values. Is that what you intended?
Art, CEO, AnalystFinder.com
I realized an error with my input data. What I would like to do is get a sum across rows of a value for each family (number of recombinants for all kids and number of non recombinants for all kids) resulting in one observation per family (pedid). I would like to take those values and input into the macro function to calculate a score per family.
Example data (the real data set would have 600 families):
DATA fam;
INPUT pedid kid tot_recom non_recom;
CARDS;
1 0 0 2
1 0 0 2
1 1 0 2
1 1 0 2
1 1 0 2
1 1 0 2
2 0 0 2
2 0 0 2
2 1 0 2
2 1 0 2
2 1 0 2
2 1 0 2
;
RUN;
For these example data, each family should get a score of 0 recombinants and 8 non recombinants.
I have written this code for this operation:
if kid = 1 then DO pedid = 1 - 600;
famR = sum(tot_recom);
famNR = sum(non_recom);
end;
When I use this code, famR and famNR are all missing values.
Any further suggestions? Thanks again.
Sure sounds like you are trying to accomplish something like:
proc summary data=fam (where=(kid eq 1));
var tot_recom non_recom;
by pedid;
output out=want (drop=_:) sum=;
run;
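As for why the earlier attempt did not work: in a DATA step, the SUM function adds its arguments within a single observation, it does not add down the rows. For comparison, here is a rough DATA step equivalent of the PROC SUMMARY step, assuming fam is sorted by pedid as in the example data. The output name want2 is hypothetical; famR and famNR are the names from the question.

```sas
/* Accumulate kids' counts within each family using BY-group processing. */
data want2;
    set fam;
    where kid = 1;              /* parents are filtered out before BY processing */
    by pedid;
    if first.pedid then do;     /* reset totals at the start of each family */
        famR  = 0;
        famNR = 0;
    end;
    famR  + tot_recom;          /* sum statement: value is retained across rows */
    famNR + non_recom;
    if last.pedid then output;  /* one row per family */
    keep pedid famR famNR;
run;
```

For the two example families, each output row should have famR = 0 and famNR = 8.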
Art, CEO, AnalystFinder.com