anhl1206
Fluorite | Level 6

Hello,

 

I am using SAS version: 9.04.01M5P091317

I am writing a program to do a (coarse) grid search to recover the underlying recombination fraction from a simulated data set. The data consist of 600 families, each with two parents and four kids. For each family, I want to compute the LOD score and then sum across families.

 

Variables:

pedid = family number; personno = person number within family (1 = father, 2 = mother, 3-6 = kids); parent and kid = dichotomous indicators for whether a person is a parent or a kid; tot_recomb = number of recombinants within a family; non_recomb = number of non-recombinants within a family.

 

The first error-type message in the log is:

NOTE: Invalid argument to function LOG(0) at line 2 column 56. (x 100)

...

NOTE: Over 100 NOTES, additional NOTES suppressed.
NOTE: Invalid argument to function LOG(0) at line 2 column 56.
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
164610646 at 2:54 164610646 at 2:65
NOTE: Mathematical operations could not be performed at the following places. The results of the
operations have been set to missing values.
Each place is given by: (Number of times) at (Line):(Column).
164610646 at 2:56

The program gets stuck at this point, so I force-quit it. I think something is repeating too many times, but I can't quite figure out where the error is. I'd really appreciate any insight!

 

code:

 

%MACRO gridLOD(r= , rstep= , data= );

data start;
	set &data;
	keep pedid personno kid parent meioses tot_recomb non_recomb;
		by pedid personno ;
/* define families for LOD calc */

/* 600 families*/;

array LOD {600} LOD1-LOD600;

DO pedid = 1 to 600;

	DO WHILE (kid = 1);

/* r = recombination fraction starting value */;

   		DO r=0.00 TO 0.50 BY &rstep;

		DO i=1 to 600;

		LOD(i) = tot_recomb*(log(&r)) + non_recomb*(log(1-&r));
			IF LOD(i) = . THEN LOD(i) = 0;

		END;
		END;

	LODtot=0;
	LODtot=sum(of LOD1-LOD600);

	OUTPUT;

	END;
	END;

RUN;

PROC SORT data=start;
	BY LODtot descending;
RUN;

DATA MLE;
	SET start;
	keep r LODtot;
RUN;

PROC PRINT DATA=MLE (obs=10);
RUN;

%MEND gridLOD;

%gridLOD (r=0.00, rstep=0.10, data=pwd.recode);
run;
Astounding
PROC Star

A couple of things jump out here.

 

 

When &R is 0.00, can you actually compute the log of zero?

 

Your DO WHILE loop appears to be infinite.  How would running the loop change the value of KID?
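
To make both points concrete, here is a minimal sketch, not a drop-in fix. It assumes the data set and variable names from the original post (pwd.recode, kid, tot_recomb, non_recomb); the data set name lod_by_person, the index k, and the five-step grid are only illustrative. A subsetting IF replaces the DO WHILE, because the DATA step already loops over the rows, and the grid is built from an integer index so that r never reaches 0 and does not drift through floating-point accumulation.

```sas
/* Sketch only: data set and variable names are taken from the original post */
data lod_by_person;
  set pwd.recode;

  /* Subsetting IF: keep only the kids' rows. No DO WHILE is needed,  */
  /* because the DATA step itself reads one row per iteration.        */
  if kid = 1;

  /* Grid of r = 0.1, 0.2, ..., 0.5 built from an integer index,      */
  /* so LOG never receives 0 and 1-r never reaches 0 either.          */
  do k = 1 to 5;
    r = k / 10;
    lod = tot_recomb*log(r) + non_recomb*log(1 - r);
    output;               /* one row per kid per grid step */
  end;

  keep pedid personno r lod;
run;
```

If r = 0 has to be on the grid, the term tot_recomb*log(r) needs separate handling there, since LOG(0) is undefined.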

anhl1206
Fluorite | Level 6

Thank you both for your replies.

 

I updated the equation to avoid the invalid LOG argument.

 

I was trying to use DO WHILE to restrict the calculations so that they sum across the kids' values only. I see now that it wasn't accomplishing what I meant it to. I think this is the correct way to do that:

 

DO pedid = 1 to 600;
     if kid = 1 then do i = 1 to 600;

@art297 I think the first do i = 1 to 600 is the statement I need to keep. I was struggling to figure out where the code was repeating unintentionally, and I think that's it.

 

In the initial array statement, I haven't quite figured out if I need the array size to be 600 (one per row) or 1 (one variable)?  Or is the array length the number of steps I include in the macro? I want the output from the macro to be one value per family per step of the macro.

 

Thanks again to you both!

 

 

 

art297
Opal | Level 21

Still not sure I understand what you're trying to do. With the array, you're outputting an extra 600 variables. If you only need one variable, then don't use an array. If you need 600 iterations to obtain the value you want, then move your output statement to after the end of your do loop.

 

And, yes from what you said, you don't need both do loops.
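
If an array is wanted at all, one possible reading of the question is that it should be sized by the number of grid steps rather than by the number of families. The sketch below assumes that reading, plus the names from the original post (pwd.recode, kid, tot_recomb, non_recomb) and an illustrative five-step grid; lod_total, lodsum and k are made-up names. It keeps one running total per grid step and writes output only once, after the last row has been read.

```sas
/* Sketch only: one running total per grid step, output after the last row */
data lod_total;
  set pwd.recode end=last;

  /* Temporary arrays are retained across rows automatically; start at 0 */
  array lodsum {5} _temporary_ (5*0);

  /* Add this row's contribution at each grid value of r (kids only) */
  if kid = 1 then do k = 1 to 5;
    r = k / 10;
    lodsum{k} = lodsum{k} + tot_recomb*log(r) + non_recomb*log(1 - r);
  end;

  /* After the final row, write one observation per grid step */
  if last then do k = 1 to 5;
    r = k / 10;
    LODtot = lodsum{k};
    output;
  end;

  keep r LODtot;
run;
```

Note that a missing tot_recomb or non_recomb on any kid row would turn the affected running totals missing, so the input may need checking first.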

 

Art, CEO, AnalystFinder.com

 

art297
Opal | Level 21

In addition to what @Astounding has already mentioned:

 

1. Did you really intend to run the loops 600*600 times?

2. All of the new output records for a given record are going to have the same values. Is that what you intended?

 

Art, CEO, AnalystFinder.com

 

 

anhl1206
Fluorite | Level 6

I realized there was an error in my input data. What I would like to do is sum a value across rows for each family (the number of recombinants for all kids and the number of non-recombinants for all kids), resulting in one observation per family (pedid). I would then like to feed those values into the macro to calculate a score per family.

 

Example data (the real data would cover 600 families):

 

DATA fam;
  INPUT pedid kid tot_recom non_recom;
CARDS;
1 0 0 2
1 0 0 2
1 1 0 2
1 1 0 2
1 1 0 2
1 1 0 2
2 0 0 2
2 0 0 2
2 1 0 2
2 1 0 2
2 1 0 2
2 1 0 2
;
RUN;

For these example data, each family should get a score of 0 recombinants and 8 non recombinants.

 

I have written this code for this operation:

if kid = 1 then DO pedid = 1 - 600;

	famR = sum(tot_recom);

	famNR = sum(non_recom);

	end;

When I use this code, famR and famNR are all missing values. 

Any further suggestions? Thanks again.
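
One thing worth knowing here: the SUM function adds up its arguments within a single row, it does not total a variable down the rows, and `1 - 600` is evaluated as arithmetic (-599) rather than as a range. A minimal sketch of a row-wise total using BY-group processing, assuming the example data set fam above (fam_sums is an illustrative name):

```sas
/* BY-group processing needs the data sorted by the BY variable */
proc sort data=fam;
  by pedid;
run;

data fam_sums;
  set fam;
  where kid = 1;          /* only the kids' rows are read */
  by pedid;

  /* Reset the running totals at the start of each family */
  if first.pedid then do;
    famR  = 0;
    famNR = 0;
  end;

  /* Sum statements: accumulate across rows and retain automatically */
  famR  + tot_recom;
  famNR + non_recom;

  /* Write one observation per family, after its last kid row */
  if last.pedid then output;

  keep pedid famR famNR;
run;
```

For the example data this should leave one row per pedid holding the kids' totals.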

 

art297
Opal | Level 21

Sure sounds like you are trying to accomplish something like:

proc summary data=fam (where=(kid eq 1));
  var tot_recom non_recom;
  by pedid;
  output out=want (drop=_:) sum=;
run;
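
The BY statement assumes fam is already sorted by pedid. From there, a hedged sketch of the remaining steps, using the data set want created above: score each family at each grid value, total across families for each r, and list the largest totals first. The expression is the one from the original post (natural logs); the five-step grid and the names lod_fam, lod_tot and k are only illustrative.

```sas
/* One row per family per grid step */
data lod_fam;
  set want;
  do k = 1 to 5;
    r = k / 10;                          /* r = 0.1, 0.2, ..., 0.5 */
    lod = tot_recom*log(r) + non_recom*log(1 - r);
    output;
  end;
  drop k;
run;

/* Total across families for each value of r */
proc summary data=lod_fam nway;
  class r;
  var lod;
  output out=lod_tot (drop=_:) sum=LODtot;
run;

/* DESCENDING goes before the variable it applies to */
proc sort data=lod_tot;
  by descending LODtot;
run;

proc print data=lod_tot (obs=10);
run;
```

The first printed row is then the grid value of r with the largest total. A conventional LOD score would use LOG10 and compare against r = 0.5, which shifts and rescales the totals but does not change which r comes out on top.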

Art, CEO, AnalystFinder.com

 
