Please note that the condition
if first.pid then _iorc_ = ifn (exercise, 0, 1) ;
will not work as specified in the initial post if the variable exercise takes negative values, because IFN treats any nonzero, nonmissing first argument as true. To work around that, the condition should be
if first.pid then _iorc_ = ifn (exercise gt 0, 0, 1) ;
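To see the difference between the two conditions, here is a minimal sketch that evaluates both forms of the IFN call for a negative, a zero and a positive value. The variable names raw_flag and gt_flag are made up for illustration and are not part of the original step.

```sas
data _null_ ;
  do exercise = -2, 0, 3 ;
    /* Any nonzero value, including a negative one, counts as true */
    raw_flag = ifn (exercise, 0, 1) ;
    /* Only strictly positive values count as true */
    gt_flag = ifn (exercise gt 0, 0, 1) ;
    put exercise= raw_flag= gt_flag= ;
  end ;
run ;
/* Log:
   exercise=-2 raw_flag=0 gt_flag=1
   exercise=0 raw_flag=1 gt_flag=1
   exercise=3 raw_flag=0 gt_flag=0
*/
```

The two forms disagree only for the negative value, which is the case the workaround addresses.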
@ChrisNZ :
With your approach, I'd expect POINT= to perform very well at any rate, since your step never uses it to read the file out of order. That would be particularly true with enough memory to SASFILE it.
What I meant by "exclude rather than include" is something in this vein:
data have ;
  input id gender income ;
  cards ;
1 1 5000
1 1 .
1 1 7000
1 1 .
2 2 3000
2 2 5000
2 2 1000
3 1 .
3 1 900000
3 1 12345
;
run ;
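/* Pass 1: for each ID group with at least one missing INCOME, */
/* record the first and last observation numbers of the group. */
data rid (keep = rid_:) ;
  do until (last.id) ;
    set have (keep = id income) curobs = q ;
    by id ;
    if first.id then rid_from = q ;
    if missing (income) then _missflag = 1 ;
  end ;
  if _missflag ;
  rid_to = q ;
run ;
/* Pass 2: remove those observation ranges from HAVE in place. */
data have ;
  set rid ;
  do rid = rid_from to rid_to ;
    modify have point = rid ;
    remove ;
  end ;
run ;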
I'd expect it to perform faster than pretty much anything else against HAVE with a small number of short "missing" ID groups relative to the overall number of ID groups.
Kind regards
Paul D.
This is called reference parameterization. The last level of each class effect is fixed at zero and the other levels are estimated as offsets from that reference. So zero really is the estimate for that last class level.
You would probably find LSMEANS more informative. Add
LSMEANS gender;
LSMEANS education_level;
say, to your proc glm code and see if those results are more interesting to you. There are also plenty of options that can be added to the LSMEANS statement.
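For context, a minimal sketch of where those statements would sit. The data set name survey and the response variable income are assumptions made up for illustration, since the original proc glm code is not shown; only gender and education_level come from the thread. STDERR, PDIFF and CL are just a few of the available LSMEANS options.

```sas
/* Hypothetical data set SURVEY with response INCOME */
proc glm data = survey ;
  class gender education_level ;
  /* SOLUTION prints the parameter estimates, last levels fixed at zero */
  model income = gender education_level / solution ;
  /* Least-squares means for every level, with standard errors, */
  /* pairwise difference p-values and confidence limits         */
  lsmeans gender / stderr pdiff cl ;
  lsmeans education_level / stderr pdiff cl ;
run ;
quit ;
```

The LSMEANS output lists a value for every level of each class variable, including the reference levels that show as zero in the parameter estimates table.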
I have multiple class variables. The NOINT option gave me estimates for all levels of the first one, but the others are still missing the estimate for their last level.
A match in the datasets can still contain missing values. You need to get a clear picture of your data first.
If your data step merge completes without an ERROR, a WARNING or other suspicious NOTEs, but you still have lots of missing values, you have to investigate your source data.
PS it's simpler to use a subsetting if:
if a and b and c and d and e and f;
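As a rough sketch of how that fits into a merge, assuming a to f are IN= flags (the original step is not shown). The example is cut down to two made-up data sets, ds_a and ds_b, keyed by a hypothetical id variable.

```sas
data ds_a ;
  input id x ;
  cards ;
1 10
2 .
3 30
;
run ;
data ds_b ;
  input id y ;
  cards ;
2 200
3 300
4 400
;
run ;
data want ;
  merge ds_a (in = a) ds_b (in = b) ;
  by id ;
  /* Subsetting IF: keep only IDs present in both inputs */
  if a and b ;
run ;
```

Here WANT keeps ids 2 and 3, and id 2 still has a missing x because it was already missing in ds_a. That is the kind of missing value a successful match will not remove.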