Hi Sas Community,
I haven't come across this before, but I have multiple observations per person, and then for observations that are expected to be repeated (like sex) they are only listed once per person and then the rest are listed as missing. A way simplified version of the data is like this:
Person Sex
1 2
1 .
1 .
2 1
2 .
2 .
I need those variables to not be missing. I can be basically 100% certain that that is exactly what that person's sex will be at the next observation. Sooooo I've read about single or deterministic regression imputation, and it sounds like that'd be a good way to go. I haven't found any code anywhere for this though? Does anyone have any code to offer to help me go about this?
Thanks,
Estelle
Imputation is about replacing unknown values with probable/representative values. All you need is to replace missing information in a deterministic way.
One way to do this is to invoque SQL automatic remerging and use the fact that SAS missing values are inferior to non-missing values:
data have;
input Person Sex :$1. Age x;
datalines;
1 2 30 0.1
1 . . 0.2
1 . . 0.3
2 1 32 1.1
2 . . 1.2
2 . . 1.3
;
Proc sql;
create table want as
select
Person,
max(sex) as sex,
max(age) as age,
x
from have
group by person;
select * from want;
quit;
I'm a new to SQL so that's maybe a problem here. I'm submitting this:
Proc sql;
create table imputed as
select
nomen,
max (sexeps) as sexeps,
max (tage) as tage,
from mergeinca2
group by nomen;
quit;
And it tells me in the log "expecting a "FROM" (and underlines "from mergeinca2"). Mergeinca2 is the relevant dataset.
Remove the comma after tage.
Remove the comma after tage.
This procedure actually omits all repeated observations (because the repeated observations are from other variables). I think I need to "keep" the other variables in my dataset while running proc sql perhaps?
I am submitting this code and I think maybe my "group by" line is now in the wrong spot?
Proc sql;
create table imputed as
select nomen,
max (sexeps) as sexeps2,
max (tage) as tage2
from mergeinca2 (keep= sexeps codgr tage qte_nette nomen qte_brute nbjcc);
group by nomen;
quit;
I get the message "Statement is not valid or used out of proper order" with group by nomen underlined in the log. Also the table has all sex set to 2 and all age categories set to 8 for all 541,526. Do I have to put that group by variable somewhere else if I want to keep other variables from the mergeinca2 dataset?
Thanks!
Estelle
Try this:
1. Remove extra semicolon
2. Add * to beginning of query
Proc sql;
create table imputed as
select *, nomen,
max (sexeps) as sexeps2,
max (tage) as tage2
from mergeinca2
group by nomen;
quit;
You are very close,
Proc sql;
create table imputed as
select
nomen,
codgr, qte_nette, qte_brute, nbjcc,
max (sexeps) as sexeps,
max (tage) as tage
from mergeinca2
group by nomen;
quit;
data have; input Person Sex; cards; 1 2 1 . 1 . 2 1 2 . 2 . ; run; data want; merge have(drop=sex) have(keep=person sex where=(sex is not missing)) ; by person; run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.