Hi all, quick question about populating missing values by group, but doing so systematically so that the pattern of data for the 'cases' is applied to the 'controls'. I've included a Have/Want example. The only thing different between this sample and what i want is that there are multiple controls per case (e.g., 6:1), but they are sorted so that once a new case occurs the pattern will repeat itself following the same logic. Thank you for any advice to help me figure it out. I think it needs a do loop in order to get the same pattern, but so far what i've been able to get is not correct. Nate
DATA Have;
INPUT ID $ Case Set Time ;
CARDS;
A 1 1 -3
A 1 1 -2
A 1 1 -1
A 1 1 0
A 1 1 1
A 1 1 2
A 1 1 3
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
C 1 2 -2
C 1 2 -1
C 1 2 0
C 1 2 1
C 1 2 2
C 1 2 3
C 1 2 4
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
;
DATA Want;
INPUT ID $ Case Set Time ;
CARDS;
A 1 1 -3
A 1 1 -2
A 1 1 -1
A 1 1 0
A 1 1 1
A 1 1 2
A 1 1 3
B 0 1 -3
B 0 1 -2
B 0 1 -1
B 0 1 0
B 0 1 1
B 0 1 2
B 0 1 3
C 1 2 -2
C 1 2 -1
C 1 2 0
C 1 2 1
C 1 2 2
C 1 2 3
C 1 2 4
D 0 2 -2
D 0 2 -1
D 0 2 0
D 0 2 1
D 0 2 2
D 0 2 3
D 0 2 4
;
DATA Have;
INPUT ID $ year Case Set pattern_var ;
CARDS;
A1 1 1 1 -3
A1 2 1 1 -2
A1 3 1 1 -1
B2 1 0 1 .
B2 2 0 1 .
B2 3 0 1 .
L5 1 0 1 .
L5 2 0 1 .
L5 3 0 1 .
C8 1 1 2 -2
C8 2 1 2 -1
C8 3 1 2 0
A4 1 0 2 .
A4 2 0 2 .
A4 3 0 2 .
D8 1 0 2 .
D8 2 0 2 .
D8 3 0 2 .
Z9 1 1 3 2
Z9 2 1 3 3
Z9 3 1 3 4
K8 1 0 3 .
K8 2 0 3 .
K8 3 0 3 .
R3 1 0 3 .
R3 2 0 3 .
R3 3 0 3 .
;
data want;
set have;
array x{99999} _temporary_;
if not missing(pattern_var) then x{year}=pattern_var;
else pattern_var=x{year};
run;
Hi @N8 Assuming i understand your requirement-
DATA Have;
INPUT ID $ Case Set Time ;
CARDS;
A 1 1 -3
A 1 1 -2
A 1 1 -1
A 1 1 0
A 1 1 1
A 1 1 2
A 1 1 3
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
C 1 2 -2
C 1 2 -1
C 1 2 0
C 1 2 1
C 1 2 2
C 1 2 3
C 1 2 4
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
;
data want ;
do _n=1 by 1 until(last.id) ;
set have ;
by id ;
if mod(_n_,2) then continue ;
array t(9999) _temporary_;
time=t(_n) ;
output ;
end;
call missing(of t(*)) ;
do _n=1 by 1 until(last.id) ;
set have ;
by id ;
if mod(_n_,2)=0 then continue ;
t(_n)=time ;
output ;
end;
drop _: ;
run ;
proc print noobs ;run ;
ID | Case | Set | Time |
---|---|---|---|
A | 1 | 1 | -3 |
A | 1 | 1 | -2 |
A | 1 | 1 | -1 |
A | 1 | 1 | 0 |
A | 1 | 1 | 1 |
A | 1 | 1 | 2 |
A | 1 | 1 | 3 |
B | 0 | 1 | -3 |
B | 0 | 1 | -2 |
B | 0 | 1 | -1 |
B | 0 | 1 | 0 |
B | 0 | 1 | 1 |
B | 0 | 1 | 2 |
B | 0 | 1 | 3 |
C | 1 | 2 | -2 |
C | 1 | 2 | -1 |
C | 1 | 2 | 0 |
C | 1 | 2 | 1 |
C | 1 | 2 | 2 |
C | 1 | 2 | 3 |
C | 1 | 2 | 4 |
D | 0 | 2 | -2 |
D | 0 | 2 | -1 |
D | 0 | 2 | 0 |
D | 0 | 2 | 1 |
D | 0 | 2 | 2 |
D | 0 | 2 | 3 |
D | 0 | 2 | 4 |
I guess you just need this after all?-
data want ;
do _n=1 by 1 until(last.id) ;
set have ;
by id;
array t(99) _temporary_;
if mod(_n_,2) then t(_n)=time;
else time=t(_n);
output;
end;
drop _: ;
run;
Thanks novinosrin - Sadly, no. I think one of the issue is that my actual ID variables aren't ordered consecutively - it's random alpha/numeric - so the primary variable that it has to match on is the 'setnumbr' (i.e., i want to populate all the missing setnumbr values with the values for the 'cases' of the same setnumbr), but they need to correspond to the same year + group ID that corresponds with the values for the 'cases'. I'll keep plugging away at it. I appreciate the effort! thank you!
Hi @N8 Okay never mind. Can you please post a more representative sample of what you have and the want with the YEAR variable and so forth. Also, please explain the logic with the modified sample. That can only help us avoid assumptions/guesses. Thanks!
No worries. I've been working with setting it up with a do loop, but i'm only able to get the first value to match.
This is a pretty good pared down replication of the source file. What I am trying to do is replicate the "pattern_var" so that all of the Case=0 values that belong to the same "set" as its corresponding "case" get the same corresponding value as the case has in the "pattern_var". Where I get hung up is that I need to get the code to cycle through the ID + year so that the controls get the same corresponding value as the case for each year. If I haven't exhausted your kindness... 🙂
DATA Have;
INPUT ID $ year Case Set pattern_var ;
CARDS;
A1 1 1 1 -3
A1 2 1 1 -2
A1 3 1 1 -1
B2 1 0 1 .
B2 2 0 1 .
B2 3 0 1 .
L5 1 0 1 .
L5 2 0 1 .
L5 3 0 1 .
C8 1 1 2 -2
C8 2 1 2 -1
C8 3 1 2 0
A4 1 0 2 .
A4 2 0 2 .
A4 3 0 2 .
D8 1 0 2 .
D8 2 0 2 .
D8 3 0 2 .
Z9 1 1 3 2
Z9 2 1 3 3
Z9 3 1 3 4
K8 1 0 3 .
K8 2 0 3 .
K8 3 0 3 .
R3 1 0 3 .
R3 2 0 3 .
R3 3 0 3 .
;
Hi again @N8 Thank you for posting the modified sample. My understanding is that for each group of a set, there are 2 distinct cases(1 and 0) and the records for each set begins with case=1 with a non missing pattern_var. I guess this pattern pretty much holds true across all sets beginning with records with case=1 with non missing pattern_var begins followed case=0 with missing pattern_var. The objective is to peek the corresponding pattern_var occurring in a certain year within case=1, retrieve and update the corresponding year's pattern_var for cases=0 within a set.
If the above is correct, the following should suffice I think-
DATA Have;
INPUT ID $ year Case Set pattern_var ;
CARDS;
A1 1 1 1 -3
A1 2 1 1 -2
A1 3 1 1 -1
B2 1 0 1 .
B2 2 0 1 .
B2 3 0 1 .
L5 1 0 1 .
L5 2 0 1 .
L5 3 0 1 .
C8 1 1 2 -2
C8 2 1 2 -1
C8 3 1 2 0
A4 1 0 2 .
A4 2 0 2 .
A4 3 0 2 .
D8 1 0 2 .
D8 2 0 2 .
D8 3 0 2 .
Z9 1 1 3 2
Z9 2 1 3 3
Z9 3 1 3 4
K8 1 0 3 .
K8 2 0 3 .
K8 3 0 3 .
R3 1 0 3 .
R3 2 0 3 .
R3 3 0 3 .
;
data want ;
if _n_=1 then do ;
dcl hash h() ;
h.definekey('year') ;
h.definedata('pattern_var') ;
h.definedone() ;
end ;
do _n_=h.clear() by 0 until(last.set) ;
do until(last.case) ;
set have ;
by set case notsorted ;
if case=1 then h.replace() ;
else h.find() ;
output ;
end ;
end ;
run ;
proc print noobs ;run ;
ID | year | Case | Set | pattern_var |
---|---|---|---|---|
A1 | 1 | 1 | 1 | -3 |
A1 | 2 | 1 | 1 | -2 |
A1 | 3 | 1 | 1 | -1 |
B2 | 1 | 0 | 1 | -3 |
B2 | 2 | 0 | 1 | -2 |
B2 | 3 | 0 | 1 | -1 |
L5 | 1 | 0 | 1 | -3 |
L5 | 2 | 0 | 1 | -2 |
L5 | 3 | 0 | 1 | -1 |
C8 | 1 | 1 | 2 | -2 |
C8 | 2 | 1 | 2 | -1 |
C8 | 3 | 1 | 2 | 0 |
A4 | 1 | 0 | 2 | -2 |
A4 | 2 | 0 | 2 | -1 |
A4 | 3 | 0 | 2 | 0 |
D8 | 1 | 0 | 2 | -2 |
D8 | 2 | 0 | 2 | -1 |
D8 | 3 | 0 | 2 | 0 |
Z9 | 1 | 1 | 3 | 2 |
Z9 | 2 | 1 | 3 | 3 |
Z9 | 3 | 1 | 3 | 4 |
K8 | 1 | 0 | 3 | 2 |
K8 | 2 | 0 | 3 | 3 |
K8 | 3 | 0 | 3 | 4 |
R3 | 1 | 0 | 3 | 2 |
R3 | 2 | 0 | 3 | 3 |
R3 | 3 | 0 | 3 | 4 |
Thank you! it works perfectly on the sample, but when i try it on my SAS dataset i keep getting a "key not found" error with it. Any idea why the hash table would give issues? thanks again for your time and all the help. N8
Hello @N8 "key not found" error means the FIND method is failing which effectively means the pattern CASE=0 has some year values not existing in CASE=1's year values. Now, are you sure the pattern CASE=1 will have all year values that of CASE=0 in a set. I kinda feared this assumption might not be accurate. Can you please check your data thoroughly and let me know, so that I can tweak.
A simulation of the ERROR: Key not found--
CARDS;
A1 1 1 1 -3
A1 2 1 1 -2
A1 3 1 1 -1
B2 1 0 1 .
B2 2 0 1 .
B2 3 0 1 .
B2 4 0 1 . /*add this to the sample and run again*/
That will cause Key not found error.
DATA Have;
INPUT ID $ year Case Set pattern_var ;
CARDS;
A1 1 1 1 -3
A1 2 1 1 -2
A1 3 1 1 -1
B2 1 0 1 .
B2 2 0 1 .
B2 3 0 1 .
L5 1 0 1 .
L5 2 0 1 .
L5 3 0 1 .
C8 1 1 2 -2
C8 2 1 2 -1
C8 3 1 2 0
A4 1 0 2 .
A4 2 0 2 .
A4 3 0 2 .
D8 1 0 2 .
D8 2 0 2 .
D8 3 0 2 .
Z9 1 1 3 2
Z9 2 1 3 3
Z9 3 1 3 4
K8 1 0 3 .
K8 2 0 3 .
K8 3 0 3 .
R3 1 0 3 .
R3 2 0 3 .
R3 3 0 3 .
;
data want;
set have;
array x{99999} _temporary_;
if not missing(pattern_var) then x{year}=pattern_var;
else pattern_var=x{year};
run;
outstanding! I will log on to the server Monday and let you know. Thanks again for your help on this. Much appreciated. More soon, Nate
Many thanks for all your help - appreciate all the input and time. nate
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.