BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
N8
Obsidian | Level 7 N8
Obsidian | Level 7

Hi all, quick question about populating missing values by group, but doing so systematically so that the pattern of data for the 'cases' is applied to the 'controls'. I've included a Have/Want example. The only thing different between this sample and what i want is that there are multiple controls per case (e.g., 6:1), but they are sorted so that once a new case occurs the pattern will repeat itself following the same logic. Thank you for any advice to help me figure it out. I think it needs a do loop in order to get the same pattern, but so far what i've been able to get is not correct. Nate


DATA Have;
INPUT ID $ Case Set Time ;
CARDS;

A 1 1 -3
A 1 1 -2
A 1 1 -1
A 1 1 0
A 1 1 1
A 1 1 2
A 1 1 3
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
C 1 2 -2
C 1 2 -1
C 1 2 0
C 1 2 1
C 1 2 2
C 1 2 3
C 1 2 4
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
;

 

DATA Want;
INPUT ID $ Case Set Time ;
CARDS;

A 1 1 -3
A 1 1 -2
A 1 1 -1
A 1 1 0
A 1 1 1
A 1 1 2
A 1 1 3
B 0 1 -3
B 0 1 -2
B 0 1 -1
B 0 1 0
B 0 1 1
B 0 1 2
B 0 1 3
C 1 2 -2
C 1 2 -1
C 1 2 0
C 1 2 1
C 1 2 2
C 1 2 3
C 1 2 4
D 0 2 -2
D 0 2 -1
D 0 2 0
D 0 2 1
D 0 2 2
D 0 2 3
D 0 2 4
;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
DATA Have;
INPUT ID $ year Case Set pattern_var ;
CARDS;
A1 1 1 1 -3
A1 2 1 1 -2
A1 3 1 1 -1
B2 1 0 1 .
B2 2 0 1 .
B2 3 0 1 .
L5 1 0 1 .
L5 2 0 1 .
L5 3 0 1 .
C8 1 1 2 -2
C8 2 1 2 -1
C8 3 1 2 0
A4 1 0 2 .
A4 2 0 2 .
A4 3 0 2 .
D8 1 0 2 .
D8 2 0 2 .
D8 3 0 2 .
Z9 1 1 3 2
Z9 2 1 3 3
Z9 3 1 3 4
K8 1 0 3 .
K8 2 0 3 .
K8 3 0 3 .
R3 1 0 3 .
R3 2 0 3 .
R3 3 0 3 .
;

data want;
 set have;
 array x{99999} _temporary_;
 if not missing(pattern_var) then x{year}=pattern_var;
   else pattern_var=x{year};
run;

View solution in original post

11 REPLIES 11
novinosrin
Tourmaline | Level 20

Hi @N8 Assuming i understand your requirement-


DATA Have; 
INPUT ID $ Case Set Time ;
CARDS;
A 1 1 -3
A 1 1 -2
A 1 1 -1
A 1 1 0
A 1 1 1
A 1 1 2
A 1 1 3
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
B 0 1 .
C 1 2 -2
C 1 2 -1
C 1 2 0
C 1 2 1
C 1 2 2
C 1 2 3
C 1 2 4
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
D 0 2 .
;


data want ;
  do _n=1 by 1 until(last.id) ;
    set have ;
    by id ;
	if mod(_n_,2) then continue ;
	array t(9999) _temporary_;
	time=t(_n) ;
	output ;
  end;
  call missing(of t(*)) ;
  do _n=1 by 1 until(last.id) ;
    set have ;
	by id ;
	if mod(_n_,2)=0 then continue ;
	t(_n)=time ;
	output ;
  end;
  drop _: ;
run ;
 
proc print noobs ;run ;
ID Case Set Time
A 1 1 -3
A 1 1 -2
A 1 1 -1
A 1 1 0
A 1 1 1
A 1 1 2
A 1 1 3
B 0 1 -3
B 0 1 -2
B 0 1 -1
B 0 1 0
B 0 1 1
B 0 1 2
B 0 1 3
C 1 2 -2
C 1 2 -1
C 1 2 0
C 1 2 1
C 1 2 2
C 1 2 3
C 1 2 4
D 0 2 -2
D 0 2 -1
D 0 2 0
D 0 2 1
D 0 2 2
D 0 2 3
D 0 2 4
novinosrin
Tourmaline | Level 20

I guess you just need this after all?-

 

data want ;
 do _n=1 by 1 until(last.id) ;
   set have ;
   by id;
   array t(99) _temporary_;
   if mod(_n_,2) then t(_n)=time;
   else time=t(_n);
   output;
 end;
 drop _: ;
run;
N8
Obsidian | Level 7 N8
Obsidian | Level 7

Thanks novinosrin - Sadly, no. I think one of the issue is that my actual ID variables aren't ordered consecutively - it's random alpha/numeric - so the primary variable that it has to match on is the 'setnumbr' (i.e., i want to populate all the missing setnumbr values with the values for the 'cases' of the same setnumbr), but they need to correspond to the same year + group ID that corresponds with the values for the 'cases'. I'll keep plugging away at it. I appreciate the effort! thank you! 

novinosrin
Tourmaline | Level 20

Hi @N8 Okay never mind.  Can you please post a more representative sample of what you have and the want with the YEAR variable and so forth. Also, please explain the logic with the modified sample. That can only help us avoid assumptions/guesses. Thanks!

N8
Obsidian | Level 7 N8
Obsidian | Level 7

No worries. I've been working with setting it up with a do loop, but i'm only able to get the first value to match. 

This is a pretty good pared down replication of the source file. What I am trying to do is replicate the "pattern_var" so that all of the Case=0 values that belong to the same "set" as its corresponding "case" get the same corresponding value as the case has in the "pattern_var". Where I get hung up is that I need to get the code to cycle through the ID + year so that the controls get the same corresponding value as the case for each year. If I haven't exhausted your kindness... 🙂 

 

DATA Have;
INPUT ID $ year Case Set pattern_var ;
CARDS;

A1 1 1 1 -3
A1 2 1 1 -2
A1 3 1 1 -1
B2 1 0 1 .
B2 2 0 1 .
B2 3 0 1 .
L5 1 0 1 .
L5 2 0 1 .
L5 3 0 1 .
C8 1 1 2 -2
C8 2 1 2 -1
C8 3 1 2 0
A4 1 0 2 .
A4 2 0 2 .
A4 3 0 2 .
D8 1 0 2 .
D8 2 0 2 .
D8 3 0 2 .
Z9 1 1 3 2
Z9 2 1 3 3
Z9 3 1 3 4
K8 1 0 3 .
K8 2 0 3 .
K8 3 0 3 .
R3 1 0 3 .
R3 2 0 3 .
R3 3 0 3 .

;

novinosrin
Tourmaline | Level 20

Hi again @N8  Thank you for posting the modified sample. My understanding is that for each group of a set, there are  2 distinct cases(1 and 0) and the records for each set begins with case=1 with a non missing pattern_var. I guess this pattern pretty much holds true across all sets beginning with records with case=1 with non missing pattern_var begins followed case=0 with missing pattern_var. The objective is to peek the corresponding pattern_var occurring in a certain year within  case=1, retrieve and update the corresponding year's pattern_var for cases=0 within a set. 

 

If the above is correct, the following should suffice I think-

 

DATA Have;
INPUT ID $ year Case Set pattern_var ;
CARDS;
A1 1 1 1 -3
A1 2 1 1 -2
A1 3 1 1 -1
B2 1 0 1 .
B2 2 0 1 .
B2 3 0 1 .
L5 1 0 1 .
L5 2 0 1 .
L5 3 0 1 .
C8 1 1 2 -2
C8 2 1 2 -1
C8 3 1 2 0
A4 1 0 2 .
A4 2 0 2 .
A4 3 0 2 .
D8 1 0 2 .
D8 2 0 2 .
D8 3 0 2 .
Z9 1 1 3 2
Z9 2 1 3 3
Z9 3 1 3 4
K8 1 0 3 .
K8 2 0 3 .
K8 3 0 3 .
R3 1 0 3 .
R3 2 0 3 .
R3 3 0 3 .
;

data want ;
  if _n_=1 then do ;
    dcl hash h() ;
    h.definekey('year') ;
    h.definedata('pattern_var') ;
    h.definedone() ;
  end ;
  do _n_=h.clear() by 0 until(last.set) ;
    do until(last.case) ;
      set have ;
	  by set case notsorted ;
	  if case=1 then h.replace() ;
      else h.find() ;
	  output ;
    end ;
  end ;
run ;

proc print noobs ;run ;
ID year Case Set pattern_var
A1 1 1 1 -3
A1 2 1 1 -2
A1 3 1 1 -1
B2 1 0 1 -3
B2 2 0 1 -2
B2 3 0 1 -1
L5 1 0 1 -3
L5 2 0 1 -2
L5 3 0 1 -1
C8 1 1 2 -2
C8 2 1 2 -1
C8 3 1 2 0
A4 1 0 2 -2
A4 2 0 2 -1
A4 3 0 2 0
D8 1 0 2 -2
D8 2 0 2 -1
D8 3 0 2 0
Z9 1 1 3 2
Z9 2 1 3 3
Z9 3 1 3 4
K8 1 0 3 2
K8 2 0 3 3
K8 3 0 3 4
R3 1 0 3 2
R3 2 0 3 3
R3 3 0 3 4

 

 

N8
Obsidian | Level 7 N8
Obsidian | Level 7

Thank you! it works perfectly on the sample, but when i try it on my SAS dataset i keep getting a "key not found" error with it. Any idea why the hash table would give issues? thanks again for your time and all the help. N8

novinosrin
Tourmaline | Level 20

Hello @N8  "key not found" error  means the FIND method is failing which effectively means the pattern CASE=0 has some year values not existing in CASE=1's year values. Now, are you sure the pattern CASE=1 will have all year values that of CASE=0 in a set. I kinda feared this assumption might not be accurate.  Can you please check your data thoroughly and let me know, so that I can tweak.

 

A simulation of the ERROR: Key not found--

CARDS;
A1 1 1 1 -3
A1 2 1 1 -2
A1 3 1 1 -1
B2 1 0 1 .
B2 2 0 1 .
B2 3 0 1 .
B2 4 0 1 .  /*add this to the sample and run again*/

 

That will cause Key not found error.

Ksharp
Super User
DATA Have;
INPUT ID $ year Case Set pattern_var ;
CARDS;
A1 1 1 1 -3
A1 2 1 1 -2
A1 3 1 1 -1
B2 1 0 1 .
B2 2 0 1 .
B2 3 0 1 .
L5 1 0 1 .
L5 2 0 1 .
L5 3 0 1 .
C8 1 1 2 -2
C8 2 1 2 -1
C8 3 1 2 0
A4 1 0 2 .
A4 2 0 2 .
A4 3 0 2 .
D8 1 0 2 .
D8 2 0 2 .
D8 3 0 2 .
Z9 1 1 3 2
Z9 2 1 3 3
Z9 3 1 3 4
K8 1 0 3 .
K8 2 0 3 .
K8 3 0 3 .
R3 1 0 3 .
R3 2 0 3 .
R3 3 0 3 .
;

data want;
 set have;
 array x{99999} _temporary_;
 if not missing(pattern_var) then x{year}=pattern_var;
   else pattern_var=x{year};
run;
N8
Obsidian | Level 7 N8
Obsidian | Level 7

outstanding! I will log on to the server Monday and let you know. Thanks again for your help on this. Much appreciated. More soon, Nate

N8
Obsidian | Level 7 N8
Obsidian | Level 7

Many thanks for all your help - appreciate all the input and time. nate

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 838 views
  • 2 likes
  • 3 in conversation