BookmarkSubscribeRSS Feed
kate90
Calcite | Level 5

Hello,

 

I have a dataset with multiple observations for some participant IDs. I am wondering if there is a way to populate the first observation (by ID) with data in a subsequent observation for multiple variables, without using a different data step for each variable? For example:

 

ID     Var1     Var2

1        8           9

2        .            4

2        3           2

3        5           7

4        0           .

4        6           10

 

For the first observation of ID #2, I would want to populate Var1 with 3; for the first ID #4, I want to populate Var1 with 6 and Var2 with 10. I have tried retain (for the first observation) and creating an output dataset with only the populated observations to merge back with the original, but I am trying to find a simpler way that works across multiple variables. Thank you.

7 REPLIES 7
Jagadishkatam
Amethyst | Level 16

using retain as below

 

data have;
input ID Var1 Var2;
cards;
1 8 9
2 . 4
2 3 2
3 5 7
4 0 .
4 6 10
;

proc sort data=have;
by id descending var1 descending var2;
run;

data want;
set have;
retain newvar1_ newvar2_;
by id;
if first.id then do; newvar1_=.;newvar2_=.;end;
if var1 ne . then newvar1_=var1;
if var2 ne . then newvar2_=var2;
drop var1 var2;
run;
Thanks,
Jag
Astounding
PROC Star
Suppose that an ID has a missing value for VAR1 on its fifth observation. Do you want to replace that with a non missing value from its sixth observation?
kate90
Calcite | Level 5

I mainly want to populate the first observation with any subsequent non-missing values, so that all variables have data contained within one observation. 

novinosrin
Tourmaline | Level 20

Hello @kate90   I have the same doubt that @Astounding  asks as your sample isn't showing that. Nevertheless, i am going with the assumption i.e trusting your sample

data have;
input ID Var1 Var2;
cards;
1 8 9
2 . 4
2 3 2
3 5 7
4 0 .
4 6 10
;


data want ;
if _n_=1 then do;
   dcl hash H () ;
   h.definekey  ("id","name") ;
   h.definedata ("value") ;
   h.definedone () ;
end;
do until(last.id);
set have end=lr;
by id;
array t(*) var1-var2;
do i=1 to dim(t);
if t(i)>. then do;
name=vname(t(i));
value=t(i);
rc=h.add();
end;
end;
end;
do until(last.id);
set have end=lr;
by id;
do i=1 to dim(t);
if t(i)=. then do;name=vname(t(i)); rc=h.find();t(i)=value;end;
end;
output;
end;
h.clear();
drop value name i rc;
run;

Astounding
PROC Star

@novinosrin ,

 

Your posts have finally convinced me that I need to learn hashing!

 

I have one question about your proposed solution, using h.add.  What happens when there are 5 observations for an ID?  Does the add method ignore additional values once the data has been populated for that key?

 

Thanks!

 

novinosrin
Tourmaline | Level 20

"I have one question about your proposed solution, using h.add.  What happens when there are 5 observations for an ID?  Does the add method ignore additional values once the data has been populated for that key?"

 

Yes Sir @Astounding , Your understanding is spot on. The reason being, we have not explicitly defined multidata;'y' during hash object instantiation to hold more than data item for a given key. Therefore the return code would be a non zero value. However, the catch is , object dot notation method must be assigned to a return code otherwise the implicit failure of the dot notation method will flood the log with errors stating duplicate entry. So in this case rc is a requirement. Thank you

Tom
Super User Tom
Super User

You essentially are asking how to do a reverse Last Observation Carried Forward (LOCF) operation.  The simplest way to do LOCF is to take advantage of the UPDATE statement in a DATA step.

data locf ;
  update have(obs=0) have ;
  by id;
   output;
run;

Do do the reverse just order the observations backwards.  Your sample data does have a sort key so perhaps you can just make one.

data reverse;
  set have;
  row+1;
run;
proc sort data=reverse ;
  by id descending row;
run;
data locb ;
  update reverse(obs=0) reverse ;
  by id;
  output;
run;

But in addition to the reverse aspect you also added that you want to treat zeros as missing.  

Why do you want to treat zero as missing?

If you do want to treat zero as missing then you could set them to missing in the step that is adding the ROW number variable.

data reverse;
  set have;
  row+1;
  array varlist var1-var2;
  do over varlist;
     if varlist=0 then varlist=.;
  end;
run;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 2655 views
  • 1 like
  • 5 in conversation