SAS Data Management

kate90 · Posted 05-07-2019 08:25 AM

Hello,

I have a dataset with multiple observations for some participant IDs. I am wondering if there is a way to populate the first observation (by ID) with data in a subsequent observation for multiple variables, without using a different data step for each variable? For example:

ID Var1 Var2

1 8 9

2 . 4

2 3 2

3 5 7

4 0 .

4 6 10

For the first observation of ID #2, I would want to populate Var1 with 3; for the first ID #4, I want to populate Var1 with 6 and Var2 with 10. I have tried retain (for the first observation) and creating an output dataset with only the populated observations to merge back with the original, but I am trying to find a simpler way that works across multiple variables. Thank you.

Jagadishkatam · Posted 05-07-2019 08:38 AM

using retain as below

data have;
input ID Var1 Var2;
cards;
1 8 9
2 . 4
2 3 2
3 5 7
4 0 .
4 6 10
;

proc sort data=have;
by id descending var1 descending var2;
run;

data want;
set have;
retain newvar1_ newvar2_;
by id;
if first.id then do; newvar1_=.;newvar2_=.;end;
if var1 ne . then newvar1_=var1;
if var2 ne . then newvar2_=var2;
drop var1 var2;
run;

Thanks,
Jag

Astounding · Posted 05-07-2019 08:56 AM

Suppose that an ID has a missing value for VAR1 on its fifth observation. Do you want to replace that with a non missing value from its sixth observation?

kate90 · Posted 05-07-2019 09:15 AM

I mainly want to populate the first observation with any subsequent non-missing values, so that all variables have data contained within one observation.

novinosrin · Posted 05-07-2019 09:14 AM

Hello @kate90 I have the same doubt that @Astounding asks as your sample isn't showing that. Nevertheless, i am going with the assumption i.e trusting your sample

data have;
input ID Var1 Var2;
cards;
1 8 9
2 . 4
2 3 2
3 5 7
4 0 .
4 6 10
;


data want ;
if _n_=1 then do;
   dcl hash H () ;
   h.definekey  ("id","name") ;
   h.definedata ("value") ;
   h.definedone () ;
end;
do until(last.id);
set have end=lr;
by id;
array t(*) var1-var2;
do i=1 to dim(t);
if t(i)>. then do;
name=vname(t(i));
value=t(i);
rc=h.add();
end;
end;
end;
do until(last.id);
set have end=lr;
by id;
do i=1 to dim(t);
if t(i)=. then do;name=vname(t(i)); rc=h.find();t(i)=value;end;
end;
output;
end;
h.clear();
drop value name i rc;
run;

Astounding · Posted 05-07-2019 06:37 PM

@novinosrin ,

Your posts have finally convinced me that I need to learn hashing!

I have one question about your proposed solution, using h.add. What happens when there are 5 observations for an ID? Does the add method ignore additional values once the data has been populated for that key?

Thanks!

novinosrin · Posted 05-07-2019 06:45 PM

"I have one question about your proposed solution, using h.add. What happens when there are 5 observations for an ID? Does the add method ignore additional values once the data has been populated for that key?"

Yes Sir @Astounding , Your understanding is spot on. The reason being, we have not explicitly defined multidata;'y' during hash object instantiation to hold more than data item for a given key. Therefore the return code would be a non zero value. However, the catch is , object dot notation method must be assigned to a return code otherwise the implicit failure of the dot notation method will flood the log with errors stating duplicate entry. So in this case rc is a requirement. Thank you

Tom · Posted 05-07-2019 08:08 PM

You essentially are asking how to do a reverse Last Observation Carried Forward (LOCF) operation. The simplest way to do LOCF is to take advantage of the UPDATE statement in a DATA step.

data locf ;
  update have(obs=0) have ;
  by id;
   output;
run;

Do do the reverse just order the observations backwards. Your sample data does have a sort key so perhaps you can just make one.

data reverse;
  set have;
  row+1;
run;
proc sort data=reverse ;
  by id descending row;
run;
data locb ;
  update reverse(obs=0) reverse ;
  by id;
  output;
run;

But in addition to the reverse aspect you also added that you want to treat zeros as missing.

Why do you want to treat zero as missing?

If you do want to treat zero as missing then you could set them to missing in the step that is adding the ROW number variable.

data reverse;
  set have;
  row+1;
  array varlist var1-var2;
  do over varlist;
     if varlist=0 then varlist=.;
  end;
run;

SAS Data Management

How to populate variables using subsequent observations

Re: How to populate variables using subsequent observations

Re: How to populate variables using subsequent observations

Re: How to populate variables using subsequent observations

Re: How to populate variables using subsequent observations

Re: How to populate variables using subsequent observations

Re: How to populate variables using subsequent observations

Re: How to populate variables using subsequent observations

Tech Titans- climate change and vulnerable populations

Nutriólogos SAS - Data for Good | Climate Change + Vulnerable Populati...

Leveraging Data for Good: Analyzing Climate Change Impact on Vulnerabl...

AJA (Alliance of Junior Analysts) - Data for Good | Climate Change + ...

Filter observations based on two variables

Follow Us

What is...

SAS Data Management

Our biggest data and AI event of the year.

Follow Us

What is...