DATA Step, Macro, Functions and more

add values from previous observation

Accepted Solution Solved
Reply
Contributor
Posts: 20
Accepted Solution

add values from previous observation

Hi,

Assume this is my data. I would like to delete rows with missing duration and add the score for that observation to the previous one. Anything else stays the same for the previous observation.

Up to now, I was able only to delete the row if duration is missing. I have tried using array without success. Any help is appreciated.

DATA kids;

  LENGTH kidname $ 4;

  INPUT kidname time duration score;

CARDS;

Beth 1 12 30

Beth 2 7 10

Beth 3 9 60

Beth 4 . 40

Barb 1 3 20

Barb 2 5 80

Barb 3 . 10

Alic 1 6 50

Alic 2 6 20

Alic 3 3 12

Alic 4 . 17

;

run;

data first;

set kids;

if duration=. then nmiss=1; else nmiss=0;

run;

data second;

  set first;

  if nmiss=1 then delete;

run;

The final desired data would be

Beth 1 12 30

Beth 2 7 10

Beth 3 9 100

Barb 1 3 20

Barb 2 5 90

Alic 1 6 50

Alic 2 6 20

Alic 3 3 29

Thanks.


Accepted Solutions
Solution
‎01-16-2015 08:47 PM
Respected Advisor
Posts: 3,124

Re: add values from previous observation

As stated, if this is the only possible scenario in your real life data, a simple 'look ahead' will be sufficient. However, look at the following modified raw data:

DATA kids;

     LENGTH kidname $ 4;

     INPUT kidname time duration score;

     CARDS;

Beth 1 12 30

Beth 2 7 10

Beth 3 . 60

Beth 4 . 40

Barb 1 3 20

Barb 2 5 80

Barb 3 . 10

Alic 1 6 50

Alic 2 . 20

Alic 3 3 12

Alic 4 . 17

;

So Beth has 2 missing in a row and Alic has 2 missing apart. A possible Proc SQL solution may take care of Beth (let me know if you are interested in the code) but fail on Alic, while 'Look Ahead' is able to tackle Alic but fail on Beth. Array, which OP happened to mention, becomes a strong candidate to reconcile both scenario at the same time, in 1X pass data step:

data want;

     do until (last.kidname);

           set kids;

           by kidname notsorted;

           array in(3) _temporary_;

           array out(3) _temporary_;

           array t(3) time--score;

       if missing(duration) then in(3)+score;

           if not missing (duration) or last.kidname then

                do;

                     do i=1 to 3;

                           out(i)=in(i);

                           in(i)=t(i);

                           t(i)=out(i);

                     end;

                     if not missing(duration) then

                           output;

                end;

          

     end;

     call missing (of in(*), of out(*));

     drop i;

run;


Haikuo

View solution in original post


All Replies
PROC Star
Posts: 1,230

Re: add values from previous observation

Sounds like a look-ahead problem to me.

Here is one approach.

218  options mergenoby=nowarn;
219  data want;
220    merge kids
221          kids (firstobs=2
222                keep=duration score kidname
223                rename=(duration=nextduration score=nextscore kidname=nextkidname)
224                )
225    ;
226    if missing(nextduration) then do;
227      score=sum(score,nextscore);
228      if kidname ne nextkidname and not missing(nextkidname) then put "ER" "ROR: crossed name
228! boundary cuz missing first duration " kidname=;
229    end;
230    if missing(duration) then delete;
231
232    drop next:;
233    put KidName Time Duration Score;
234  run;

Beth 1 12 30
Beth 2 7 10
Beth 3 9 100
Barb 1 3 20
Barb 2 5 90
Alic 1 6 50
Alic 2 6 20
Alic 3 3 29
NOTE: There were 11 observations read from the data set WORK.KIDS.
NOTE: There were 10 observations read from the data set WORK.KIDS.
NOTE: The data set WORK.WANT has 8 observations and 4 variables.

235  options mergenoby=error;


Super Contributor
Super Contributor
Posts: 3,174

Re: add values from previous observation

Investigate an approach that uses PROC SUMMARY to aggregate important variables where you have BY/CLASS variable list (kidname and maybe time?), executed after a DATA step that does the interrogation and sets appropriate SAS variables to a missing value; also, use LAG(...) to keep track of most-recent 'keeper' row, if consider you might have two consecutive (or more) obs with a missing value for DURATION.

Scott Barry
SBBWorks, Inc.

Contributor
Posts: 52

Re: add values from previous observation

A solution is the following:

data want(keep=kidname time2 dur2 sc2 rename=(time2=time dur2=duration sc2=score));
  set kids;
  retain ix2 time2 dur2 sc2;
  by kidname notsorted;

  if first.kidname then do;
     ix2=0;
     time2=0;
     dur2=0;
     sc2=0;
  end;
  ix2+1;
  if not(last.kidname) then do;
      if (ix2>1) then output;
      time2=time;
      dur2= duration;
      sc2=score;
  end;

  if last.kidname then do;
     sc2+score;
     output;
  end;
run;

Solution
‎01-16-2015 08:47 PM
Respected Advisor
Posts: 3,124

Re: add values from previous observation

As stated, if this is the only possible scenario in your real life data, a simple 'look ahead' will be sufficient. However, look at the following modified raw data:

DATA kids;

     LENGTH kidname $ 4;

     INPUT kidname time duration score;

     CARDS;

Beth 1 12 30

Beth 2 7 10

Beth 3 . 60

Beth 4 . 40

Barb 1 3 20

Barb 2 5 80

Barb 3 . 10

Alic 1 6 50

Alic 2 . 20

Alic 3 3 12

Alic 4 . 17

;

So Beth has 2 missing in a row and Alic has 2 missing apart. A possible Proc SQL solution may take care of Beth (let me know if you are interested in the code) but fail on Alic, while 'Look Ahead' is able to tackle Alic but fail on Beth. Array, which OP happened to mention, becomes a strong candidate to reconcile both scenario at the same time, in 1X pass data step:

data want;

     do until (last.kidname);

           set kids;

           by kidname notsorted;

           array in(3) _temporary_;

           array out(3) _temporary_;

           array t(3) time--score;

       if missing(duration) then in(3)+score;

           if not missing (duration) or last.kidname then

                do;

                     do i=1 to 3;

                           out(i)=in(i);

                           in(i)=t(i);

                           t(i)=out(i);

                     end;

                     if not missing(duration) then

                           output;

                end;

          

     end;

     call missing (of in(*), of out(*));

     drop i;

run;


Haikuo

Frequent Contributor
Posts: 115

Re: add values from previous observation

DATA kids;

  LENGTH kidname $ 4;

  INPUT kidname time duration score;

CARDS;

Beth 1 12 30

Beth 2 7 10

Beth 3 9 60

Beth 4 . 40

Barb 1 3 20

Barb 2 5 80

Barb 3 . 10

Alic 1 6 50

Alic 2 6 20

Alic 3 3 12

Alic 4 . 17

;

run;

data want(drop=score1 duration1 time1);

set kids;

by kidname notsorted;

retain score1 duration1 time1;

if not last.kidname then do;

score1=score;

duration1=duration;

time1=time;

end;

if last.kidname and duration=. then do;

score=sum(score,score1);

duration=duration1;

time=time1;

end;

run;

data final;

set want;

by time notsorted;

if not last.time then delete;

run;

Super User
Posts: 9,671

Re: add values from previous observation

Assuming  there were multi-missing value followed it .

DATA kids;
     LENGTH kidname $ 4;
     INPUT kidname time duration score;
     CARDS;
Beth 1 12 30
Beth 2 7 10
Beth 3 . 60
Beth 4 . 40
Barb 1 3 20
Barb 2 5 80
Barb 3 . 10
Alic 1 6 50
Alic 2 . 20
Alic 3 3 12
Alic 4 . 17
;
run;
data temp;
  set kids;
  by kidname notsorted;
  if not missing(duration) or first.kidname then group+1;
run;
proc sql;
 create table want as
  select group,kidname, time ,duration,sum(score) as score
   from temp 
    group by group
      having duration is not missing;
quit;


Xia Keshan

Frequent Contributor
Posts: 115

Re: add values from previous observation

DATA kids;/*haikuo's raw data*/

     LENGTH kidname $ 4;

     INPUT kidname time duration score;

     CARDS;

Beth 1 12 30

Beth 2 7 10

Beth 3 . 60

Beth 4 . 40

Barb 1 3 20

Barb 2 5 80

Barb 3 . 10

Alic 1 6 50

Alic 2 . 20

Alic 3 3 12

Alic 4 . 17

;

data want1;

set kids;

by kidname notsorted;

retain score1 duration1 time1 ;

if first.kidname then call missing(score1,duration1,time1);

if duration ne . then do;

score1=score;

duration1=duration;

time1=time;

end;

if duration eq . then do;

score1+score ;

score=score1;

duration=duration1;

time=time1;

end;

run;

data final(drop=score1 duration1 time1);

set want1;

by time notsorted;

if not last.time then delete;

run;

Respected Advisor
Posts: 3,887

Re: add values from previous observation

As long as in your real data there are also no consecutive times with a missing duration then below SQL should work.

proc sql;

  create table want as

  select

    kidname,

    case(duration) when(.) then time-1 else time end as time,

    sum(duration) as duration,

    sum(score) as score

  from kids

  group by kidname, calculated time

  ;

quit;

Respected Advisor
Posts: 4,641

Re: add values from previous observation

Trying to keep it simple and also handle unexpected cases properly :

DATA kids;

  LENGTH kidname $ 4;

  INPUT kidname time duration score;

CARDS;

Beth 1 12 30

Beth 2 7 10

Beth 3 9 60

Beth 4 . 40

Barb 1 3 20

Barb 2 5 80

Barb 3 . 10

Alic 1 6 50

Alic 2 6 20

Alic 3 3 12

Alic 4 . 17

Tric 1 . 11

Tric 2 5 13

Null 1 . 7

Null 2 . 19

;

data want;

do until(last.kidname);

    set kids; by kidname notsorted;

    if missing(duration) then do;

        s = sum(s, score);

        t = coalesce(t, time);

        end;

    else do;

        if not first.kidname then output;

        t = time;

        d = duration;

        s = score;

        end;

    end;

output;

keep kidname t d s;

rename t=time d=duration s=score;

run;

proc print data=want noobs; run;

PG

PG
Contributor
Posts: 20

Re: add values from previous observation

Thanks PG.

Respected Advisor
Posts: 3,124

Re: add values from previous observation

Thanks to PG, It turns out my previous code still has some glitches. The following one seems to be more robust:

DATA kids;

     LENGTH kidname $ 4;

     INPUT kidname time duration score;

     CARDS;

Beth 1 12 30

Beth 2 7 10

Beth 3 9 60

Beth 4 5 40

Barb 1 3 20

Barb 2 . 80

Barb 3 . 10

Alic 1 6 50

Alic 2 . 20

Alic 3 3 12

Alic 4 . 17

Tric 1 . 11

Tric 2 5 13

Null 1 . 7

Null 2 . 19

;

data want;

     do until (last.kidname);

           set kids;

           by kidname notsorted;

           array in(3) _temporary_;

           array out(3) _temporary_;

           array t(3) time--score;

           if missing(duration) then

                in(3)+score;

           if not missing (duration) or (last.kidname) then

                do;

                     do i=1 to 3;

                          out(i)=in(i);

                           in(i)=t(i);

                           t(i)=out(i);

                     end;

                     if not missing(duration) then

                           output;

                     if last.kidname then

                           do;

                                do i=1 to 3;

                                     t(i)=in(i);

                                end;

                                if not missing(duration) then

                                     output;

                           end;

                end;

     end;

     call missing (of in(*), of out(*));

     drop i;

run;

Contributor
Posts: 20

Re: add values from previous observation

Thanks Hai.kuo.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 649 views
  • 15 likes
  • 9 in conversation