Help using Base SAS procedures

Datastep question: How to "keep" values over several rows

Reply
Occasional Contributor TM
Occasional Contributor
Posts: 7

Datastep question: How to "keep" values over several rows

Hi, i have a problem that i can't seem to solve with the level of datasteppin' im at.

My data looks like this:

Iden     Var1     Constant     Count

1          a             x               1         

1          a             x               2           

1          b                              3

1          c             z               4

1          b                              5  

2          a             x               1

2                                          2

3                                          1

3          b             y                2

3          b             y                3

3          a             x                4

then i use arrays to make it look like this:

Iden     Var1_1     Var1_2     Var1_3     Var1_4     Var1_5     Constant

1               a              a              b            c              b              

2               a                                                                          x

3                               b              b            a                             x

Now my problem is that my constant variable is sometimes present, and sometimes it's empty

I would like it to look like this:

Iden     Var1     Constant     Count

1          a             x               1         

1          a             x               2           

1          b             x                3

1          c             z               4

1          b             z                5  

2          a             x               1

2                         x                 2

3                                          1

3          b             y                2

3          b             y                3

3          a             x                4

In other words, when constant is empty, SAS takes the lateest value. But without overwriteing it, if the next Var1 has a Constant value.

Is this understandable?

Hope you can help!

--------------------------------------------

Arrays code:

data rotate;

set c;

by ship_to_party;

retain numabb1-numabb42;

array Anumabb(1:42) numabb1-numabb42;

if first.ship_to_party then do;

do i = 1 to 42;

          Anumabb = .;

end;

end;

Anumabb(week) = numabb;

if last.ship_to_party then output;

drop i;

run;quit;

data rotate_1;

set rotate;

array Anumabb(42) numabb1-numabb42;

do i = 1 to 42;

if Anumabb = ' ' then Anumabb = 0;

end;

drop i;

run;quit;

Respected Advisor
Posts: 3,777

Re: Datastep question: How to "keep" values over several rows

I believe you have a typical LOCF.  It is easiest to create a new variable.  But you could drop and rename of you want to keep it "Constant".  See if this produces the desired output.

data have;

   input (Iden Var1 Constant)(:$1.)  Count;

   cards;

1 a x 1

1 a x 2

1 b . 3

1 c z 4

1 b . 5

2 a x 1

2 . . 2

3 . . 1

3 b y 2

3 b y 3

3 a x 4

;;;;

   run;

data need;

   do until(last.iden);

      set have;

      by iden;

      length locf $1;

      Locf = coalesceC(constant,locf);

      output;

      end;

   run;

proc print;

   run;

Super User
Posts: 9,677

Re: Datastep question: How to "keep" values over several rows

Data step can easily get it.

data have;
   input (Iden Var1 Constant)( $ )  Count;
   cards;
1 a x 1 
1 a x 2 
1 b . 3
1 c z 4
1 b . 5 
2 a x 1
2 . . 2
3 . . 1
3 b y 2
3 b y 3
3 a x 4
;;;;
   run;
data want(drop=_con);
 set have;
 length _con $ 8;
 retain _con;
 if iden ne lag(iden) then call missing(_con);
 if not missing(Constant) then _con=Constant;
  else Constant=_con;
run;

Ksharp

Occasional Contributor TM
Occasional Contributor
Posts: 7

Datastep question: How to "keep" values over several rows

Thanks for your anwsers guys! They work.

Seeing that i have over 2 million rows, is there anyway to do it faster?

Valued Guide
Posts: 2,175

Datastep question: How to "keep" values over several rows

faster =

don't do it

except when you next use it

achieve that with a view

In the data_null_; example change only the "data need" statement

data need / view= need ;

or

In the ksharp example

data want(drop=_con)/ view= want ;

The idea is that you probably only do this once for all 2M rows and then process it with something else. So hold no intermediate file, just the VIEW

another $0.00 from peterC

Super User
Posts: 9,677

Re: Datastep question: How to "keep" values over several rows

It is almost not faster. 2 millions is fine. I guess only spending three or four minutes will run over it.

And Peter is very humorous. I think OP want these data to analysis usually, so making a view is not a good idea.

Every time open this table, you will execute a query which will waste a lot of time.

Ksharp

Valued Guide
Posts: 765

Re: Datastep question: How to "keep" values over several rows

Hi ... despite Peter's plea of "don't do this" ... how about ...

data have;

input (iden var1 constant)(: $1.) count;

datalines;

1 a x 1

1 a x 2

1 b . 3

1 c z 4

1 b . 5

2 a x 1

2 . . 2

3 . . 1

3 b y 2

3 b y 3

3 a x 4

;

run;

data want;

set have;

by iden;

constant = ifc(^first.iden and missing(constant), lag(constant), constant);

run;

based on Howard Schreier's "Conditional Lags Don't Have to be Treacherous" ...

http://www.howles.com/saspapers/CC33.pdf

Valued Guide
Posts: 2,175

Re: Datastep question: How to "keep" values over several rows

adopting this concise method from MikeZdeb - and only when these data are used

data want/view=want;

set have;

by iden;

constant = ifc(^first.iden and missing(constant), lag(constant), constant);

run;

Super User
Posts: 9,677

Re: Datastep question: How to "keep" values over several rows

Mike.

I have to point it out that your code is not worked.

Lag() only hold the value of last obs ,will not hold a varying value.

So when have several missing value sequentially, lag() will not return right value.

data have;
input (iden var1 constant)(: $1.) count;
datalines;
1 a x 1
1 a x 2
1 b . 3
1 c z 4
1 b . 5
2 a x 1
2 . . 2
2 . . 4
3 . . 1
3 b y 2
3 b y 3
3 a x 4
;
run;

 

data want;
set have;
by iden;
constant = ifc(^first.iden and missing(constant), lag(constant), constant);
run;

Ksharp

Valued Guide
Posts: 765

Datastep question: How to "keep" values over several rows

Hi ... correct, only worked for posted data, nice catch.

This is a variation on your posting plus coalescec from data _null_ (and Peter's "view") .

I think it works, but I'm willing to be corrected (again !!!).

data want(drop=_con) / view=want;

set have (where=(0)) have (rename=(constant=_con));

by iden;

constant = ifc(first.iden , _con , coalescec(_con,constant)) ;

run;


Super User
Posts: 9,677

Datastep question: How to "keep" values over several rows

Hi.Mike. It is much better.

Contributor
Posts: 43

Datastep question: How to "keep" values over several rows

You guys are great!

Thanks a bunch.

Ask a Question
Discussion stats
  • 11 replies
  • 181 views
  • 0 likes
  • 6 in conversation