BookmarkSubscribeRSS Feed
TM
Calcite | Level 5 TM
Calcite | Level 5

Hi, i have a problem that i can't seem to solve with the level of datasteppin' im at.

My data looks like this:

Iden     Var1     Constant     Count

1          a             x               1         

1          a             x               2           

1          b                              3

1          c             z               4

1          b                              5  

2          a             x               1

2                                          2

3                                          1

3          b             y                2

3          b             y                3

3          a             x                4

then i use arrays to make it look like this:

Iden     Var1_1     Var1_2     Var1_3     Var1_4     Var1_5     Constant

1               a              a              b            c              b              

2               a                                                                          x

3                               b              b            a                             x

Now my problem is that my constant variable is sometimes present, and sometimes it's empty

I would like it to look like this:

Iden     Var1     Constant     Count

1          a             x               1         

1          a             x               2           

1          b             x                3

1          c             z               4

1          b             z                5  

2          a             x               1

2                         x                 2

3                                          1

3          b             y                2

3          b             y                3

3          a             x                4

In other words, when constant is empty, SAS takes the lateest value. But without overwriteing it, if the next Var1 has a Constant value.

Is this understandable?

Hope you can help!

--------------------------------------------

Arrays code:

data rotate;

set c;

by ship_to_party;

retain numabb1-numabb42;

array Anumabb(1:42) numabb1-numabb42;

if first.ship_to_party then do;

do i = 1 to 42;

          Anumabb = .;

end;

end;

Anumabb(week) = numabb;

if last.ship_to_party then output;

drop i;

run;quit;

data rotate_1;

set rotate;

array Anumabb(42) numabb1-numabb42;

do i = 1 to 42;

if Anumabb = ' ' then Anumabb = 0;

end;

drop i;

run;quit;

11 REPLIES 11
data_null__
Jade | Level 19

I believe you have a typical LOCF.  It is easiest to create a new variable.  But you could drop and rename of you want to keep it "Constant".  See if this produces the desired output.

data have;

   input (Iden Var1 Constant)(:$1.)  Count;

   cards;

1 a x 1

1 a x 2

1 b . 3

1 c z 4

1 b . 5

2 a x 1

2 . . 2

3 . . 1

3 b y 2

3 b y 3

3 a x 4

;;;;

   run;

data need;

   do until(last.iden);

      set have;

      by iden;

      length locf $1;

      Locf = coalesceC(constant,locf);

      output;

      end;

   run;

proc print;

   run;

Ksharp
Super User

Data step can easily get it.

data have;
   input (Iden Var1 Constant)( $ )  Count;
   cards;
1 a x 1 
1 a x 2 
1 b . 3
1 c z 4
1 b . 5 
2 a x 1
2 . . 2
3 . . 1
3 b y 2
3 b y 3
3 a x 4
;;;;
   run;
data want(drop=_con);
 set have;
 length _con $ 8;
 retain _con;
 if iden ne lag(iden) then call missing(_con);
 if not missing(Constant) then _con=Constant;
  else Constant=_con;
run;

Ksharp

TM
Calcite | Level 5 TM
Calcite | Level 5

Thanks for your anwsers guys! They work.

Seeing that i have over 2 million rows, is there anyway to do it faster?

Peter_C
Rhodochrosite | Level 12

faster =

don't do it

except when you next use it

achieve that with a view

In the data_null_; example change only the "data need" statement

data need / view= need ;

or

In the ksharp example

data want(drop=_con)/ view= want ;

The idea is that you probably only do this once for all 2M rows and then process it with something else. So hold no intermediate file, just the VIEW

another $0.00 from peterC

Ksharp
Super User

It is almost not faster. 2 millions is fine. I guess only spending three or four minutes will run over it.

And Peter is very humorous. I think OP want these data to analysis usually, so making a view is not a good idea.

Every time open this table, you will execute a query which will waste a lot of time.

Ksharp

MikeZdeb
Rhodochrosite | Level 12

Hi ... despite Peter's plea of "don't do this" ... how about ...

data have;

input (iden var1 constant)(: $1.) count;

datalines;

1 a x 1

1 a x 2

1 b . 3

1 c z 4

1 b . 5

2 a x 1

2 . . 2

3 . . 1

3 b y 2

3 b y 3

3 a x 4

;

run;

data want;

set have;

by iden;

constant = ifc(^first.iden and missing(constant), lag(constant), constant);

run;

based on Howard Schreier's "Conditional Lags Don't Have to be Treacherous" ...

http://www.howles.com/saspapers/CC33.pdf

Peter_C
Rhodochrosite | Level 12

adopting this concise method from MikeZdeb - and only when these data are used

data want/view=want;

set have;

by iden;

constant = ifc(^first.iden and missing(constant), lag(constant), constant);

run;

Ksharp
Super User

Mike.

I have to point it out that your code is not worked.

Lag() only hold the value of last obs ,will not hold a varying value.

So when have several missing value sequentially, lag() will not return right value.

data have;
input (iden var1 constant)(: $1.) count;
datalines;
1 a x 1
1 a x 2
1 b . 3
1 c z 4
1 b . 5
2 a x 1
2 . . 2
2 . . 4
3 . . 1
3 b y 2
3 b y 3
3 a x 4
;
run;

 

data want;
set have;
by iden;
constant = ifc(^first.iden and missing(constant), lag(constant), constant);
run;

Ksharp

MikeZdeb
Rhodochrosite | Level 12

Hi ... correct, only worked for posted data, nice catch.

This is a variation on your posting plus coalescec from data _null_ (and Peter's "view") .

I think it works, but I'm willing to be corrected (again !!!).

data want(drop=_con) / view=want;

set have (where=(0)) have (rename=(constant=_con));

by iden;

constant = ifc(first.iden , _con , coalescec(_con,constant)) ;

run;


Ksharp
Super User

Hi.Mike. It is much better.

TMorville
Calcite | Level 5

You guys are great!

Thanks a bunch.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1067 views
  • 0 likes
  • 6 in conversation