Hi, i have a problem that i can't seem to solve with the level of datasteppin' im at.
My data looks like this:
Iden Var1 Constant Count
1 a x 1
1 a x 2
1 b 3
1 c z 4
1 b 5
2 a x 1
2 2
3 1
3 b y 2
3 b y 3
3 a x 4
then i use arrays to make it look like this:
Iden Var1_1 Var1_2 Var1_3 Var1_4 Var1_5 Constant
1 a a b c b
2 a x
3 b b a x
Now my problem is that my constant variable is sometimes present, and sometimes it's empty
I would like it to look like this:
Iden Var1 Constant Count
1 a x 1
1 a x 2
1 b x 3
1 c z 4
1 b z 5
2 a x 1
2 x 2
3 1
3 b y 2
3 b y 3
3 a x 4
In other words, when constant is empty, SAS takes the lateest value. But without overwriteing it, if the next Var1 has a Constant value.
Is this understandable?
Hope you can help!
--------------------------------------------
Arrays code:
data rotate;
set c;
by ship_to_party;
retain numabb1-numabb42;
array Anumabb(1:42) numabb1-numabb42;
if first.ship_to_party then do;
do i = 1 to 42;
Anumabb = .;
end;
end;
Anumabb(week) = numabb;
if last.ship_to_party then output;
drop i;
run;quit;
data rotate_1;
set rotate;
array Anumabb(42) numabb1-numabb42;
do i = 1 to 42;
if Anumabb = ' ' then Anumabb = 0;
end;
drop i;
run;quit;
I believe you have a typical LOCF. It is easiest to create a new variable. But you could drop and rename of you want to keep it "Constant". See if this produces the desired output.
data have;
input (Iden Var1 Constant)(:$1.) Count;
cards;
1 a x 1
1 a x 2
1 b . 3
1 c z 4
1 b . 5
2 a x 1
2 . . 2
3 . . 1
3 b y 2
3 b y 3
3 a x 4
;;;;
run;
data need;
do until(last.iden);
set have;
by iden;
length locf $1;
Locf = coalesceC(constant,locf);
output;
end;
run;
proc print;
run;
Data step can easily get it.
data have; input (Iden Var1 Constant)( $ ) Count; cards; 1 a x 1 1 a x 2 1 b . 3 1 c z 4 1 b . 5 2 a x 1 2 . . 2 3 . . 1 3 b y 2 3 b y 3 3 a x 4 ;;;; run; data want(drop=_con); set have; length _con $ 8; retain _con; if iden ne lag(iden) then call missing(_con); if not missing(Constant) then _con=Constant; else Constant=_con; run;
Ksharp
Thanks for your anwsers guys! They work.
Seeing that i have over 2 million rows, is there anyway to do it faster?
faster =
don't do it
except when you next use it
achieve that with a view
In the data_null_; example change only the "data need" statement
data need / view= need ;
or
In the ksharp example
data want(drop=_con)/ view= want ;
The idea is that you probably only do this once for all 2M rows and then process it with something else. So hold no intermediate file, just the VIEW
another $0.00 from peterC
It is almost not faster. 2 millions is fine. I guess only spending three or four minutes will run over it.
And Peter is very humorous. I think OP want these data to analysis usually, so making a view is not a good idea.
Every time open this table, you will execute a query which will waste a lot of time.
Ksharp
Hi ... despite Peter's plea of "don't do this" ... how about ...
data have;
input (iden var1 constant)(: $1.) count;
datalines;
1 a x 1
1 a x 2
1 b . 3
1 c z 4
1 b . 5
2 a x 1
2 . . 2
3 . . 1
3 b y 2
3 b y 3
3 a x 4
;
run;
data want;
set have;
by iden;
constant = ifc(^first.iden and missing(constant), lag(constant), constant);
run;
based on Howard Schreier's "Conditional Lags Don't Have to be Treacherous" ...
adopting this concise method from MikeZdeb - and only when these data are used
data want/view=want;
set have;
by iden;
constant = ifc(^first.iden and missing(constant), lag(constant), constant);
run;
Mike.
I have to point it out that your code is not worked.
Lag() only hold the value of last obs ,will not hold a varying value.
So when have several missing value sequentially, lag() will not return right value.
data have; input (iden var1 constant)(: $1.) count; datalines; 1 a x 1 1 a x 2 1 b . 3 1 c z 4 1 b . 5 2 a x 1 2 . . 2 2 . . 4 3 . . 1 3 b y 2 3 b y 3 3 a x 4 ; run; data want; set have; by iden; constant = ifc(^first.iden and missing(constant), lag(constant), constant); run;
Ksharp
Hi ... correct, only worked for posted data, nice catch.
This is a variation on your posting plus coalescec from data _null_ (and Peter's "view") .
I think it works, but I'm willing to be corrected (again !!!).
data want(drop=_con) / view=want;
set have (where=(0)) have (rename=(constant=_con));
by iden;
constant = ifc(first.iden , _con , coalescec(_con,constant)) ;
run;
Hi.Mike. It is much better.
You guys are great!
Thanks a bunch.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.