Hi, i have a problem that i can't seem to solve with the level of datasteppin' im at.
My data looks like this:
Iden Var1 Constant Count
1 a x 1
1 a x 2
1 b 3
1 c z 4
1 b 5
2 a x 1
2 2
3 1
3 b y 2
3 b y 3
3 a x 4
then i use arrays to make it look like this:
Iden Var1_1 Var1_2 Var1_3 Var1_4 Var1_5 Constant
1 a a b c b
2 a x
3 b b a x
Now my problem is that my constant variable is sometimes present, and sometimes it's empty
I would like it to look like this:
Iden Var1 Constant Count
1 a x 1
1 a x 2
1 b x 3
1 c z 4
1 b z 5
2 a x 1
2 x 2
3 1
3 b y 2
3 b y 3
3 a x 4
In other words, when constant is empty, SAS takes the lateest value. But without overwriteing it, if the next Var1 has a Constant value.
Is this understandable?
Hope you can help!
--------------------------------------------
Arrays code:
data rotate;
set c;
by ship_to_party;
retain numabb1-numabb42;
array Anumabb(1:42) numabb1-numabb42;
if first.ship_to_party then do;
do i = 1 to 42;
Anumabb = .;
end;
end;
Anumabb(week) = numabb;
if last.ship_to_party then output;
drop i;
run;quit;
data rotate_1;
set rotate;
array Anumabb(42) numabb1-numabb42;
do i = 1 to 42;
if Anumabb = ' ' then Anumabb = 0;
end;
drop i;
run;quit;
I believe you have a typical LOCF. It is easiest to create a new variable. But you could drop and rename of you want to keep it "Constant". See if this produces the desired output.
data have;
input (Iden Var1 Constant)(:$1.) Count;
cards;
1 a x 1
1 a x 2
1 b . 3
1 c z 4
1 b . 5
2 a x 1
2 . . 2
3 . . 1
3 b y 2
3 b y 3
3 a x 4
;;;;
run;
data need;
do until(last.iden);
set have;
by iden;
length locf $1;
Locf = coalesceC(constant,locf);
output;
end;
run;
proc print;
run;
Data step can easily get it.
data have; input (Iden Var1 Constant)( $ ) Count; cards; 1 a x 1 1 a x 2 1 b . 3 1 c z 4 1 b . 5 2 a x 1 2 . . 2 3 . . 1 3 b y 2 3 b y 3 3 a x 4 ;;;; run; data want(drop=_con); set have; length _con $ 8; retain _con; if iden ne lag(iden) then call missing(_con); if not missing(Constant) then _con=Constant; else Constant=_con; run;
Ksharp
Thanks for your anwsers guys! They work.
Seeing that i have over 2 million rows, is there anyway to do it faster?
faster =
don't do it
except when you next use it
achieve that with a view
In the data_null_; example change only the "data need" statement
data need / view= need ;
or
In the ksharp example
data want(drop=_con)/ view= want ;
The idea is that you probably only do this once for all 2M rows and then process it with something else. So hold no intermediate file, just the VIEW
another $0.00 from peterC
It is almost not faster. 2 millions is fine. I guess only spending three or four minutes will run over it.
And Peter is very humorous. I think OP want these data to analysis usually, so making a view is not a good idea.
Every time open this table, you will execute a query which will waste a lot of time.
Ksharp
Hi ... despite Peter's plea of "don't do this" ... how about ...
data have;
input (iden var1 constant)(: $1.) count;
datalines;
1 a x 1
1 a x 2
1 b . 3
1 c z 4
1 b . 5
2 a x 1
2 . . 2
3 . . 1
3 b y 2
3 b y 3
3 a x 4
;
run;
data want;
set have;
by iden;
constant = ifc(^first.iden and missing(constant), lag(constant), constant);
run;
based on Howard Schreier's "Conditional Lags Don't Have to be Treacherous" ...
adopting this concise method from MikeZdeb - and only when these data are used
data want/view=want;
set have;
by iden;
constant = ifc(^first.iden and missing(constant), lag(constant), constant);
run;
Mike.
I have to point it out that your code is not worked.
Lag() only hold the value of last obs ,will not hold a varying value.
So when have several missing value sequentially, lag() will not return right value.
data have; input (iden var1 constant)(: $1.) count; datalines; 1 a x 1 1 a x 2 1 b . 3 1 c z 4 1 b . 5 2 a x 1 2 . . 2 2 . . 4 3 . . 1 3 b y 2 3 b y 3 3 a x 4 ; run; data want; set have; by iden; constant = ifc(^first.iden and missing(constant), lag(constant), constant); run;
Ksharp
Hi ... correct, only worked for posted data, nice catch.
This is a variation on your posting plus coalescec from data _null_ (and Peter's "view") .
I think it works, but I'm willing to be corrected (again !!!).
data want(drop=_con) / view=want;
set have (where=(0)) have (rename=(constant=_con));
by iden;
constant = ifc(first.iden , _con , coalescec(_con,constant)) ;
run;
Hi.Mike. It is much better.
You guys are great!
Thanks a bunch.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.