Hello
from the table HAVE, I want table WANT.
so I want to repeat the first observation available for an id to fill in the .
data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;
data WANT;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 0.15
2/1/2013 1 0.15
3/1/2013 1 0.15
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 0.09
8/1/2013 2 0.09
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;
data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;
data _null_;
dcl hash H (ordered:'y') ;
h.definekey ("_n_") ;
h.definedata ('month_begin_dt','id','x') ;
h.definedone () ;
do _n_=nobs to 1 by -1;
set have point=_n_ nobs=nobs;
if not missing(x) then _x=x;
else x=_x;
h.add();
end;
h.output(dataset:'want');
stop;
run;
data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;
data _null_;
dcl hash H (ordered:'y') ;
h.definekey ("_n_") ;
h.definedata ('month_begin_dt','id','x') ;
h.definedone () ;
do _n_=nobs to 1 by -1;
set have point=_n_ nobs=nobs;
if not missing(x) then _x=x;
else x=_x;
h.add();
end;
h.output(dataset:'want');
stop;
run;
This DOES NOT work:
proc sort data=have;
by descending month_begin_dt id;
run;
data want;
set have;
holdid = lag(id);
holdx = lag(x);
if x = . and id = holdid then x = holdx;
run;
in that it only fills in one missing observation.
I don't understand @novinosrin 's solution cause I'm dumb and was trying for something cheap and easy. Is there a way to get something like that working? Currently for the second observation with a missing x, it takes the original value of the x from the previous observation instead of the newly changed x value. You can run it again on itself (data want2; set want etc) and it will fill in one more row but that is stupid. Lol.
@HB long time no see mate!, Hope you are well. No-one is dumb. Just interest and participation is what matters. If I could (aka a forklift driver turned BI person can do with no math or any related academic background with spoiled childhood to even spoiled adulthood ) can learn do, anybody can. My parents stopped buying me video games, so i took to this. lol
This should likely be your approach:
proc sort data=have out=_have;
by descending month_begin_dt ;
run;
data temp;
set _have;
retain _x;
if not missing(x) then _x=x;
else x=_x;
drop _x;
/* holdid = lag(id);*/
/* holdx = lag(x);*/
/* if x = . and id = holdid then x = holdx;*/
run;
proc sort data=temp out=want;
by month_begin_dt;
run;
Sort first by id ,descending date, then do a retain, then sort it back to original order should work. The following data step is to do a literal fetch from first non-missing down the line, within the same id. The SQL coding is straightfoward, however, it may not be efficient for large files.
data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;
data want;
set have;
if missing(x) then do;
do i=_n_ by 1 until (not missing (_x) or _id ne id);
set have (keep=x id rename=(x=_x id=_id)) point=i;
end;
end;
x=coalesce(x,_x);
drop _:;
run;
proc sql;
create table want_sql as
select month_begin_dt ,id ,coalesce(x,(select x from have where id=a.id and not missing(x) group by id having min(month_begin_dt)=month_begin_dt)) as x
from have a
;
quit;
I like this approach, however, there are 2 things I will keep in mind when using it:
1. If all of the 'x' are missing for the same 'id', 'x' from other 'id' will be brought in, and I am not sure if that is the desired outcome for OP. This can be easily fixed by putting a 'id' check in place.
2. If the table are too big to fit in Ram.
@Haikuo yes sir, I had the same assumption indeed. I was so lazy and waiting for OP's feedback should the change be needed. Nice catch though!
Hello
from the table HAVE, I want table WANT.
so I want to repeat the first observation available for an id to fill in the .
data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;
data WANT;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 0.15
2/1/2013 1 0.15
3/1/2013 1 0.15
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 0.09
8/1/2013 2 0.09
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;
I have asked this question. someone give me this solution however
data _null_; dcl hash H (ordered:'y') ; h.definekey ("_n_") ; h.definedata ('month_begin_dt','id','x') ; h.definedone () ; do _n_=nobs to 1 by -1; set have point=_n_ nobs=nobs; if not missing(x) then _x=x; else x=_x; h.add(); end; h.output(dataset:'want'); stop; run;
If all of the 'x' are missing for the same 'id', 'x' from other 'id' will be brought in, and I am not sure if that is the desired outcome .
Hi @sasphd
data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash h(dataset:'have(where=(x ne .))');
h.definekey('id');
h.definedata('x');
h.definedone();
end;
do until(last.id);
update have(obs=0) have;
by id;
if missing(x) then rc=h.find();
output;
end;
drop rc ;
run;
Little fancy solution, test thoroughly and let us know if that doesn't work. We can jump to simple solution
if i want to set more than variable x and y and z
how can I change your program
Hi @sasphd
Very good question. Often the case whenever wide would warrant a transpose or a long form of sorts for a look up.
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash h();
h.definekey('id','var');
h.definedata('id','var','val');
h.definedone();
do until(lr);
set have end=lr;
array t(*) x y z;
do i=1 to dim(t);
if not missing(t(i)) then do;
var=vname(t(i));
val=t(i);
rc=h.add();
end;
end;
end;
end;
do until(last.id);
update have(obs=0) have;
by id;
do i=1 to dim(t);
if missing(t(i)) then do;
var=vname(t(i));
rc=h.find();
t(i)=val;
end;
end;
output;
end;
keep id month_begin_dt x y z;
run;
I was lazy to make a sample, so didn't test it but the logic is pretty straight forward considering that requires an extra pass to load the hash with non missing groups in long format and look up using array indexes. If you could extend your time by modifying a full sample by adding x, y, z vars, I am more than happy and willing to test.
Best regards!
this does not work so I do it variable by variable with your first program
Can you please post a sample for me to test plz
Fund_NO | caldt | mret | MER | Turnover | NAS |
48 | 2007-12-31 | -0,0308 | 222000 | ||
48 | 2008-01-31 | -0,0467 | 331734 | ||
48 | 2008-02-29 | -0,0445 | 353851 | ||
48 | 2008-03-31 | 0,0371 | 459456 | ||
48 | 2008-04-30 | 0,0265 | 518521 | ||
48 | 2008-05-31 | -0,0048 | 549341 | ||
48 | 2008-06-30 | -0,0752 | 510623 | ||
48 | 2008-07-31 | -0,0074 | 518986 | ||
48 | 2008-08-31 | 0,0235 | 539814 | ||
48 | 2008-09-30 | -0,1342 | 467712 | ||
48 | 2008-10-31 | -0,1149 | 415673 | ||
48 | 2008-11-30 | -0,0513 | 394714 | ||
48 | 2008-12-31 | 0,0226 | 405954 | ||
48 | 2009-01-31 | -0,0776 | 374806 | ||
48 | 2009-02-28 | -0,0687 | 349412 | ||
48 | 2009-03-31 | 0,0684 | 388964 | ||
48 | 2009-04-30 | 0,0931 | 448658 | ||
48 | 2009-05-31 | 0,0087 | 452920 | ||
48 | 2009-06-30 | 0,0414 | 508839 | ||
48 | 2009-07-31 | 0,0117 | 537095 | ||
48 | 2009-08-31 | 0,062 | 592043 | ||
48 | 2009-09-30 | 0,0211 | 601592 | ||
79 | 2007-12-31 | 0,0629 | |||
79 | 2008-01-31 | -0,1182 | 0,0267 | 1,1783 | |
79 | 2008-02-29 | 0,001 | 0,0267 | 1,1783 | |
79 | 2008-03-31 | 0,0537 | 0,0267 | 1,1783 | |
79 | 2008-04-30 | -0,0222 | 0,0267 | 1,1783 | |
79 | 2008-05-31 | 0,0394 | 0,0267 | 1,1783 | |
79 | 2008-06-30 | -0,0382 | 0,0267 | 1,1783 | |
79 | 2008-07-31 | -0,0074 | 0,0267 | 1,1783 | |
79 | 2008-08-31 | -0,033 | 0,0267 | 1,1783 | |
79 | 2008-09-30 | -0,2411 | 0,0267 | 1,1783 | |
79 | 2008-10-31 | -0,2527 | 0,0267 | 1,1783 | |
79 | 2008-11-30 | -0,0515 | 0,0267 | 1,1783 | |
79 | 2008-12-31 | 0,0372 | 0,0267 | 1,1783 | |
79 | 2009-01-31 | -0,0069 | 0,0268 | 0,3287 | |
79 | 2009-02-28 | -0,0062 | 0,0268 | 0,3287 | |
79 | 2009-03-31 | 0,0823 | 0,0268 | 0,3287 | |
79 | 2009-04-30 | 0,1096 | 0,0268 | 0,3287 | |
79 | 2009-05-31 | 0,0368 | 0,0268 | 0,3287 | |
79 | 2009-06-30 | 0,054 | 0,0268 | 0,3287 | |
79 | 2009-07-31 | -0,0076 | 0,0268 | 0,3287 | |
79 | 2009-08-31 | -0,0185 | 0,0268 | 0,3287 | |
79 | 2009-09-30 | 0,0167 | 0,0268 | 0,3287 | |
79 | 2009-10-31 | -0,0413 | 0,0268 | 0,3287 | |
79 | 2009-11-30 | 0,0191 | 0,0268 | 0,3287 | |
79 | 2009-12-31 | 0,0357 | 0,0268 | 0,3287 | |
79 | 2010-01-31 | -0,0629 | 0,0272 | 0,0015 | |
79 | 2010-02-28 | -0,0388 | 0,0272 | 0,0015 | |
79 | 2010-03-31 | -0,0063 | 0,0272 | 0,0015 | |
79 | 2010-04-30 | 0,0045 | 0,0272 | 0,0015 | |
79 | 2010-05-31 | -0,0615 | 0,0272 | 0,0015 | |
79 | 2010-06-30 | -0,0449 | 0,0272 | 0,0015 | |
79 | 2010-07-31 | 0,0562 | 0,0272 | 0,0015 | |
79 | 2010-08-31 | -0,0428 | 0,0272 | 0,0015 | |
79 | 2010-09-30 | 0,0115 | 0,0272 | 0,0015 | |
79 | 2010-10-31 | 0,0469 | 0,0272 | 0,0015 | |
79 | 2010-11-30 | -0,0129 | 0,0272 | 0,0015 | |
79 | 2010-12-31 | 0,0254 | 0,0272 | 0,0015 | |
79 | 2011-01-31 | 0,0253 | 0,0274 | 0,0939 | |
79 | 2011-02-28 | -0,0082 | 0,0274 | 0,0939 | |
79 | 2011-03-31 | 0,0008 | 0,0274 | 0,0939 | |
79 | 2011-04-30 | -0,0164 | 0,0274 | 0,0939 | |
79 | 2011-05-31 | -0,0183 | 0,0274 | 0,0939 | |
79 | 2011-06-30 | -0,0398 | 0,0274 | 0,0939 | |
79 | 2011-07-31 | -0,0091 | 0,0274 | 0,0939 | |
79 | 2011-08-31 | -0,0398 | 0,0274 | 0,0939 | |
79 | 2011-09-30 | -0,0609 | 0,0274 | 0,0939 | |
79 | 2011-10-31 | 0,0244 | 0,0274 | 0,0939 | |
79 | 2011-11-30 | 0,0078 | 0,0274 | 0,0939 | |
79 | 2011-12-31 | -0,0119 | 0,0274 | 0,0939 | 2731000 |
79 | 2012-01-31 | 0,0048 | 2742000 | ||
80 | 2007-12-31 | 0,0607 | |||
80 | 2008-01-31 | -0,114 | 0,0266 | 0,9292 | |
80 | 2008-02-29 | 0,0022 | 0,0266 | 0,9292 | |
80 | 2008-03-31 | 0,055 | 0,0266 | 0,9292 | |
80 | 2008-04-30 | -0,0224 | 0,0266 | 0,9292 | |
80 | 2008-05-31 | 0,0436 | 0,0266 | 0,9292 | |
80 | 2008-06-30 | -0,0368 | 0,0266 | 0,9292 | |
80 | 2008-07-31 | -0,0094 | 0,0266 | 0,9292 | |
80 | 2008-08-31 | -0,033 | 0,0266 | 0,9292 | |
80 | 2008-09-30 | -0,24 | 0,0266 | 0,9292 |
Thank you @sasphd Can you also post your expected output that can help to verify the test results plz
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.