BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasphd
Lapis Lazuli | Level 10

 

 

 

Hello

from the table HAVE, I want table WANT. 

so I want to repeat the first observation available for an id to fill in the .

 

 

 

data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;


data WANT;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 0.15
2/1/2013 1 0.15
3/1/2013 1 0.15
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 0.09
8/1/2013 2 0.09
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;

data _null_;
dcl hash H (ordered:'y') ;
   h.definekey  ("_n_") ;
   h.definedata ('month_begin_dt','id','x') ;
   h.definedone () ;
do _n_=nobs to 1 by -1;
set have point=_n_ nobs=nobs;
if not missing(x) then _x=x;
else x=_x;
h.add();
end;
h.output(dataset:'want');
stop;
run;

View solution in original post

16 REPLIES 16
novinosrin
Tourmaline | Level 20
data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;

data _null_;
dcl hash H (ordered:'y') ;
   h.definekey  ("_n_") ;
   h.definedata ('month_begin_dt','id','x') ;
   h.definedone () ;
do _n_=nobs to 1 by -1;
set have point=_n_ nobs=nobs;
if not missing(x) then _x=x;
else x=_x;
h.add();
end;
h.output(dataset:'want');
stop;
run;
HB
Barite | Level 11 HB
Barite | Level 11

This DOES NOT work:

 


proc sort data=have;
	by descending month_begin_dt id;
run;

data want;
	set have;
	holdid = lag(id);
	holdx = lag(x);
	if x = . and id = holdid then x = holdx;
run;

in that it only fills in one missing observation.

 

I don't understand @novinosrin 's solution cause I'm dumb and was trying for something cheap and easy. Is there a way to get something like that working?  Currently for the second observation with a missing x, it takes the original value of the x from the previous observation instead of the newly changed  x value. You can run it again on itself (data want2; set want etc) and it will fill in one more row but that is stupid.  Lol.    

novinosrin
Tourmaline | Level 20

@HB long time no see mate!, Hope you are well. No-one is dumb. Just interest and participation is what matters. If I could (aka a forklift driver turned BI person can do with no math  or any related academic background with spoiled childhood to even spoiled adulthood ) can learn do, anybody can. My parents stopped buying me video games, so i took to this. lol 

 

This should likely be your approach:

 

proc sort data=have out=_have;
	by descending month_begin_dt ;
run;

data temp;
	set _have;
	retain _x;
	if not missing(x) then _x=x;
	else x=_x;
	drop _x;
/*	holdid = lag(id);*/
/*	holdx = lag(x);*/
/*	if x = . and id = holdid then x = holdx;*/
run;

proc sort data=temp out=want;
by month_begin_dt;
run;
Haikuo
Onyx | Level 15

Sort first by id ,descending date, then do a retain, then sort it back to original order should work. The following data step is to do a literal fetch from first non-missing down the line, within the same id. The SQL coding is straightfoward, however, it may not be efficient for large files.

 

data HAVE;
 input month_begin_dt :mmddyy10. id x :best32.;
 format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;

data want;
set have;
  if missing(x) then do;
    do i=_n_ by 1 until (not missing (_x) or _id ne id);
	  set have (keep=x id rename=(x=_x id=_id)) point=i;
	end;
	end;
	x=coalesce(x,_x);
	drop _:;
run;


proc sql;
create table want_sql as
select month_begin_dt ,id ,coalesce(x,(select x from have where id=a.id and not missing(x) group by id having min(month_begin_dt)=month_begin_dt)) as x
from have a
;
quit;
Haikuo
Onyx | Level 15

I like this approach, however, there are 2 things I will keep in mind when using it:

1. If all of the 'x' are missing for the same 'id', 'x' from other 'id' will be brought in, and I am not sure if that is the desired outcome for OP. This can be easily fixed by putting a 'id' check in place.

2. If the table are too big to fit in Ram.

 

 

novinosrin
Tourmaline | Level 20

@Haikuo yes sir, I had the same assumption indeed. I was so lazy and waiting for OP's feedback should the change be needed. Nice catch though!

sasphd
Lapis Lazuli | Level 10

Hello

from the table HAVE, I want table WANT. 

so I want to repeat the first observation available for an id to fill in the .

 

 

 

data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;


data WANT;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 0.15
2/1/2013 1 0.15
3/1/2013 1 0.15
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 0.09
8/1/2013 2 0.09
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;

 

 

I have asked this question. someone give me this solution however

data _null_;
dcl hash H (ordered:'y') ;
   h.definekey  ("_n_") ;
   h.definedata ('month_begin_dt','id','x') ;
   h.definedone () ;
do _n_=nobs to 1 by -1;
set have point=_n_ nobs=nobs;
if not missing(x) then _x=x;
else x=_x;
h.add();
end;
h.output(dataset:'want');
stop;
run;



 

If all of the 'x' are missing for the same 'id', 'x' from other 'id' will be brought in, and I am not sure if that is the desired outcome . 

novinosrin
Tourmaline | Level 20

Hi @sasphd 

 

data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash h(dataset:'have(where=(x ne .))');
h.definekey('id');
h.definedata('x');
h.definedone();
end;
do until(last.id);
update have(obs=0) have;
by id;
if missing(x) then rc=h.find();
output;
end;
drop rc ;
run;

Little fancy solution, test thoroughly and let us know if that doesn't work. We can jump to simple solution

sasphd
Lapis Lazuli | Level 10

if i want to set more than variable x and y and z 

how can I change your program

novinosrin
Tourmaline | Level 20

Hi @sasphd 

 

Very good question. Often the case whenever wide would warrant a transpose or a long form of sorts for a look up.

 


data want;
if _n_=1 then do;
if 0 then set have;
dcl hash h();
h.definekey('id','var');
h.definedata('id','var','val');
h.definedone();
do until(lr);
set have end=lr;
array t(*) x y z;
do i=1 to dim(t);
if not missing(t(i)) then do;
var=vname(t(i));
val=t(i);
rc=h.add();
end;
end;
end;
end;
do until(last.id);
update have(obs=0) have;
by id;
do i=1 to dim(t);
if missing(t(i)) then do;
var=vname(t(i));
rc=h.find();
t(i)=val;
end;
end;
output;
end;
keep id month_begin_dt x y z;
run;

I was lazy to make a sample, so didn't test it but the logic is pretty straight forward considering that requires an extra pass to load the hash with non missing groups in long format and look up using array indexes. If you could extend your time by modifying a full sample by adding x, y, z vars, I am more than happy and willing to test.

 

Best regards!

 

 

 

 

sasphd
Lapis Lazuli | Level 10

this does not work so I do it variable by variable with your first program

novinosrin
Tourmaline | Level 20

Can you please post a sample for me to test plz

sasphd
Lapis Lazuli | Level 10
Fund_NO caldt mret MER Turnover NAS
48 2007-12-31 -0,0308     222000
48 2008-01-31 -0,0467     331734
48 2008-02-29 -0,0445     353851
48 2008-03-31 0,0371     459456
48 2008-04-30 0,0265     518521
48 2008-05-31 -0,0048     549341
48 2008-06-30 -0,0752     510623
48 2008-07-31 -0,0074     518986
48 2008-08-31 0,0235     539814
48 2008-09-30 -0,1342     467712
48 2008-10-31 -0,1149     415673
48 2008-11-30 -0,0513     394714
48 2008-12-31 0,0226     405954
48 2009-01-31 -0,0776     374806
48 2009-02-28 -0,0687     349412
48 2009-03-31 0,0684     388964
48 2009-04-30 0,0931     448658
48 2009-05-31 0,0087     452920
48 2009-06-30 0,0414     508839
48 2009-07-31 0,0117     537095
48 2009-08-31 0,062     592043
48 2009-09-30 0,0211     601592
79 2007-12-31 0,0629      
79 2008-01-31 -0,1182 0,0267 1,1783  
79 2008-02-29 0,001 0,0267 1,1783  
79 2008-03-31 0,0537 0,0267 1,1783  
79 2008-04-30 -0,0222 0,0267 1,1783  
79 2008-05-31 0,0394 0,0267 1,1783  
79 2008-06-30 -0,0382 0,0267 1,1783  
79 2008-07-31 -0,0074 0,0267 1,1783  
79 2008-08-31 -0,033 0,0267 1,1783  
79 2008-09-30 -0,2411 0,0267 1,1783  
79 2008-10-31 -0,2527 0,0267 1,1783  
79 2008-11-30 -0,0515 0,0267 1,1783  
79 2008-12-31 0,0372 0,0267 1,1783  
79 2009-01-31 -0,0069 0,0268 0,3287  
79 2009-02-28 -0,0062 0,0268 0,3287  
79 2009-03-31 0,0823 0,0268 0,3287  
79 2009-04-30 0,1096 0,0268 0,3287  
79 2009-05-31 0,0368 0,0268 0,3287  
79 2009-06-30 0,054 0,0268 0,3287  
79 2009-07-31 -0,0076 0,0268 0,3287  
79 2009-08-31 -0,0185 0,0268 0,3287  
79 2009-09-30 0,0167 0,0268 0,3287  
79 2009-10-31 -0,0413 0,0268 0,3287  
79 2009-11-30 0,0191 0,0268 0,3287  
79 2009-12-31 0,0357 0,0268 0,3287  
79 2010-01-31 -0,0629 0,0272 0,0015  
79 2010-02-28 -0,0388 0,0272 0,0015  
79 2010-03-31 -0,0063 0,0272 0,0015  
79 2010-04-30 0,0045 0,0272 0,0015  
79 2010-05-31 -0,0615 0,0272 0,0015  
79 2010-06-30 -0,0449 0,0272 0,0015  
79 2010-07-31 0,0562 0,0272 0,0015  
79 2010-08-31 -0,0428 0,0272 0,0015  
79 2010-09-30 0,0115 0,0272 0,0015  
79 2010-10-31 0,0469 0,0272 0,0015  
79 2010-11-30 -0,0129 0,0272 0,0015  
79 2010-12-31 0,0254 0,0272 0,0015  
79 2011-01-31 0,0253 0,0274 0,0939  
79 2011-02-28 -0,0082 0,0274 0,0939  
79 2011-03-31 0,0008 0,0274 0,0939  
79 2011-04-30 -0,0164 0,0274 0,0939  
79 2011-05-31 -0,0183 0,0274 0,0939  
79 2011-06-30 -0,0398 0,0274 0,0939  
79 2011-07-31 -0,0091 0,0274 0,0939  
79 2011-08-31 -0,0398 0,0274 0,0939  
79 2011-09-30 -0,0609 0,0274 0,0939  
79 2011-10-31 0,0244 0,0274 0,0939  
79 2011-11-30 0,0078 0,0274 0,0939  
79 2011-12-31 -0,0119 0,0274 0,0939 2731000
79 2012-01-31 0,0048     2742000
80 2007-12-31 0,0607      
80 2008-01-31 -0,114 0,0266 0,9292  
80 2008-02-29 0,0022 0,0266 0,9292  
80 2008-03-31 0,055 0,0266 0,9292  
80 2008-04-30 -0,0224 0,0266 0,9292  
80 2008-05-31 0,0436 0,0266 0,9292  
80 2008-06-30 -0,0368 0,0266 0,9292  
80 2008-07-31 -0,0094 0,0266 0,9292  
80 2008-08-31 -0,033 0,0266 0,9292  
80 2008-09-30 -0,24 0,0266 0,9292  
novinosrin
Tourmaline | Level 20

Thank you @sasphd   Can you also post your expected output that can help to verify the test results plz

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 2975 views
  • 3 likes
  • 5 in conversation