Solved: repeating observations by id

sasphd · Posted 08-27-2018 11:42 AM

Hello

from the table HAVE, I want table WANT.

so I want to repeat the first observation available for an id to fill in the .

data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;

data WANT;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 0.15
2/1/2013 1 0.15
3/1/2013 1 0.15
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 0.09
8/1/2013 2 0.09
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;

novinosrin · Posted 08-27-2018 12:07 PM

data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;

data _null_;
dcl hash H (ordered:'y') ;
   h.definekey  ("_n_") ;
   h.definedata ('month_begin_dt','id','x') ;
   h.definedone () ;
do _n_=nobs to 1 by -1;
set have point=_n_ nobs=nobs;
if not missing(x) then _x=x;
else x=_x;
h.add();
end;
h.output(dataset:'want');
stop;
run;

View solution in original post

novinosrin · Posted 08-27-2018 12:07 PM

data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;

data _null_;
dcl hash H (ordered:'y') ;
   h.definekey  ("_n_") ;
   h.definedata ('month_begin_dt','id','x') ;
   h.definedone () ;
do _n_=nobs to 1 by -1;
set have point=_n_ nobs=nobs;
if not missing(x) then _x=x;
else x=_x;
h.add();
end;
h.output(dataset:'want');
stop;
run;

HB · Posted 08-27-2018 01:30 PM

This DOES NOT work:


proc sort data=have;
	by descending month_begin_dt id;
run;

data want;
	set have;
	holdid = lag(id);
	holdx = lag(x);
	if x = . and id = holdid then x = holdx;
run;

in that it only fills in one missing observation.

I don't understand @novinosrin 's solution cause I'm dumb and was trying for something cheap and easy. Is there a way to get something like that working? Currently for the second observation with a missing x, it takes the original value of the x from the previous observation instead of the newly changed x value. You can run it again on itself (data want2; set want etc) and it will fill in one more row but that is stupid. Lol.

novinosrin · Posted 08-27-2018 01:34 PM

@HB long time no see mate!, Hope you are well. No-one is dumb. Just interest and participation is what matters. If I could (aka a forklift driver turned BI person can do with no math or any related academic background with spoiled childhood to even spoiled adulthood ) can learn do, anybody can. My parents stopped buying me video games, so i took to this. lol

This should likely be your approach:

proc sort data=have out=_have;
	by descending month_begin_dt ;
run;

data temp;
	set _have;
	retain _x;
	if not missing(x) then _x=x;
	else x=_x;
	drop _x;
/*	holdid = lag(id);*/
/*	holdx = lag(x);*/
/*	if x = . and id = holdid then x = holdx;*/
run;

proc sort data=temp out=want;
by month_begin_dt;
run;

Haikuo · Posted 08-27-2018 01:41 PM

Sort first by id ,descending date, then do a retain, then sort it back to original order should work. The following data step is to do a literal fetch from first non-missing down the line, within the same id. The SQL coding is straightfoward, however, it may not be efficient for large files.

data HAVE;
 input month_begin_dt :mmddyy10. id x :best32.;
 format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;

data want;
set have;
  if missing(x) then do;
    do i=_n_ by 1 until (not missing (_x) or _id ne id);
	  set have (keep=x id rename=(x=_x id=_id)) point=i;
	end;
	end;
	x=coalesce(x,_x);
	drop _:;
run;


proc sql;
create table want_sql as
select month_begin_dt ,id ,coalesce(x,(select x from have where id=a.id and not missing(x) group by id having min(month_begin_dt)=month_begin_dt)) as x
from have a
;
quit;

Haikuo · Posted 08-27-2018 01:50 PM

I like this approach, however, there are 2 things I will keep in mind when using it:

1. If all of the 'x' are missing for the same 'id', 'x' from other 'id' will be brought in, and I am not sure if that is the desired outcome for OP. This can be easily fixed by putting a 'id' check in place.

2. If the table are too big to fit in Ram.

novinosrin · Posted 08-27-2018 01:52 PM

@Haikuo yes sir, I had the same assumption indeed. I was so lazy and waiting for OP's feedback should the change be needed. Nice catch though!

sasphd · Posted 04-17-2019 03:44 PM

Hello

from the table HAVE, I want table WANT.

so I want to repeat the first observation available for an id to fill in the .

data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;

data WANT;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 0.15
2/1/2013 1 0.15
3/1/2013 1 0.15
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 0.09
8/1/2013 2 0.09
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;

I have asked this question. someone give me this solution however

data _null_;
dcl hash H (ordered:'y') ;
   h.definekey  ("_n_") ;
   h.definedata ('month_begin_dt','id','x') ;
   h.definedone () ;
do _n_=nobs to 1 by -1;
set have point=_n_ nobs=nobs;
if not missing(x) then _x=x;
else x=_x;
h.add();
end;
h.output(dataset:'want');
stop;
run;

If all of the 'x' are missing for the same 'id', 'x' from other 'id' will be brought in, and I am not sure if that is the desired outcome .

novinosrin · Posted 04-17-2019 03:50 PM

Hi @sasphd

data HAVE;
input month_begin_dt :mmddyy10. id x :best32.;
format month_begin_dt date9.;
datalines;
1/1/2013 1 .
2/1/2013 1 .
3/1/2013 1 .
4/1/2013 1 0.15
5/1/2013 1 0.90
6/1/2013 1 0.34
7/1/2013 2 .
8/1/2013 2 .
9/1/2013 2 0.09
10/1/2013 2 0.28
11/1/2013 2 0.31
12/1/2013 2 0.79
;
run;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash h(dataset:'have(where=(x ne .))');
h.definekey('id');
h.definedata('x');
h.definedone();
end;
do until(last.id);
update have(obs=0) have;
by id;
if missing(x) then rc=h.find();
output;
end;
drop rc ;
run;

Little fancy solution, test thoroughly and let us know if that doesn't work. We can jump to simple solution

sasphd · Posted 04-17-2019 04:17 PM

if i want to set more than variable x and y and z

how can I change your program

novinosrin · Posted 04-17-2019 04:48 PM

Hi @sasphd

Very good question. Often the case whenever wide would warrant a transpose or a long form of sorts for a look up.


data want;
if _n_=1 then do;
if 0 then set have;
dcl hash h();
h.definekey('id','var');
h.definedata('id','var','val');
h.definedone();
do until(lr);
set have end=lr;
array t(*) x y z;
do i=1 to dim(t);
if not missing(t(i)) then do;
var=vname(t(i));
val=t(i);
rc=h.add();
end;
end;
end;
end;
do until(last.id);
update have(obs=0) have;
by id;
do i=1 to dim(t);
if missing(t(i)) then do;
var=vname(t(i));
rc=h.find();
t(i)=val;
end;
end;
output;
end;
keep id month_begin_dt x y z;
run;

I was lazy to make a sample, so didn't test it but the logic is pretty straight forward considering that requires an extra pass to load the hash with non missing groups in long format and look up using array indexes. If you could extend your time by modifying a full sample by adding x, y, z vars, I am more than happy and willing to test.

Best regards!

sasphd · Posted 04-18-2019 09:59 AM

this does not work so I do it variable by variable with your first program

novinosrin · Posted 04-18-2019 10:00 AM

Can you please post a sample for me to test plz

sasphd · Posted 04-18-2019 10:02 AM

Fund_NO	caldt	mret	MER	Turnover	NAS
48	2007-12-31	-0,0308			222000
48	2008-01-31	-0,0467			331734
48	2008-02-29	-0,0445			353851
48	2008-03-31	0,0371			459456
48	2008-04-30	0,0265			518521
48	2008-05-31	-0,0048			549341
48	2008-06-30	-0,0752			510623
48	2008-07-31	-0,0074			518986
48	2008-08-31	0,0235			539814
48	2008-09-30	-0,1342			467712
48	2008-10-31	-0,1149			415673
48	2008-11-30	-0,0513			394714
48	2008-12-31	0,0226			405954
48	2009-01-31	-0,0776			374806
48	2009-02-28	-0,0687			349412
48	2009-03-31	0,0684			388964
48	2009-04-30	0,0931			448658
48	2009-05-31	0,0087			452920
48	2009-06-30	0,0414			508839
48	2009-07-31	0,0117			537095
48	2009-08-31	0,062			592043
48	2009-09-30	0,0211			601592
79	2007-12-31	0,0629
79	2008-01-31	-0,1182	0,0267	1,1783
79	2008-02-29	0,001	0,0267	1,1783
79	2008-03-31	0,0537	0,0267	1,1783
79	2008-04-30	-0,0222	0,0267	1,1783
79	2008-05-31	0,0394	0,0267	1,1783
79	2008-06-30	-0,0382	0,0267	1,1783
79	2008-07-31	-0,0074	0,0267	1,1783
79	2008-08-31	-0,033	0,0267	1,1783
79	2008-09-30	-0,2411	0,0267	1,1783
79	2008-10-31	-0,2527	0,0267	1,1783
79	2008-11-30	-0,0515	0,0267	1,1783
79	2008-12-31	0,0372	0,0267	1,1783
79	2009-01-31	-0,0069	0,0268	0,3287
79	2009-02-28	-0,0062	0,0268	0,3287
79	2009-03-31	0,0823	0,0268	0,3287
79	2009-04-30	0,1096	0,0268	0,3287
79	2009-05-31	0,0368	0,0268	0,3287
79	2009-06-30	0,054	0,0268	0,3287
79	2009-07-31	-0,0076	0,0268	0,3287
79	2009-08-31	-0,0185	0,0268	0,3287
79	2009-09-30	0,0167	0,0268	0,3287
79	2009-10-31	-0,0413	0,0268	0,3287
79	2009-11-30	0,0191	0,0268	0,3287
79	2009-12-31	0,0357	0,0268	0,3287
79	2010-01-31	-0,0629	0,0272	0,0015
79	2010-02-28	-0,0388	0,0272	0,0015
79	2010-03-31	-0,0063	0,0272	0,0015
79	2010-04-30	0,0045	0,0272	0,0015
79	2010-05-31	-0,0615	0,0272	0,0015
79	2010-06-30	-0,0449	0,0272	0,0015
79	2010-07-31	0,0562	0,0272	0,0015
79	2010-08-31	-0,0428	0,0272	0,0015
79	2010-09-30	0,0115	0,0272	0,0015
79	2010-10-31	0,0469	0,0272	0,0015
79	2010-11-30	-0,0129	0,0272	0,0015
79	2010-12-31	0,0254	0,0272	0,0015
79	2011-01-31	0,0253	0,0274	0,0939
79	2011-02-28	-0,0082	0,0274	0,0939
79	2011-03-31	0,0008	0,0274	0,0939
79	2011-04-30	-0,0164	0,0274	0,0939
79	2011-05-31	-0,0183	0,0274	0,0939
79	2011-06-30	-0,0398	0,0274	0,0939
79	2011-07-31	-0,0091	0,0274	0,0939
79	2011-08-31	-0,0398	0,0274	0,0939
79	2011-09-30	-0,0609	0,0274	0,0939
79	2011-10-31	0,0244	0,0274	0,0939
79	2011-11-30	0,0078	0,0274	0,0939
79	2011-12-31	-0,0119	0,0274	0,0939	2731000
79	2012-01-31	0,0048			2742000
80	2007-12-31	0,0607
80	2008-01-31	-0,114	0,0266	0,9292
80	2008-02-29	0,0022	0,0266	0,9292
80	2008-03-31	0,055	0,0266	0,9292
80	2008-04-30	-0,0224	0,0266	0,9292
80	2008-05-31	0,0436	0,0266	0,9292
80	2008-06-30	-0,0368	0,0266	0,9292
80	2008-07-31	-0,0094	0,0266	0,9292
80	2008-08-31	-0,033	0,0266	0,9292
80	2008-09-30	-0,24	0,0266	0,9292

novinosrin · Posted 04-18-2019 10:05 AM

Thank you @sasphd Can you also post your expected output that can help to verify the test results plz

repeat observation by id

Re: repeat observation by id

Re: repeat observation by id

Re: repeat observation by id

Re: repeat observation by id

Re: repeat observation by id

Re: repeat observation by id

Re: repeat observation by id

repeating observations by id

Re: repeating observations by id

Re: repeating observations by id

Re: repeating observations by id

Re: repeating observations by id

Re: repeating observations by id

Re: repeating observations by id

Re: repeating observations by id

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away