Solved: How to add missing data

tintin · Posted 01-30-2012 02:08 PM

Hi everyone, I have the following data

Year	Month	Day	Hour	Var
2011	1	1	1	A
2011	1	1	1	B
2011	1	1	1	C
2011	1	1	1	D
2011	1	1	1	E
2011	1	1	2	A
2011	1	1	2	B
2011	1	1	2	D
2011	1	1	3	C
2011	1	1	3	D
2011	1	1	3	E
2011	1	1	4	A
2011	1	1	4	B
2011	1	1	4	D
2011	1	1	4	E

where Var, if complete, should have data A, B, C, D, E at each hour. However at present, only Hour 1 has the full set of data. Hour 2 has only A, B, D. Hour 3 has only C, D, E. Etc. I would like to add all the missing data in Var at each Hour for the whole Year 2011 (amd other Years too). Is there a simple step for this process? Thanks so much for your help!

Astounding · Posted 01-30-2012 04:51 PM

Dan,

Here's a generic approach.

proc sql;

create table list1 as select distinct year, month, day, hour from original;

create table list2 as select distinct var from original;

create table complete_shell as select * from list1, list2

order by year, month, day, hour, var;

quit;

The advantage: You don't need any prior knowledge about values of VAR.

The disadvantage: It only takes values for VAR that actually appear in the data. So if you want some VAR=B records, but there are no VAR=B records in the data, you won't get them.

If you want some other variables that are part of the original data, they would be easy to merge back in at this point.

Good luck.

View solution in original post

Haikuo · Posted 01-30-2012 02:37 PM

Hi,

It seems to me you would need a 2X DOW loop:

data have;

infile cards;

input Year Month Day Hour Var $;

cards;

2011 1 1 1 A

2011 1 1 1 B

2011 1 1 1 C

2011 1 1 1 D

2011 1 1 1 E

2011 1 1 2 A

2011 1 1 2 B

2011 1 1 2 D

2011 1 1 3 C

2011 1 1 3 D

2011 1 1 3 E

2011 1 1 4 A

2011 1 1 4 B

2011 1 1 4 D

2011 1 1 4 E

;

data want (drop=_:);

length _t $40;

retain _t;

do until (last.hour);

set have;

by year month day hour;

if first.hour then call missing (_t);

_t=cats(_t,var);

end;

do until (last.hour);

set have;

length _t $40;

by year month day hour;

output;

if last.hour then do _n_=1 to lengthn(compress('ABCDE',_t));

var=substr(compress('ABCDE',_t),_n_,1);

output;

end;

run;

proc print;run;

Kindly Regards,

Haikuo

tintin · Posted 01-30-2012 04:07 PM

Thanks so much for your help Haikuo! I will test it out later today!

-Dan

art297 · Posted 01-30-2012 02:56 PM

Do you need to fill in missing years, months, days and hours as well and, in addition to that, do you also have fields that contain other data?

tintin · Posted 01-30-2012 04:06 PM

Hi Art297,

Another process have filled in the missing days and hours already so no need to correct these.

And there are fields that contain other data but they are not used in sorting or organizing the data set so I just ignore those.

-Dan

data_null__ · Posted 01-30-2012 03:19 PM

data class;
   if 0 then set have(keep=var);
   input var @@;
   cards;
A B C D E F
;;;;
   run;
proc summary data=have nway classdata=class;
   by year--hour;
   class var;
   output out=expand;
   run;
proc print;
   run;

tintin · Posted 01-30-2012 04:09 PM

Thanks for your help data! Your answer is the shortest and most elegant. I will test it first!

-Dan

data_null__ · Posted 01-30-2012 04:12 PM

PROC SUMMARY is very powerful.:smileyinfo:

sam369 · Posted 01-30-2012 04:30 PM

From My understanding,

I used hai.kuo HAVE dataset,

data dummy ;

do var='A','B','C','D','E';

do hours=1 to 4;

do year=2011;

output;

end;

run;

proc sql;

create table temp as

select a.*,b.month,b.day,b.hour

from dummy a left join have b

on a.year=b.year and a.hours=b.hour and a.var=b.var

order by a.hours,a.var;

quit;

data want;

set temp;

array v(3) month hour day;

do i=1 to dim(v);

if v(i)=. then v(i)=0;

end;

drop i;

run;

proc print;

run;

might be some one will give elegant answer

Thanks

Sam

modsoul1 · Posted 05-17-2017 01:21 AM

you'd have to sort the data first though before proc summary. Is there a way around that?

Astounding · Posted 01-30-2012 04:51 PM

Dan,

Here's a generic approach.

proc sql;

create table list1 as select distinct year, month, day, hour from original;

create table list2 as select distinct var from original;

create table complete_shell as select * from list1, list2

order by year, month, day, hour, var;

quit;

The advantage: You don't need any prior knowledge about values of VAR.

The disadvantage: It only takes values for VAR that actually appear in the data. So if you want some VAR=B records, but there are no VAR=B records in the data, you won't get them.

If you want some other variables that are part of the original data, they would be easy to merge back in at this point.

Good luck.

tintin · Posted 01-30-2012 05:33 PM

I want to express profusely my thanks to everyone who replied with helpful solutions. Orignially I had wanted to test all the methods but the number of solutions out-grew the amount of time I have to test each one. Finally then I tried the method from Astounding because it looks short and simple, and it works! Thanks Astounding! And thanks again everyone!

How to add missing data

Re: How to add missing data

Re: How to add missing data

Re: How to add missing data

Re: How to add missing data

Re: How to add missing data

Re: How to add missing data

Re: How to add missing data

Re: How to add missing data

Re: How to add missing data

Re: How to add missing data

Re: How to add missing data

Re: How to add missing data

Registration is open

SAS Training: Just a Click Away