Help using Base SAS procedures

How to add missing data

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

How to add missing data

Hi everyone, I have the following data

YearMonthDayHourVar
2011111A
2011111B
2011111C
2011111D
2011111E
2011112A
2011112B
2011112D
2011113C
2011113D
2011113E
2011114A
2011114B
2011114D
2011114E

where Var, if complete, should have data A, B, C, D, E at each hour.  However at present, only Hour 1 has the full set of data.  Hour 2 has only A, B, D.  Hour 3 has only C, D, E.  Etc.  I would like to add all the missing data in Var at each Hour for the whole Year 2011 (amd other Years too).  Is there a simple step for this process?  Thanks so much for your help!


Accepted Solutions
Solution
‎01-30-2012 04:51 PM
Super User
Posts: 5,081

Re: How to add missing data

Dan,

Here's a generic approach.

proc sql;

   create table list1 as select distinct year, month, day, hour from original;

   create table list2 as select distinct var from original;

   create table complete_shell as select * from list1, list2

      order by year, month, day, hour, var;

quit;

The advantage:  You don't need any prior knowledge about values of VAR.

The disadvantage:  It only takes values for VAR that actually appear in the data.  So if you want some VAR=B records, but there are no VAR=B records in the data, you won't get them.

If you want some other variables that are part of the original data, they would be easy to merge back in at this point.

Good luck.

View solution in original post


All Replies
Respected Advisor
Posts: 3,124

Re: How to add missing data

Hi,

It seems to me you would need a 2X DOW loop:

data have;

infile cards;

input Year     Month     Day     Hour     Var $;

cards;

2011     1     1     1     A

2011     1     1     1     B

2011     1     1     1     C

2011     1     1     1     D

2011     1     1     1     E

2011     1     1     2     A

2011     1     1     2     B

2011     1     1     2     D

2011     1     1     3     C

2011     1     1     3     D

2011     1     1     3     E

2011     1     1     4     A

2011     1     1     4     B

2011     1     1     4     D

2011     1     1     4     E

;

data want (drop=_Smiley Happy;

length _t $40;

retain _t;

do until (last.hour);

set have;

by year month day hour;

if first.hour then call missing (_t);

_t=cats(_t,var);

end;

do until (last.hour);

set have;

length _t $40;

by year month day hour;

output;

if last.hour then do _n_=1 to lengthn(compress('ABCDE',_t));

  var=substr(compress('ABCDE',_t),_n_,1);

  output;

end;

end;

run;

proc print;run;

Kindly Regards,

Haikuo

Occasional Contributor
Posts: 5

Re: How to add missing data

Thanks so much for your help Haikuo!  I will test it out later today!

-Dan

PROC Star
Posts: 7,363

Re: How to add missing data

Do you need to fill in missing years, months, days and hours as well and, in addition to that, do you also have fields that contain other data?

Occasional Contributor
Posts: 5

Re: How to add missing data

Hi Art297, 

Another process have filled in the missing days and hours already so no need to correct these. 

And there are fields that contain other data but they are not used in sorting or organizing the data set so I just ignore those.

-Dan

Respected Advisor
Posts: 3,777

Re: How to add missing data

data class;

   if 0 then set have(keep=var);

   input var @@;

   cards;

A B C D E F

;;;;

   run;

proc summary data=have nway classdata=class;

   by year--hour;

   class var;

   output out=expand;

   run;

proc print;

   run;

Occasional Contributor
Posts: 5

Re: How to add missing data

Thanks for your help data!   Your answer is the shortest and most elegant.  I will test it first! 

-Dan

Respected Advisor
Posts: 3,777

Re: How to add missing data

PROC SUMMARY is very powerful.:smileyinfo:

Regular Contributor
Posts: 168

Re: How to add missing data


From My understanding,

I used hai.kuo HAVE dataset,

data dummy ;

do var='A','B','C','D','E';

   do hours=1 to 4;

      do year=2011;

           output;

         end;

      end;

  end;

run;

proc sql;

create table temp as

select a.*,b.month,b.day,b.hour

from dummy a left join have b

on a.year=b.year and a.hours=b.hour and a.var=b.var

order by a.hours,a.var;

quit;

data want;

set temp;

array v(3) month hour day;

do i=1 to dim(v);

  if v(i)=. then v(i)=0;

end;

drop i;

run;

proc print;

run;

might be some one will give elegant answer

Thanks

Sam

New Contributor
Posts: 2

Re: How to add missing data

you'd have to sort the data first though before proc summary. Is there a way around that?
Solution
‎01-30-2012 04:51 PM
Super User
Posts: 5,081

Re: How to add missing data

Dan,

Here's a generic approach.

proc sql;

   create table list1 as select distinct year, month, day, hour from original;

   create table list2 as select distinct var from original;

   create table complete_shell as select * from list1, list2

      order by year, month, day, hour, var;

quit;

The advantage:  You don't need any prior knowledge about values of VAR.

The disadvantage:  It only takes values for VAR that actually appear in the data.  So if you want some VAR=B records, but there are no VAR=B records in the data, you won't get them.

If you want some other variables that are part of the original data, they would be easy to merge back in at this point.

Good luck.

Occasional Contributor
Posts: 5

Re: How to add missing data

I want to express profusely my thanks to everyone who replied with helpful solutions.  Orignially I had wanted to test all the methods but the number of solutions out-grew the amount of time I have to test each one.  Finally then I tried the method from Astounding because it looks short and simple, and it works!   Thanks Astounding!   And thanks again everyone!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 527 views
  • 10 likes
  • 7 in conversation