Hi everyone, I have the following data
Year | Month | Day | Hour | Var |
2011 | 1 | 1 | 1 | A |
2011 | 1 | 1 | 1 | B |
2011 | 1 | 1 | 1 | C |
2011 | 1 | 1 | 1 | D |
2011 | 1 | 1 | 1 | E |
2011 | 1 | 1 | 2 | A |
2011 | 1 | 1 | 2 | B |
2011 | 1 | 1 | 2 | D |
2011 | 1 | 1 | 3 | C |
2011 | 1 | 1 | 3 | D |
2011 | 1 | 1 | 3 | E |
2011 | 1 | 1 | 4 | A |
2011 | 1 | 1 | 4 | B |
2011 | 1 | 1 | 4 | D |
2011 | 1 | 1 | 4 | E |
where Var, if complete, should have data A, B, C, D, E at each hour. However at present, only Hour 1 has the full set of data. Hour 2 has only A, B, D. Hour 3 has only C, D, E. Etc. I would like to add all the missing data in Var at each Hour for the whole Year 2011 (amd other Years too). Is there a simple step for this process? Thanks so much for your help!
Dan,
Here's a generic approach.
proc sql;
create table list1 as select distinct year, month, day, hour from original;
create table list2 as select distinct var from original;
create table complete_shell as select * from list1, list2
order by year, month, day, hour, var;
quit;
The advantage: You don't need any prior knowledge about values of VAR.
The disadvantage: It only takes values for VAR that actually appear in the data. So if you want some VAR=B records, but there are no VAR=B records in the data, you won't get them.
If you want some other variables that are part of the original data, they would be easy to merge back in at this point.
Good luck.
Hi,
It seems to me you would need a 2X DOW loop:
data have;
infile cards;
input Year Month Day Hour Var $;
cards;
2011 1 1 1 A
2011 1 1 1 B
2011 1 1 1 C
2011 1 1 1 D
2011 1 1 1 E
2011 1 1 2 A
2011 1 1 2 B
2011 1 1 2 D
2011 1 1 3 C
2011 1 1 3 D
2011 1 1 3 E
2011 1 1 4 A
2011 1 1 4 B
2011 1 1 4 D
2011 1 1 4 E
;
data want (drop=_:);
length _t $40;
retain _t;
do until (last.hour);
set have;
by year month day hour;
if first.hour then call missing (_t);
_t=cats(_t,var);
end;
do until (last.hour);
set have;
length _t $40;
by year month day hour;
output;
if last.hour then do _n_=1 to lengthn(compress('ABCDE',_t));
var=substr(compress('ABCDE',_t),_n_,1);
output;
end;
end;
run;
proc print;run;
Kindly Regards,
Haikuo
Thanks so much for your help Haikuo! I will test it out later today!
-Dan
Do you need to fill in missing years, months, days and hours as well and, in addition to that, do you also have fields that contain other data?
Hi Art297,
Another process have filled in the missing days and hours already so no need to correct these.
And there are fields that contain other data but they are not used in sorting or organizing the data set so I just ignore those.
-Dan
data class;
if 0 then set have(keep=var);
input var @@;
cards;
A B C D E F
;;;;
run;
proc summary data=have nway classdata=class;
by year--hour;
class var;
output out=expand;
run;
proc print;
run;
Thanks for your help data! Your answer is the shortest and most elegant. I will test it first!
-Dan
PROC SUMMARY is very powerful.:smileyinfo:
From My understanding,
I used hai.kuo HAVE dataset,
data dummy ;
do var='A','B','C','D','E';
do hours=1 to 4;
do year=2011;
output;
end;
end;
end;
run;
proc sql;
create table temp as
select a.*,b.month,b.day,b.hour
from dummy a left join have b
on a.year=b.year and a.hours=b.hour and a.var=b.var
order by a.hours,a.var;
quit;
data want;
set temp;
array v(3) month hour day;
do i=1 to dim(v);
if v(i)=. then v(i)=0;
end;
drop i;
run;
proc print;
run;
might be some one will give elegant answer
Thanks
Sam
Dan,
Here's a generic approach.
proc sql;
create table list1 as select distinct year, month, day, hour from original;
create table list2 as select distinct var from original;
create table complete_shell as select * from list1, list2
order by year, month, day, hour, var;
quit;
The advantage: You don't need any prior knowledge about values of VAR.
The disadvantage: It only takes values for VAR that actually appear in the data. So if you want some VAR=B records, but there are no VAR=B records in the data, you won't get them.
If you want some other variables that are part of the original data, they would be easy to merge back in at this point.
Good luck.
I want to express profusely my thanks to everyone who replied with helpful solutions. Orignially I had wanted to test all the methods but the number of solutions out-grew the amount of time I have to test each one. Finally then I tried the method from Astounding because it looks short and simple, and it works! Thanks Astounding! And thanks again everyone!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.