## Combining and filling in gaps in datasets

Solved
Occasional Contributor
Posts: 9

# Combining and filling in gaps in datasets

I currently have two datasets. In each dataset, observations are organized by two variables: date (in the form of day/month) and a 3-digit alphanumeric code. For each date/code combo there is a count (usually either 0 or 1). There are 7 days in the range (say Jul. 1-7). One dataset has observations from 2014, and the other dataset has a combined average observation from 2012 and 2013, but I have removed the year from the date so that I will be able to merge the datasets. Some codes have no counts for a given day in one or the other of the datasets because that code simply wasn't used on a given day; and some of the possible codes (of which there are 516, contained in a finite list in another spreadsheet) were not used in one or the other of the datasets at all. When a code from the master list wasn't originally used in that 7-day period, I used proc sql to insert those missing codes into the dataset (thanks to help provided in a previous thread here). For such codes, there is only one observation with a missing (.) date and a count of 0. What I want to do is: A) combine the two datasets, by code and date; and B) somehow fill in missing days so that each code has an observation for each of the 7 days (Jul. 1-7), with the count being 0 if there was originally no observation.

An example of the current structure of Dataset 1 is:

Code   Date       Observed Count (2014)

1X1     07/01         1

1X1     07/03         1

1X1     07/06         0

2Y3     .                 0

3J8     07/02         1

3J8     07/03         0

An example of the current structure of Dataset 2 is:

Code   Date       Observed Count (2013/2012 Avg)

1X1     07/01         0

1X1     07/02         1

1X1     07/04         0

2Y3     07/01         1

3J8     .                  0

What I would like the final dataset to look like is:

Code   Date       Observed Count (2014)    Observed Count (2013/2012 Avg)

1X1     07/01         1                                      0

1X1     07/02         0                                      1

1X1     07/03         1                                      0

1X1     07/04         0                                      0

1X1     07/05         0                                      0

1X1     07/06         0                                      0

1X1     07/07         0                                      0

2Y3     07/01         0                                      1

2Y3     07/02         0                                      0

2Y3    07/03          0                                      0

2Y3     07/04         0                                      0

2Y3    07/05         0                                       0

2Y3     07/06         0                                      0

2Y3     07/07         0                                      0

3J8     07/01          0                                      0

3J8     07/02         1                                       0

3J8     07/03         0                                       0

3J8     07/04         0                                       0

3J8     07/05         0                                       0

3J8     07/06         0                                       0

3J8     07/07         0                                       0

I am not sure how to go about combining and filling in gaps in the datasets in this manner. I did find a somewhat similar example using PROC EXPAND online, but wasn't sure how to adapt it to this scenario. Any guidance on how to achieve this task would be sincerely appreciated. Thank you!

Accepted Solutions
Solution
‎01-26-2015 06:05 AM
Super Contributor
Posts: 324

## Re: Combining and filling in gaps in datasets

When ARRAY can be used, HASH Object too can be used.

Here is Hash version.

BY GROUP processing with CLEAR() would be far easier as done with the ARRAY. No need to have BY GROUP processing. No need to change the date to number. Both Data Sets are read sequentially and the hash table is built.

data one;

input code \$3. @9 date \$5. count;

datalines;

1X1     07/01         1

1X1     07/03         1

1X1     07/06         0

2Y3     .             0

3J8     07/02         1

3J8     07/03         0

;

run;

data two;

input code \$3. @9 date \$5. count;

datalines;

1X1     07/01         0

1X1     07/02         1

1X1     07/04         0

2Y3     07/01         1

3J8     .             0

;

run;

data want;

if _n_ = 1 then do;

if 0 then set one;

declare hash h(ordered:'a');

h.definekey('code','date');

h.definedata('code', 'date', 'count_14', 'count_13');

h.definedone();

declare hash h1(ordered:'y');

h1.definekey('code');

h1.definedone();

declare hiter hi('h1');

end;

do until(last);

set one(in = a) two(in = b) end = last;

if h.find() ^= 0 then do; count_14 = 0;  count_13 = 0; end;

if a then count_14 = count;

if b then count_13 = count;

if not missing(date) then h.replace();

if h1.find() ^= 0 then h1.add();

end;

do while(hi.next() = 0);

do Date = '07/01','07/02','07/03','07/04','07/05','07/06','07/07';

rc = h.find();

if rc ^= 0 then do;

count_14 = 0;

count_13 = 0;

end;

output;

end;

end;

drop rc count;

run;

All Replies
Super User
Posts: 23,685

## Re: Combining and filling in gaps in datasets

This isn't the usual way I'd recommend but its a way, if you have SAS ETS licensed.

How I usually recommend doing it, no SAS ETS required:

Super Contributor
Posts: 324

## Re: Combining and filling in gaps in datasets

No need for ETS. Simple data step is sufficient to get an elegant solution.

Change Date into  a Number by using the MDY() function. Use one array for 2014(say, k_14[ ])  and anther array for average of 2012 & 13 (say, k_13[ ]). In double DoW-loops, read observations from the two data sets by BY GROUP (by CODE). Then merge the two arrays into one and output the results.

data one(drop=date dd mm);

input code \$3. @9 date \$5. count;

mm = substr(date,1,2);

dd = substr(date, 4);

ndate = mdy(mm,dd, 2014);

datalines;

1X1     07/01         1

1X1     07/03         1

1X1     07/06         0

2Y3     .             0

3J8     07/02         1

3J8     07/03         0

;

run;

data two(drop=date dd mm);

input code \$3. @9 date \$5. count;

mm = substr(date,1,2);

dd = substr(date, 4);

ndate = mdy(mm,dd, 2014);

datalines;

1X1     07/01         0

1X1     07/02         1

1X1     07/04         0

2Y3     07/01         1

3J8     .             0

;

run;

data want(rename = (i = Date));

array k_14[19905:19911] _temporary_;

array k_13[19905:19911] _temporary_;

do until(last.code);

set one;

by code;

if not missing(ndate) then k_14[ndate] = count;

end;

do until(last.code);

set two;

by code;

if not missing(ndate) then k_13[ndate] = count;

end;

do i = lbound(k_14) to hbound(k_14);

count_14 = ifN(k_14 > 0, k_14, 0);

count_13 = ifN(k_13 > 0, k_13, 0);

output;

end;

call missing(of k_14

• , of k_13
• );
• keep code i count_14 count_13;

format i date5.;

run;

Super User
Posts: 10,770

## Re: Combining and filling in gaps in datasets

```data one;
input code \$3. @9 date \$5. count;
datalines;
1X1     07/01         1
1X1     07/03         1
1X1     07/06         0
2Y3     .             0
3J8     07/02         1
3J8     07/03         0
;
run;

data two;
input code \$3. @9 date \$5. count;
datalines;
1X1     07/01         0
1X1     07/02         1
1X1     07/04         0
2Y3     07/01         1
3J8     .             0
;
run;
data date;
do i='01jul2014'd to '07jul2014'd;
date=put(i,mmddyy5.);output;
end;
drop i;
run;
proc sql;
create table want as
select x.*,count_2013_2012_avg from
(
select a.*,case when missing(b.count) then 0 else b.count end as count_2014  from
(select * from (select distinct code from one),date) as a left join one as b
on a.code=b.code and a.date=b.date
) as x
,
(
select a1.*,case when missing(b1.count) then 0 else b1.count end as count_2013_2012_avg  from
(select * from (select distinct code from two),date) as a1 left join two as b1
on a1.code=b1.code and a1.date=b1.date
) as y
where x.code=y.code and x.date=y.date
order by 1,2;

quit;

```

Xia Keshan

Solution
‎01-26-2015 06:05 AM
Super Contributor
Posts: 324

## Re: Combining and filling in gaps in datasets

When ARRAY can be used, HASH Object too can be used.

Here is Hash version.

BY GROUP processing with CLEAR() would be far easier as done with the ARRAY. No need to have BY GROUP processing. No need to change the date to number. Both Data Sets are read sequentially and the hash table is built.

data one;

input code \$3. @9 date \$5. count;

datalines;

1X1     07/01         1

1X1     07/03         1

1X1     07/06         0

2Y3     .             0

3J8     07/02         1

3J8     07/03         0

;

run;

data two;

input code \$3. @9 date \$5. count;

datalines;

1X1     07/01         0

1X1     07/02         1

1X1     07/04         0

2Y3     07/01         1

3J8     .             0

;

run;

data want;

if _n_ = 1 then do;

if 0 then set one;

declare hash h(ordered:'a');

h.definekey('code','date');

h.definedata('code', 'date', 'count_14', 'count_13');

h.definedone();

declare hash h1(ordered:'y');

h1.definekey('code');

h1.definedone();

declare hiter hi('h1');

end;

do until(last);

set one(in = a) two(in = b) end = last;

if h.find() ^= 0 then do; count_14 = 0;  count_13 = 0; end;

if a then count_14 = count;

if b then count_13 = count;

if not missing(date) then h.replace();

if h1.find() ^= 0 then h1.add();

end;

do while(hi.next() = 0);

do Date = '07/01','07/02','07/03','07/04','07/05','07/06','07/07';

rc = h.find();

if rc ^= 0 then do;

count_14 = 0;

count_13 = 0;

end;

output;

end;

end;

drop rc count;

run;

Super Contributor
Posts: 319

## Re: Combining and filling in gaps in datasets

Hello,

You can first fill-in the missing series and afterwards concatenate the datasets:

data dataset1 (drop=date);
input Code \$ Date \$ Count_2014;
if missing(Date)=0 then
date_num=mdy(input(substr(date,1,2),3.),input(substr(date,4,2),3.),2014);

format date_num date7.;
datalines;
1X1 07/01 1
1X1 07/03 1
1X1 07/06 0
2Y3 . 0
3J8 07/02 1
3J8 07/03 0
4J8 07/02 1
4J8 07/07 0
;
run;
data dataset2 (drop=date);
input Code \$ Date \$ Count_2012_2013;
if missing(Date)=0 then
date_num=mdy(input(substr(date,1,2),3.),input(substr(date,4,2),3.),2014);

format date_num date7.;
datalines;
1X1 07/01 0
1X1 07/02 1
1X1 07/04 0
2Y3 07/01 1
3J8 . 0
4J8 07/07 0
;
run;

data have1;

format date date7.;
set dataset1;
by Code;

retain date;

if first.Code then date="01JUL2014"d;

do i=date to "07JUL2014"d while (date_num ne date);
Observed_Count_2014=0;
output;
date+1;
;
end;

if date_num=date then do;Observed_Count_2014=count_2014;output;date+1;end;

if last.Code and date_num not in(.,"07JUL2014"d)  and date le "07JUL2014"d then
do;
do until(date="08JUL2014"d) ;
Observed_Count_2014=0;
output;
date+1;
end;
end;

keep code date Observed_Count_2014;
run;

data have2;

format date date7.;
set dataset2;
by Code;

retain date;

if first.Code then date="01JUL2014"d;

do i=date to "07JUL2014"d while (date_num ne date);
Observed_Count_2012_2013=0;
output;
date+1;
;
end;

if date_num=date then do;Observed_Count_2012_2013=count_2012_2013;output;date+1;end;

if last.Code and date_num not in(.,"07JUL2014"d)  and date le "07JUL2014"d then
do;
do until(date="08JUL2014"d) ;
Observed_Count_2012_2013=0;
output;
date+1;
end;
end;

keep code date Observed_Count_2012_2013;
run;

proc sql;
create table want as
select h1.code,h1.date,Observed_Count_2014,Observed_Count_2012_2013 from have1 as h1 inner join have2 as h2
on h1.code=h2.code and h1.date=h2.date;
quit;

Occasional Contributor
Posts: 9

## Re: Combining and filling in gaps in datasets

Thank you Reeza, datasp, Xia, and Loko! I truly appreciate all your very helpful answers. I ended up using the second solution proposed by datasp, but I'm sure any of these would work well. It's great to see the different ways this can be achieved. Thanks again for the help!

🔒 This topic is solved and locked.