BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hishope
Calcite | Level 5

  Hi all,

A sample from the dataset attached below. There are some missing month by each ID number(Table1. before). I want to add rows of missing month per each ID(Table2.after).  Since I am having more than 30000 IDs, it seems hard to add rows one by one. Is there anyone who have an idea? Thank you in advance.

Table1. before. (what I have now)

before.jpg

Table2. after (what I want to see)

after.jpg

Miyoung

1 ACCEPTED SOLUTION

Accepted Solutions
PaulLee
Fluorite | Level 6

I would do a cartesian join to get all the combination on months and id... so something like this? Probably not the most efficient though!

data test;
infile datalines dlm=',' dsd missover;

input id : 8.
   month : 8.
   value : 8.
;
datalines;
1,1,23
1,2,334
1,3,25
2,1,45
2,3,450
;
run;

proc sql;
create table unique_months as
select distinct month
from  test
;
quit;

proc sql;
create table unique_id as
select distinct id
from  test
;
quit;


proc sql;
create table cartesian as
select  i.id,
    m.month
from  unique_id i,
    unique_months m
;
quit;


proc sql;
create table final as
select  c.id,
    c.month,
    t.value
from  cartesian c
left join test t
on   c.id = t.id
    and c.month=t.month
;
quit;

View solution in original post

9 REPLIES 9
Linlin
Lapis Lazuli | Level 10

try:

data have;

input id month ;

something=30;

cards;

1 1

1 2

1 3

1 4

1 5

1 6

1 8

1 10

1 12

2 1

2 2

2 3

2 4

2 5

2 7

2 8

2 10

2 12

;

proc sort data=have out=temp(keep=id) nodupkey;

by id;

data temp;

set temp;

something=0;

do month=1 to 12;

output;

end;

run;

data want;

  merge temp have;

  by id month ;

run;

proc print;run;

Linlin

hishope
Calcite | Level 5

Thanks much! I like this short answer!

PaulLee
Fluorite | Level 6

I would do a cartesian join to get all the combination on months and id... so something like this? Probably not the most efficient though!

data test;
infile datalines dlm=',' dsd missover;

input id : 8.
   month : 8.
   value : 8.
;
datalines;
1,1,23
1,2,334
1,3,25
2,1,45
2,3,450
;
run;

proc sql;
create table unique_months as
select distinct month
from  test
;
quit;

proc sql;
create table unique_id as
select distinct id
from  test
;
quit;


proc sql;
create table cartesian as
select  i.id,
    m.month
from  unique_id i,
    unique_months m
;
quit;


proc sql;
create table final as
select  c.id,
    c.month,
    t.value
from  cartesian c
left join test t
on   c.id = t.id
    and c.month=t.month
;
quit;

hishope
Calcite | Level 5

It was very helpful. Thank you very much for your time!!

Dorota_Jarosz
Obsidian | Level 7

@PaulLee: Your solution has a drawback when all ids are missing one or more months. You will only get the months tthat exist in the data. You should create all 12 months instead.

PaulLee
Fluorite | Level 6

Yes, you are very correct! Creating the months using dataline statement would have been better. Thanks.

tish
Calcite | Level 5

My solution is basically the same as Paul Lee's.

data months (keep=monthi);

   do monthi = 1 to 12;

      output;

   end;

run;

proc sql;

   create table IDs as

      select distinct ID

      from original_dataset;

   create table cartesian as

      select

         a.ID,

         b.monthi

      from

         IDs as a,

         months as b;

   create table new_dataset as

      select

         a.ID,

         a.lprice,

         a.lquan,

         b.monthi

      from

         original_dataset as a

            right join

         cartesian as b

      on

         a.ID = b.ID and

         a.monthi = b.monthi;

quit;

hishope
Calcite | Level 5

Thank you for your time. It helped!!

Dorota_Jarosz
Obsidian | Level 7

Here is an alternative, without merging or joining, just assuming the original data is ordered by id and monthi. This solution requires only one pass though the data set.

I generated similarly structured data with just one variable, then deleted few rows to create gaps.


data old;
do id=49,129,185;
  do monthi=1 to 12;
    lprice=-3-ranuni(0);
    output;
    end;
  end;
run;

data oldmiss;
set old;
if id=49 and monthi=10 then delete;
if id=129 and monthi in (2,3) then delete;
if id=185 and monthi in (5,6,7,8) then delete;
run;

proc print data=oldmiss;
title "Original raw data with month gaps";
run;
*Original raw data with month gaps      
Obs     id    monthi     lprice
  1     49       1      -3.94038
  2     49       2      -3.43387
  3     49       3      -3.99733
  4     49       4      -3.43186
  5     49       5      -3.02512
  6     49       6      -3.34443
  7     49       7      -3.28744
  8     49       8      -3.62029
  9     49       9      -3.85440
10     49      11      -3.74302
11     49      12      -3.62365
12    129       1      -3.83959
13    129       4      -3.32539
14    129       5      -3.97315
15    129       6      -3.38563
16    129       7      -3.04920
17    129       8      -3.39072
18    129       9      -3.38264
19    129      10      -3.00922
20    129      11      -3.08115
21    129      12      -3.54783
22    185       1      -3.66660
23    185       2      -3.55456
24    185       3      -3.49099
25    185       4      -3.60728
26    185       9      -3.22515
27    185      10      -3.67814
28    185      11      -3.50221
29    185      12      -3.79233 ;

data new;
set oldmiss;
by id monthi;
if _N_=1 or first.id then do;
  month=1;
  end;
else do;
  month+1;
  end;
if month<monthi then do;
  lprice_hold=lprice;
  monthi_hold=monthi;
  do i=1 to monthi-month;
    lprice=0; monthi=month;
    output;
    month=month+1;
    end;
  lprice=lprice_hold;
  monthi=monthi_hold;
  output;
  end;
else output;
drop i monthi_hold lprice_hold month;
run;

proc print data=new;
title "Filled in data with no month gaps";
format lprice best8.;
run;
*Filled in data with no month gaps          
Obs     id    monthi     lprice
  1     49       1      -3.94038
  2     49       2      -3.43387
  3     49       3      -3.99733
  4     49       4      -3.43186
  5     49       5      -3.02512
  6     49       6      -3.34443
  7     49       7      -3.28744
  8     49       8      -3.62029
  9     49       9       -3.8544
10     49      10             0
11     49      11      -3.74302
12     49      12      -3.62365
13    129       1      -3.83959
14    129       2             0
15    129       3             0
16    129       4      -3.32539
17    129       5      -3.97315
18    129       6      -3.38563
19    129       7       -3.0492
20    129       8      -3.39072
21    129       9      -3.38264
22    129      10      -3.00922
23    129      11      -3.08115
24    129      12      -3.54783
25    185       1       -3.6666
26    185       2      -3.55456
27    185       3      -3.49099
28    185       4      -3.60728
29    185       5             0
30    185       6             0
31    185       7             0
32    185       8             0
33    185       9      -3.22515
34    185      10      -3.67814
35    185      11      -3.50221
36    185      12      -3.79233
;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 5517 views
  • 13 likes
  • 5 in conversation