DATA Step, Macro, Functions and more

Inserting records when values in a sequence is missing.

Reply
Contributor
Posts: 30

Inserting records when values in a sequence is missing.

My data set MYTAB has variables ID, MONTH, QUANTITY.

If an ID has no data for a MONTH in the sequence 1 - 12, I want to add a record with ID, MONTH and QUANTITY = . (dot for missing numeric value).

So all ID:s should have 12 records each.

Have you any idea how to solve this?

Respected Advisor
Posts: 3,799

Re: Inserting records when values in a sequence is missing.

For this example where all ID's have the same month values you can do that with PROC SUMMARY CLASSDATA=

data have;
   input id $ month quantitiy;
   cards;
a 2 11
a 7 12
b 10 11
b 4 9
;;;;
   run;
data classdata;
   if 0 then set have(keep=month);
   do month=1 to 12;
     
output;
     
end;
  
stop;
  
run;
proc summary nway data=have classdata=classdata;
   by id;
   class month;
   output out=filled(drop=_type_ _freq_) idgroup(out(qSmiley Happy=);
   run;
proc print;
  
run;
Contributor
Posts: 30

Re: Inserting records when values in a sequence is missing.

Posted in reply to data_null__

Thanks for your answer. I am sure that it works, but for a person with little experience it's hard to understand the code.

Wouldn't it be possible to use CASE WHEN in PROC SQL?

Some sort of pseudo code: When month is missing in [1,12} add record with ID, MONTH, QUANTITY (= . ).

Super User
Super User
Posts: 7,970

Re: Inserting records when values in a sequence is missing.

I would suggest the easiest thing is to create a template dataset with the sequence you require and then merge that on.  For example (and I haven't tested this):

data template;

     do I=1 to 12;

          month=I;

          output;

     end;

run;

proc sql;

     create table WANT as

     select     SOME_VARIABLES,     /* note update per your table */

                    COALESCE(A.MONTH,B.MONTH) as MONTH

     from         WORK.HAVE A

     full join      WORK.TEMPLATE B

     on           A.MONTH=B.MONTH;

quit;

This will add in any missing months and the other variables will be missing.  If you need other variables populated then create your template with all necessary variables.

Contributor
Posts: 30

Re: Inserting records when values in a sequence is missing.

The COALESCE function returns the first non-missing value from a list of numeric arguments.

I don't understand how that function could be of any help.

Super User
Super User
Posts: 7,970

Re: Inserting records when values in a sequence is missing.

Its a trick with merging and coalescing.  Say you have these two datasets and full join them on month:

Data A                                             Data B

Month     Result                                   Month

Jan          12                                         Jan

Feb          10                                        Feb

                                                            Mar

...                                                            ...

The full join expands the first table to include values from the right, however the month would be missing if we did not do the coalesce for Mar. Eg.

select a.month...   = Month=missing, Result=missing as A.MONTH does not contain anything.

select coalesce(a.month,b.month) = Month=Mar, Result = missing, as the data from b is used as a is missing.

So, the full join creates all the necessary extra rows, the coalesce ensures that the variables are populated where necessary.

Contributor
Posts: 30

Re: Inserting records when values in a sequence is missing.

Your code doesn't work.

I get the same number of records, but in ascending order with regard to MONTH. So first all MONTH =1, then all MONTH =2 , etc.

When I started each ID:s all records was displayed together( in ascending order with regard to MONTH ).

Super User
Super User
Posts: 7,970

Re: Inserting records when values in a sequence is missing.

So here is the full code, tested this time.  The point is to create a dataset which will be the template for the other one.

data have;
  attrib id format=$1. month quantity format=best.;
  infile datalines delimiter=",";
  input id $ month quantity;
datalines;
a,1,100
a,2,200
a,4,150
b,1,60
;
run;

/* Create template, need to base it on your have data */
proc sql;
  create table IDS as
  select  distinct ID
  from    WORK.HAVE;
quit;
data template (drop=i);
  set ids;
  do I=1 to 12;
    month=I;
    output;
  end;
run;

proc sql;
  create table WANT as
  select  COALESCE(A.ID,B.ID) as ID,   
          COALESCE(A.MONTH,B.MONTH) as MONTH,
          A.QUANTITY
  from    WORK.HAVE A
  full join WORK.TEMPLATE B
  on      A.ID=B.ID
  and     A.MONTH=B.MONTH;
quit;

Contributor
Posts: 30

Re: Inserting records when values in a sequence is missing.

Thankyou RW9 for your solution. Now it worked.

I am forced to use SAS.

This would have been an easy problem to solve in an ordinary programming language, working with arrays, and that code would have been easy to understand for anyone with basic knowledge in programming.

Super User
Super User
Posts: 7,970

Re: Inserting records when values in a sequence is missing.

The suggestion posted was merely for one method of doing such a task.  SAS also has arrays syntax:

     array months {12} 3. (1,2,3,4,5,6,7,8,9,10,11,12);

You could also look at this snippet:

retain lstmonth;

if month ne lstmonth then do;

     do I=lstmonth to month-1;

          month=I;

          output;

     end;

end;

output;

So keep the last value of month, and if the current rows month is not the previous month +1 then output rows for the difference.  Note you need to check for month=12, and set other values to missing for these additional rows.

As with most problems there is no one singular method to get what you want Smiley Happy

Super User
Posts: 7,814

Re: Inserting records when values in a sequence is missing.

try this:

*get all ID's;

proc sort

  data=mytab (keep=ID)

  out=control

  nodupkey

;

by ID;

run;

*to be on the safe side;

proc sort data=mytab;

by ID MONTH;

run;

*make 12 months for every ID;

data control2;

set control;

do MONTH = 1 to 12;

output;

end;

run;

data result;

merge

  mytab (in=a)

  control2 (in=b)

;

if b;

by ID MONTH;

*everytime there is no record from mytab, quantity is implicitely set to .;

run;

Edit: Ups, forgot "by ID MONTH;" in the last data step.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 30

Re: Inserting records when values in a sequence is missing.

Posted in reply to KurtBremser

Thankyou KurtBremser.

Your code was more easy to understand.

Super User
Posts: 10,035

Re: Inserting records when values in a sequence is missing.

EASY FOR SQL.

data have;
   input id $ month quantitiy;
   cards; 
a 2 11
a 7 12
b 10 11
b 4 9
;;;;
   run; 
data m;
 do month=1 to 12;
  output;
 end;
run;
proc sql;
create table temp as
 select * from      (select distinct id from have) as a,m;
create table want as
 select temp.*,have.quantitiy
  from have right join  temp on temp.id=have.id and temp.month=have.month;

quit;

Xia Keshan

Ask a Question
Discussion stats
  • 12 replies
  • 1026 views
  • 0 likes
  • 5 in conversation