Re: Inserting records when values in a sequence is missing.

attjooo · Posted 05-08-2014 09:45 AM

My data set MYTAB has variables ID, MONTH, QUANTITY.

If an ID has no data for a MONTH in the sequence 1 - 12, I want to add a record with ID, MONTH and QUANTITY = . (dot for missing numeric value).

So all ID:s should have 12 records each.

Have you any idea how to solve this?

data_null__ · Posted 05-08-2014 09:56 AM

For this example where all ID's have the same month values you can do that with PROC SUMMARY CLASSDATA=

data have;
   input id $ month quantitiy;
   cards; 
a 2 11
a 7 12
b 10 11
b 4 9
;;;;
   run; 
data classdata;
   if 0 then set have(keep=month);
   do month=1 to 12; 
      output; 
      end; 
   stop; 
   run; 
proc summary nway data=have classdata=classdata;
   by id;
   class month;
   output out=filled(drop=_type_ _freq_) idgroup(out(q:)=);
   run; 
proc print; 
   run;

attjooo · Posted 05-08-2014 10:15 AM

Thanks for your answer. I am sure that it works, but for a person with little experience it's hard to understand the code.

Wouldn't it be possible to use CASE WHEN in PROC SQL?

Some sort of pseudo code: When month is missing in [1,12} add record with ID, MONTH, QUANTITY (= . ).

RW9 · Posted 05-08-2014 10:32 AM

I would suggest the easiest thing is to create a template dataset with the sequence you require and then merge that on. For example (and I haven't tested this):

data template;

do I=1 to 12;

month=I;

output;

end;

run;

proc sql;

create table WANT as

select SOME_VARIABLES, /* note update per your table */

COALESCE(A.MONTH,B.MONTH) as MONTH

from WORK.HAVE A

full join WORK.TEMPLATE B

on A.MONTH=B.MONTH;

quit;

This will add in any missing months and the other variables will be missing. If you need other variables populated then create your template with all necessary variables.

attjooo · Posted 05-09-2014 03:26 AM

The COALESCE function returns the first non-missing value from a list of numeric arguments.

I don't understand how that function could be of any help.

RW9 · Posted 05-09-2014 04:13 AM

Its a trick with merging and coalescing. Say you have these two datasets and full join them on month:

Data A Data B

Month Result Month

Jan 12 Jan

Feb 10 Feb

Mar

... ...

The full join expands the first table to include values from the right, however the month would be missing if we did not do the coalesce for Mar. Eg.

select a.month... = Month=missing, Result=missing as A.MONTH does not contain anything.

select coalesce(a.month,b.month) = Month=Mar, Result = missing, as the data from b is used as a is missing.

So, the full join creates all the necessary extra rows, the coalesce ensures that the variables are populated where necessary.

attjooo · Posted 05-09-2014 04:56 AM

Your code doesn't work.

I get the same number of records, but in ascending order with regard to MONTH. So first all MONTH =1, then all MONTH =2 , etc.

When I started each ID:s all records was displayed together( in ascending order with regard to MONTH ).

RW9 · Posted 05-09-2014 05:05 AM

So here is the full code, tested this time. The point is to create a dataset which will be the template for the other one.

data have;
attrib id format=$1. month quantity format=best.;
infile datalines delimiter=",";
input id $ month quantity;
datalines;
a,1,100
a,2,200
a,4,150
b,1,60
;
run;

/* Create template, need to base it on your have data */
proc sql;
create table IDS as
select distinct ID
from    WORK.HAVE;
quit;
data template (drop=i);
set ids;
do I=1 to 12;
    month=I;
    output;
end;
run;

proc sql;
create table WANT as
select COALESCE(A.ID,B.ID) as ID,
          COALESCE(A.MONTH,B.MONTH) as MONTH,
          A.QUANTITY
from    WORK.HAVE A
full join WORK.TEMPLATE B
on      A.ID=B.ID
and     A.MONTH=B.MONTH;
quit;

attjooo · Posted 05-09-2014 06:18 AM

Thankyou RW9 for your solution. Now it worked.

I am forced to use SAS.

This would have been an easy problem to solve in an ordinary programming language, working with arrays, and that code would have been easy to understand for anyone with basic knowledge in programming.

RW9 · Posted 05-09-2014 06:39 AM

The suggestion posted was merely for one method of doing such a task. SAS also has arrays syntax:

array months {12} 3. (1,2,3,4,5,6,7,8,9,10,11,12);

You could also look at this snippet:

retain lstmonth;

if month ne lstmonth then do;

do I=lstmonth to month-1;

month=I;

output;

end;

output;

So keep the last value of month, and if the current rows month is not the previous month +1 then output rows for the difference. Note you need to check for month=12, and set other values to missing for these additional rows.

As with most problems there is no one singular method to get what you want

Kurt_Bremser · Posted 05-09-2014 05:09 AM

try this:

*get all ID's;

proc sort

data=mytab (keep=ID)

out=control

nodupkey

;

by ID;

run;

*to be on the safe side;

proc sort data=mytab;

by ID MONTH;

run;

*make 12 months for every ID;

data control2;

set control;

do MONTH = 1 to 12;

output;

end;

run;

data result;

merge

mytab (in=a)

control2 (in=b)

;

if b;

by ID MONTH;

*everytime there is no record from mytab, quantity is implicitely set to .;

run;

Edit: Ups, forgot "by ID MONTH;" in the last data step.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

attjooo · Posted 05-09-2014 07:01 AM

Thankyou KurtBremser.

Your code was more easy to understand.

Ksharp · Posted 05-09-2014 09:49 AM

EASY FOR SQL.

data have;
   input id $ month quantitiy;
   cards; 
a 2 11
a 7 12
b 10 11
b 4 9
;;;;
   run; 
data m;
 do month=1 to 12;
  output;
 end;
run;
proc sql;
create table temp as
 select * from      (select distinct id from have) as a,m;
create table want as
 select temp.*,have.quantitiy
  from have right join  temp on temp.id=have.id and temp.month=have.month;

quit;

Xia Keshan

Classroom Training Available!