Desktop productivity for business analysts and programmers

Filling in missing RECORDS

Reply
Contributor
Posts: 20

Filling in missing RECORDS

I have a dataset like this...

person_id date amount
102 25FEB2007 $25
102 15MAY2007 $100
102 11DEC2007 $30

I'm trying to do the following...

1. Create a new variable called "month" that represents the number of months starting from their first record. For example, the first record for person 102 would have month=1.

2. But this person has no record for March or April so I need to add 2 records where month=2 and month=3, and the amount would be set to $0. I need to "fill in" records up to the current month.

I'm guessing there is probably a "do" loop and "retain" involved and I'm hoping someone's got this wired already.
N/A
Posts: 0

Re: Filling in missing RECORDS

A quick cludgy way, not elegant but it works and it should help get the concept:

A is your Data Table

/*Get the minimum Dates to find out where to start loops*/
PROC SQL;
CREATE TABLE A AS
SELECT PERSON_ID, MIN(DATE) AS MIN_DATE
FROM DATA_TAB
GROUP BY PERSON_ID;

/*Populate Dummy Date from Start Month to Current Month*/
DATA BLANKER (KEEP=DATE I AMOUNT PERSON_ID);
FORMAT MIN_DATE DATE9.;
FORMAT DATE DATE9.;SET A;
DIST=INTCK('MONTH',MIN_DATE,TODAY());
DO I = 1 TO DIST-1;
DATE=INTNX('MONTH',MIN_DATE,I);
AMOUNT = 0;
OUTPUT;
END;

/*Sort the dummy data to get ready for merge*/
PROC SORT DATA=BLANKER;
BY PERSON_ID DATE;


/* Interleave dummy date data and original data*/
/* The I is to set up your counter to start from the start month*/
DATA COMBINE;
MERGE DATA_TAB BLANKER;
BY PERSON_ID DATE;
DATE_MT = MONTH(DATE);
DATE_YR = YEAR(DATE);
IF I EQ . THEN I = 1;
ELSE I = I +1;
RUN;

/*Now Collpase the Dummy and Real Data*/
/* FINAL is the table you want*/
PROC SQL;
FORMAT DATE DATE9.;
CREATE TABLE FINAL AS
SELECT PERSON_ID, MAX(DATE) AS DATE, MAX(AMOUNT) AS AMOUNT, MAX(I) AS MONTH
FROM COMBINE
GROUP BY PERSON_ID, DATE_MT, DATE_YR
ORDER BY PERSON_ID, MONTH;
QUIT;

Hope that helps get you started, I could do more elegant, but this works and helps describe what is going on.

Ike Eisenhauer
N/A
Posts: 0

Re: Filling in missing RECORDS

[pre]

proc sort data=indata by person_id date;

data expanded;
set indata;
cmonth = month(date);
run;
quit;

data expanded;
set expanded;
by person_id cmonth;
retain month sum;

if first.person_id then month = 0;
if first.cmonth then do;
sum = 0;
month+1;
end;

diff = cmonth - lag(cmonth);

if diff < 2 then sum+amount;
else do i=1 to diff;
amount=0;
output;
month+1;
end;

if last.cmonth then do;
amount = sum;
output;
end;

drop cmonth sum diff i;
end;

[/pre]
Ask a Question
Discussion stats
  • 2 replies
  • 132 views
  • 0 likes
  • 2 in conversation