Correction of observations

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 81
Accepted Solution

Correction of observations

Hello friends,
I need some help in data management, I have a large dataset (215123 observations and 3 variables): each ID in each year have different number of observations. For some reason, the observations sometimes recorded incorrect. How to correct them?


Here's an example of my data.

IDDateObs
11/25/20001
112/13/20002
15/5/20013
110/11/20023
23/14/19992
21/1/20003
211/21/20004
25/19/20014

I want to create two new variables NEW1 and NEW2 for all firm-observations that show date difference in days and month, respectively, between them (where the first obs is based).

Expended output:

IDDateObsNEW_1NEW_2
11/25/2000100
112/13/2000232311
15/5/200131435
110/11/2002452417
23/14/1999200
21/1/2000329310
211/21/2000432511
25/19/200151796


I would appreciate if someone would share a code with me.

Thank you,

Zana


Accepted Solutions
Solution
‎05-28-2014 06:22 AM
Super User
Super User
Posts: 7,392

Re: Correction of observations

Hi,

Well issue 1, the obs.  You can fix that (assuming you have a sort order) quite easily by data step (note I haven't tested either of these):

proc sort data=have;

     by id date;

run;

data want;

     set have;

     by id;

     retain lstobs;

     if first.id the lstobs=obs;

     else do;

          lstobs+1;

          obs=lstobs;

     end;

run;

Issue 2, this would also be solved by using a retain to retain the last date and then doing an intck (interval function) on the retained date and the current date:

data want;

     set have;

     by id;

     retain lstdate;

     if first.id then do;

          lstdate=date;

          new_1=0;

          new_2=0;

     end;

     else do;

          new_1=intck('MONTHS',lstdate,date);

          new_2=intck(DAYS',lstdate,date);

          lstdate=date;

     end;

run;

You could also look at lag() function, however I have never been able to get that to work.

View solution in original post


All Replies
Solution
‎05-28-2014 06:22 AM
Super User
Super User
Posts: 7,392

Re: Correction of observations

Hi,

Well issue 1, the obs.  You can fix that (assuming you have a sort order) quite easily by data step (note I haven't tested either of these):

proc sort data=have;

     by id date;

run;

data want;

     set have;

     by id;

     retain lstobs;

     if first.id the lstobs=obs;

     else do;

          lstobs+1;

          obs=lstobs;

     end;

run;

Issue 2, this would also be solved by using a retain to retain the last date and then doing an intck (interval function) on the retained date and the current date:

data want;

     set have;

     by id;

     retain lstdate;

     if first.id then do;

          lstdate=date;

          new_1=0;

          new_2=0;

     end;

     else do;

          new_1=intck('MONTHS',lstdate,date);

          new_2=intck(DAYS',lstdate,date);

          lstdate=date;

     end;

run;

You could also look at lag() function, however I have never been able to get that to work.

Frequent Contributor
Posts: 81

Re: Correction of observations

Hardly thanks for your replies and insight.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 165 views
  • 0 likes
  • 2 in conversation