Names in Date field

Super Contributor
Posts: 673

Names in Date field

hi,

  I have a dataset with 1 million records. It has a date field whose values are in the format 20130917, but some records seem to have letters (names) in the date field. How can I get the 10 records that precede the point where the letters appear in the date field?

Super User
Posts: 11,343

Re: Names in Date field

Are you requesting the 10 records before each "date" containing non-digit characters or before the first place this occurs?

If the names should normally occur after the dates in the file, then likely causes involve missing or incorrectly used options such as FLOWOVER or TRUNCOVER in an INFILE statement. In that case the "date" is being treated as part of a previous line.

If the name occurs before the date, there may be an issue with tabs in the name, or possibly in another field, so that the column alignments are off.
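
For the first case, a minimal sketch of an INFILE statement with TRUNCOVER might look like the following. The record layout (DATE in the first 8 columns, followed by a 20-character NAME) and the file path are assumptions for illustration only, not the poster's actual file.

```sas
/* Sketch only: the layout (DATE in columns 1-8, NAME in 9-28) is assumed. */
data check;
  /* TRUNCOVER: a short line does not pull data from the next line */
  infile 'c:\temp\t.txt' truncover;
  input date $8. name $20.;
run;
```

With the default FLOWOVER behavior, an INPUT statement that runs past the end of a short line continues reading on the next line, which is one way a name can end up in a date variable.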

Occasional Contributor
Posts: 8

Re: Names in Date field

Hi,

Treat the variable as a character variable initially, then use character functions to identify the dates that contain letters, then create a flag variable so you can pick out the top 10 or bottom 10 records around each flagged value.
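
A rough sketch of the flagging step, assuming the data is in a SAS dataset called HAVE with DATE stored as a character variable (RECNO and BAD_DATE are hypothetical names introduced here):

```sas
/* Sketch only: HAVE, DATE, RECNO and BAD_DATE are assumed names. */
data flagged;
  set have;
  recno    = _n_;                    /* position of the record in HAVE */
  bad_date = (anyalpha(date) > 0);   /* 1 when DATE contains a letter, else 0 */
run;

/* List the offending records and where they sit in the file. */
proc print data=flagged;
  where bad_date = 1;
  var recno date;
run;
```

Once RECNO is known for the flagged records, the surrounding records can be selected by record number.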


Trusted Advisor
Posts: 1,022

Re: Names in Date field

You need two synchronized readings of the data, one of them 10 records behind the other (the "if _n_>10" below).  If the "lead" reading finds a non-digit character in DATE, then set a counter to 10.  The second reading tests the counter and, if it's greater than or equal to 0, outputs the record.  It also decrements the counter.  The result should be every offending record plus the 10 records preceding each of them.

Is the data in a raw data file?  Let's say it's in file 'c:\temp\t.txt', and date is the first 8 characters of each record, followed by other variables of interest:

filename in1 'c:\temp\t.txt';
filename in2 'c:\temp\t.txt';

data want (drop=prx);
   retain prx -1;   /* start below 0 so record 1 is not output unless a bad date calls for it */

   infile in1 end=end_in1;

   if end_in1=0 then do;
     input date $8.;

     if notdigit(date) then prx=min(_n_-1,10);

   end;

   if _n_>10 then do;

     infile in2;

     input date $8.  ... other variables .... ;

     if prx>=0 then output;

     prx=prx-1;

   end;

run;

If the data is a SAS dataset (say HAVE), with DATE as a character variable, the logic is similar:

data want (drop=prx);
  retain prx -1;   /* start below 0 so record 1 is not output unless a bad date calls for it */

  if end_of_have=0 then do;

    set have (keep=date) end=end_of_have;
    if notdigit(date) then prx=min(_n_-1,10);

  end;

  if _n_>10 then do;
    set have;
    if prx>=0 then output;

    prx=prx-1;
  end;

run;

Super Contributor
Posts: 673

Re: Names in Date field

Thanks!
