SASPhile
Quartz | Level 8

Hi,

  I have a dataset with 1 million records. It has a date field whose values are in the format 20130917, but some records seem to have alphabetic characters (names) in the date field. How can I get the 10 records preceding the point where the alphabetic characters appear in the date field?

ballardw
Super User

Are you requesting the 10 records before each "date" containing non-digit characters or before the first place this occurs?

If the names should normally occur after the dates in the file, then likely causes involve missing or incorrect use of options such as FLOWOVER or TRUNCOVER in an INFILE statement. In that case the "date" is being read as part of the previous line.
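To illustrate the INFILE point, here is a minimal sketch. The fileref RAWIN, the path, and the column layout (date in columns 1-8, name in columns 10-39) are all assumptions for illustration, not details from the original post. With the default FLOWOVER behavior, a record too short to supply NAME makes INPUT continue onto the next line, so fields can drift out of alignment. TRUNCOVER keeps each record to itself.

```sas
/* Hypothetical fileref and path; substitute your own file */
filename rawin 'rawdata.txt';

data check;
   /* TRUNCOVER: a short record yields a short or missing NAME      */
   /* instead of INPUT moving on to the next line (FLOWOVER default) */
   infile rawin truncover;
   /* Assumed layout: date in columns 1-8, name in columns 10-39 */
   input date $ 1-8 name $ 10-39;
run;
```

If the bad "dates" disappear once TRUNCOVER is in place, the problem was in how the file was read rather than in the data itself.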

If the name occurs before the date, there may be an issue with tabs in the name, or possibly in another field, so that the column alignments are off.

Amarnath7
Fluorite | Level 6

Hi,

Treat the variable as a character variable initially, then use character functions to identify the dates that contain alphabetic characters, then create a flag variable to get the top 10 or bottom 10 records for each alphabetic value.
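A minimal sketch of the flagging step, assuming the data is already in a SAS dataset named HAVE with a character variable DATE (both names are assumptions here). ANYALPHA returns the position of the first alphabetic character, or 0 if there is none. The flag variable BAD_DATE is a hypothetical name.

```sas
data flagged;
   set have;
   /* 1 if DATE contains at least one letter, 0 otherwise */
   bad_date = (anyalpha(date) > 0);
run;

/* Quick count of clean versus offending records */
proc freq data=flagged;
   tables bad_date;
run;
```

This only marks the offending records. Selecting the records around them still needs a second step that uses the flag.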


mkeintz
PROC Star

You need two synchronized readings of the data, one of them 10 records behind the other (the "if _n_>10" below). If the "lead" reading finds a non-digit character (such as a letter) in DATE, then set a counter to 10. The second reading tests the counter and, if it is greater than or equal to 0, outputs the record. It also decrements the counter. The result should be every offending record plus the 10 records preceding each of them.

Is the data in a raw data file?  Let's say it's in file 'c:\temp\t.txt', and date is the first 8 characters of each record, followed by other variables of interest:

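filename in1 'c:\temp\t.txt';
filename in2 'c:\temp\t.txt';

data want (drop=prx);
   /* Countdown of records still to output. Starts at -1 so that */
   /* nothing is output until an offending record has been seen. */
   retain prx -1;

   /* Lead reading: runs 10 records ahead and checks DATE */
   infile in1 end=end_in1;
   if end_in1=0 then do;
     input date $8.;
     if notdigit(date) then prx=min(_n_-1,10);
   end;

   /* Trailing reading: starts once the lead is 10 records ahead */
   if _n_>10 then do;
     infile in2;
     input date $8.  ... other variables .... ;
     if prx>=0 then output;
     prx=prx-1;
   end;
run;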

If the data is a SAS dataset (say HAVE), with DATE as a character variable, the logic is similar:

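data want (drop=prx);
  /* Starts at -1 so that nothing is output before an offender is seen */
  retain prx -1;

  /* Lead reading: 10 records ahead, DATE only */
  if end_of_have=0 then do;
    set have (keep=date) end=end_of_have;
    /* If DATE is longer than 8 characters, trailing blanks count as */
    /* non-digits; notdigit(trim(date)) avoids flagging them.        */
    if notdigit(date) then prx=min(_n_-1,10);
  end;

  /* Trailing reading: all variables, 10 records behind the lead */
  if _n_>10 then do;
    set have;
    if prx>=0 then output;
    prx=prx-1;
  end;
run;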
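
If only the first offending record matters (the other case raised earlier in the thread), a different technique may be simpler: locate the first bad row, then fetch it and up to 10 rows before it by direct access with POINT=. This is a sketch under assumptions, not part of the solution above: the output name WANT_FIRST and the helper variables N, FOUND, EOF and P are hypothetical, HAVE is assumed to be non-empty and to contain no variables with those names, and HAVE must be a dataset that supports POINT= access (not a view, for example).

```sas
data want_first;
   /* Pass 1: scan DATE until the first value containing a letter. */
   /* When UNTIL ends the loop, N is the row number just read.     */
   do n = 1 by 1 until (found or eof);
      set have (keep=date) end=eof;
      found = (anyalpha(date) > 0);
   end;

   /* Pass 2: read the offending row and up to 10 rows before it */
   if found then do p = max(1, n - 10) to n;
      set have point=p;
      output;
   end;

   stop;   /* required: POINT= never triggers end-of-file by itself */
   drop n found;
run;
```

If no DATE value contains a letter, WANT_FIRST is created with zero observations.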

SASPhile
Quartz | Level 8

Thanks!
