SASPhile
Quartz | Level 8

Hi,

I have a dataset with 1 million records. It has a date field whose values are in the format 20130917, but some values in the date field seem to contain letters (names). How can I get the 10 records above the point where letters appear in the date field?

ballardw
Super User

Are you requesting the 10 records before each "date" containing non-digit characters, or only the 10 records before the first place this occurs?

If the names should normally come after the dates in the file, a likely cause is a missing or incorrect INFILE option such as FLOWOVER or TRUNCOVER. The "date" then gets treated as part of the previous line.

If the name occurs before the date, there may be tabs in the name or in another field, so the column alignment is off.
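A minimal sketch of the FLOWOVER/TRUNCOVER difference, assuming a hypothetical layout of an 8-character date, a blank, then a name. The step writes its own three-line test file to a temporary fileref (RAW, a name chosen for illustration), and line 2 is deliberately missing its name.

```sas
/* Hypothetical test file: line 2 has a date but no name */
filename raw temp;

data _null_;
  file raw;
  put '20130917 Smith';
  put '20130918';
  put '20130919 Jones';
run;

/* Default FLOWOVER: INPUT runs past the end of short line 2
   and takes the next line's first token as NAME */
data flow;
  infile raw;
  input date $8. name $;
run;

/* TRUNCOVER: the short line just gets a blank NAME */
data trunc;
  infile raw truncover;
  input date $8. name $;
run;
```

With the default FLOWOVER, 20130919 from line 3 ends up in NAME of the second observation, so FLOW has only 2 observations and the log notes that SAS went to a new line. TRUNCOVER keeps all 3 lines as separate observations.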

Amarnath7
Fluorite | Level 6

Hi,

Read the variable as a character variable first, then use character functions to identify dates that contain letters, and create a flag variable to get the 10 records above (or below) each value with letters.
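A minimal sketch of this flag-based approach, assuming DATE is already character in a dataset HAVE. The sample data and the variables HAS_ALPHA, ROW, HIT and P are hypothetical. ANYALPHA does the letter check, and POINT= reads each flagged row plus up to 10 rows above it.

```sas
/* Hypothetical test data: 15 records, record 13 has a name in DATE */
data have;
  length date $ 8;
  do id = 1 to 15;
    date = put(20130900 + id, 8.);
    if id = 13 then date = 'SMITH';
    output;
  end;
run;

/* Step 1: keep the row number and flag DATE values containing a letter */
data flagged;
  set have;
  row = _n_;
  has_alpha = (anyalpha(date) > 0);
run;

/* Step 2: for each flagged row, read it and up to 10 rows above it
   by direct access (POINT=); the first SET ends the step at EOF */
data want;
  set flagged (where=(has_alpha=1) keep=row has_alpha rename=(row=hit));
  do p = max(1, hit - 10) to hit;
    set flagged point=p;
    output;
  end;
  drop hit;
run;

/* Windows of nearby flagged rows overlap; keep one copy of each row */
proc sort data=want nodupkey;
  by row;
run;
```

For the sample data, WANT holds rows 3 through 13.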


mkeintz
PROC Star

You need two synchronized readings of the data, one of them 10 records behind the other (the "if _n_>10" below). If the "lead" reading finds a non-digit character (such as a letter) in DATE, it sets a counter to 10, or to the number of preceding records if fewer than 10 have been read. The trailing reading outputs its record whenever the counter is greater than or equal to 0, then decrements the counter. The result should be every offending record plus the 10 records preceding each of them.

Is the data in a raw data file? Let's say it's in the file 'c:\temp\t.txt', and DATE is the first 8 characters of each record, followed by other variables of interest:

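filename in1 'c:\temp\t.txt';
filename in2 'c:\temp\t.txt';

data want (drop=prx);
   retain prx -1;   /* counter; -1 means no offending record is pending */

   /* Lead reading: stops once the last record has been read */
   infile in1 end=end_in1;

   if end_in1=0 then do;
     input date $8.;
     /* TRIM so trailing blanks are not counted as non-digits */
     if notdigit(trim(date)) then prx=min(_n_-1,10);
   end;

   /* Trailing reading: starts at iteration 11, so it stays 10 records behind */
   if _n_>10 then do;
     infile in2;
     input date $8.  /* ... other variables .... */ ;
     if prx>=0 then output;
     prx=prx-1;
   end;

run;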

If the data is a SAS dataset (say HAVE), with DATE as a character variable, the logic is similar:

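data want (drop=prx);
  retain prx -1;   /* counter; -1 means no offending record is pending */

  /* Lead reading: only DATE is needed */
  if end_of_have=0 then do;
    set have (keep=date) end=end_of_have;
    /* TRIM so trailing blanks in a longer DATE are not flagged */
    if notdigit(trim(date)) then prx=min(_n_-1,10);
  end;

  /* Trailing reading: 10 records behind, reads all variables */
  if _n_>10 then do;
    set have;
    if prx>=0 then output;
    prx=prx-1;
  end;

run;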
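To try the SAS-dataset version on a small sample, here is a self-contained sketch. The HAVE data (15 records, with a hypothetical name SMITH in record 13) is made up for illustration. PRX starts at -1 here so that nothing is output before an offending record has been seen; the expected result is records 3 through 13.

```sas
/* Hypothetical test data: 15 records, record 13 has a name in DATE */
data have;
  length date $ 8;
  do id = 1 to 15;
    date = put(20130900 + id, 8.);
    if id = 13 then date = 'SMITH';
    output;
  end;
run;

/* Two SET statements on HAVE: the lead one runs 10 records ahead */
data want (drop=prx);
  retain prx -1;

  if end_of_have=0 then do;
    set have (keep=date) end=end_of_have;
    if notdigit(trim(date)) then prx=min(_n_-1,10);
  end;

  /* The step stops when this SET reads past the last record */
  if _n_>10 then do;
    set have;
    if prx>=0 then output;
    prx=prx-1;
  end;
run;
```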

SASPhile
Quartz | Level 8

Thanks!
