BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hello everyone,

 

I have a dataset like the following, the 1st letter in first column can be 'H'(header) or Blank. When it is header record would like to get the columns from 8 to 13(6 chars-896506) say router# and the datetime from 44 to 53(10 chars-1704171513) say dttm,  when it is blank like in the second line  would like get columns from 2nd to 11(9 chars) say acct#. Router# will be same for all the acct#s until the another header is found. 

 

INPUT:

 

H1704158965068107143934AAAAA        MAAAAA300A1704171513
   073340146000000502484817040100010032171+00000032446+00000015623+ 0000000
   073676756000000502479917040100019167574+00000063989+00000029219+ 0000000
   074974084000000502479717040100012097496+00000036608+00000021058+ 0000000

H1704158965068107143934AAAAA        MAAAAA300A1704171717
   073340146000000502484817040100010032171+00000032446+00000015623+ 0000000
   074985264000000502487917040100021946562+00000070979+00000034166+ 0000000

H1704158965068107143934AAAAA        MAAAAA300A1704171809
   079814999000000502485617040100012716909+00000042454+00000019288+ 0000000  

   073340146000000502484817040100010032171+00000032446+00000015623+ 0000000
  

 

output:

router1#acct1#dttm1

router1#acct2#dttm1

router1#acct3#dttm1

router1#acct1#dttm2

router1#acct1#dttm3

 

Then would like to eliminate the duplicate records based on the timestamp which ever is the latest.

 

output 2:

router1#acct1#dttm3(assuming dttm3 is the latest one)

router1#acct2#dttm3

router1#acct3#dttm3

 

Afte the duplicates are eliminated, would like to have the original input file without the duplicates for further batch process.

I really appreciate your help.

 

Thank you

Neal. 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

When posting text data you should post it in a code box opened using the forum {i} icon. The main message window may format text spaces/ tabs. For instance when I look at colum 44 to 53 on your first line I see "00A170417" when pasting into my editor.

 

Adjust the column numbers in the code to match your input file. Use an INFILE statement to point to your data instead of the datalines used below.

 

data read;
   informat router $6. date yymmdd6. hour minute f2. acct $9.;
   input @;
   if _infile_=:'H' then do;
      input router 8-13 date 47-52 hour 53-54 minute 55-56;
      dttm = dhms(date,hour,minute,0);
   end;
   else input acct 2-11;
   format dttm datetime16.;
   retain router dttm;
   if not missing(acct);
   keep router acct dttm;
datalines;
H1704158965068107143934AAAAA        MAAAAA300A1704171513
   073340146000000502484817040100010032171+00000032446+00000015623+ 0000000
   073676756000000502479917040100019167574+00000063989+00000029219+ 0000000
   074974084000000502479717040100012097496+00000036608+00000021058+ 0000000
H1704158965068107143934AAAAA        MAAAAA300A1704171717
   073340146000000502484817040100010032171+00000032446+00000015623+ 0000000
   074985264000000502487917040100021946562+00000070979+00000034166+ 0000000
H1704158965068107143934AAAAA        MAAAAA300A1704171809
   079814999000000502485617040100012716909+00000042454+00000019288+ 0000000  
   073340146000000502484817040100010032171+00000032446+00000015623+ 0000000
;
run;

proc sort data=read;
  by router acct dttm;
run;

data want;
  set read;
  by router acct dttm;
  if last.acct;
run;


You will have to explain further what you may mean by "the original input file without the duplicates" as that sounds like you may want data that was present but not read.

 

View solution in original post

1 REPLY 1
ballardw
Super User

When posting text data you should post it in a code box opened using the forum {i} icon. The main message window may format text spaces/ tabs. For instance when I look at colum 44 to 53 on your first line I see "00A170417" when pasting into my editor.

 

Adjust the column numbers in the code to match your input file. Use an INFILE statement to point to your data instead of the datalines used below.

 

data read;
   informat router $6. date yymmdd6. hour minute f2. acct $9.;
   input @;
   if _infile_=:'H' then do;
      input router 8-13 date 47-52 hour 53-54 minute 55-56;
      dttm = dhms(date,hour,minute,0);
   end;
   else input acct 2-11;
   format dttm datetime16.;
   retain router dttm;
   if not missing(acct);
   keep router acct dttm;
datalines;
H1704158965068107143934AAAAA        MAAAAA300A1704171513
   073340146000000502484817040100010032171+00000032446+00000015623+ 0000000
   073676756000000502479917040100019167574+00000063989+00000029219+ 0000000
   074974084000000502479717040100012097496+00000036608+00000021058+ 0000000
H1704158965068107143934AAAAA        MAAAAA300A1704171717
   073340146000000502484817040100010032171+00000032446+00000015623+ 0000000
   074985264000000502487917040100021946562+00000070979+00000034166+ 0000000
H1704158965068107143934AAAAA        MAAAAA300A1704171809
   079814999000000502485617040100012716909+00000042454+00000019288+ 0000000  
   073340146000000502484817040100010032171+00000032446+00000015623+ 0000000
;
run;

proc sort data=read;
  by router acct dttm;
run;

data want;
  set read;
  by router acct dttm;
  if last.acct;
run;


You will have to explain further what you may mean by "the original input file without the duplicates" as that sounds like you may want data that was present but not read.

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 956 views
  • 0 likes
  • 2 in conversation