Hello everyone,
I have a dataset like the following, the 1st letter in first column can be 'H'(header) or Blank. When it is header record would like to get the columns from 8 to 13(6 chars-896506) say router# and the datetime from 44 to 53(10 chars-1704171513) say dttm, when it is blank like in the second line would like get columns from 2nd to 11(9 chars) say acct#. Router# will be same for all the acct#s until the another header is found.
INPUT:
H1704158965068107143934AAAAA MAAAAA300A1704171513
073340146000000502484817040100010032171+00000032446+00000015623+ 0000000
073676756000000502479917040100019167574+00000063989+00000029219+ 0000000
074974084000000502479717040100012097496+00000036608+00000021058+ 0000000
H1704158965068107143934AAAAA MAAAAA300A1704171717
073340146000000502484817040100010032171+00000032446+00000015623+ 0000000
074985264000000502487917040100021946562+00000070979+00000034166+ 0000000
H1704158965068107143934AAAAA MAAAAA300A1704171809
079814999000000502485617040100012716909+00000042454+00000019288+ 0000000
073340146000000502484817040100010032171+00000032446+00000015623+ 0000000
output:
router1#acct1#dttm1
router1#acct2#dttm1
router1#acct3#dttm1
router1#acct1#dttm2
router1#acct1#dttm3
Then would like to eliminate the duplicate records based on the timestamp which ever is the latest.
output 2:
router1#acct1#dttm3(assuming dttm3 is the latest one)
router1#acct2#dttm3
router1#acct3#dttm3
Afte the duplicates are eliminated, would like to have the original input file without the duplicates for further batch process.
I really appreciate your help.
Thank you
Neal.
When posting text data you should post it in a code box opened using the forum {i} icon. The main message window may format text spaces/ tabs. For instance when I look at colum 44 to 53 on your first line I see "00A170417" when pasting into my editor.
Adjust the column numbers in the code to match your input file. Use an INFILE statement to point to your data instead of the datalines used below.
data read; informat router $6. date yymmdd6. hour minute f2. acct $9.; input @; if _infile_=:'H' then do; input router 8-13 date 47-52 hour 53-54 minute 55-56; dttm = dhms(date,hour,minute,0); end; else input acct 2-11; format dttm datetime16.; retain router dttm; if not missing(acct); keep router acct dttm; datalines; H1704158965068107143934AAAAA MAAAAA300A1704171513 073340146000000502484817040100010032171+00000032446+00000015623+ 0000000 073676756000000502479917040100019167574+00000063989+00000029219+ 0000000 074974084000000502479717040100012097496+00000036608+00000021058+ 0000000 H1704158965068107143934AAAAA MAAAAA300A1704171717 073340146000000502484817040100010032171+00000032446+00000015623+ 0000000 074985264000000502487917040100021946562+00000070979+00000034166+ 0000000 H1704158965068107143934AAAAA MAAAAA300A1704171809 079814999000000502485617040100012716909+00000042454+00000019288+ 0000000 073340146000000502484817040100010032171+00000032446+00000015623+ 0000000 ; run; proc sort data=read; by router acct dttm; run; data want; set read; by router acct dttm; if last.acct; run;
You will have to explain further what you may mean by "the original input file without the duplicates" as that sounds like you may want data that was present but not read.
When posting text data you should post it in a code box opened using the forum {i} icon. The main message window may format text spaces/ tabs. For instance when I look at colum 44 to 53 on your first line I see "00A170417" when pasting into my editor.
Adjust the column numbers in the code to match your input file. Use an INFILE statement to point to your data instead of the datalines used below.
data read; informat router $6. date yymmdd6. hour minute f2. acct $9.; input @; if _infile_=:'H' then do; input router 8-13 date 47-52 hour 53-54 minute 55-56; dttm = dhms(date,hour,minute,0); end; else input acct 2-11; format dttm datetime16.; retain router dttm; if not missing(acct); keep router acct dttm; datalines; H1704158965068107143934AAAAA MAAAAA300A1704171513 073340146000000502484817040100010032171+00000032446+00000015623+ 0000000 073676756000000502479917040100019167574+00000063989+00000029219+ 0000000 074974084000000502479717040100012097496+00000036608+00000021058+ 0000000 H1704158965068107143934AAAAA MAAAAA300A1704171717 073340146000000502484817040100010032171+00000032446+00000015623+ 0000000 074985264000000502487917040100021946562+00000070979+00000034166+ 0000000 H1704158965068107143934AAAAA MAAAAA300A1704171809 079814999000000502485617040100012716909+00000042454+00000019288+ 0000000 073340146000000502484817040100010032171+00000032446+00000015623+ 0000000 ; run; proc sort data=read; by router acct dttm; run; data want; set read; by router acct dttm; if last.acct; run;
You will have to explain further what you may mean by "the original input file without the duplicates" as that sounds like you may want data that was present but not read.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.