Hi all,
I have a raw tab delimited txt file that I was able to read into sas. The issue is that the file is pretty big, and there is some information (title, company info, and variable names) that gets repeated at every page (Check sample file attached).
If I remove this information by hand (i.e. titles, company info, and variable names, etc...) and just keep the data itself, sas reads the data in just fine.
Below is a sample of the code. Note that the data starts in one line and continues on the next line (2 lines for each observation), consequently I had to use the "/" so SAS can skip to the next line and continue reading a particular observation. Also note that each observation starts with the FED_TAX_ID number. So the 1st observation starts with "37-1633176", the 2nd observation starts with "84-0727078", and so on. The code is shown below.
So the question is: is there a way that I can read in the same raw file but without manually having to first remove the extra repetitive info (title, company info, and variable names)? Is this possible? Or does one need to just bite the bullet and manually delete whatever info that is not actual data?
Your help would be greatly appreciated. Thanks!
Alex
************************************************************************;
data new;
infile datalines;
input FED_TAX_ID $12. +3
CODE_1099 $10. +1
@27 DOLLARS_1099 comma12.
@43 VENDOR_NAME $40. +3
VENDOR_NUMBER $10. +4
VENDOR_STATUS $2. +5
PARENT_VENDOR $2. /
VENDOR_ADDRESS_1 $30. +3
VENDOR_ADDRESS_2 $30. +3
VENDOR_ADDRESS_3 $30. +3
VENDOR_CITY $20. +2
STATE $2. +1
ZIP $8.;
datalines;
37-1633176 LRT 987,425.61 BROOKWOOD HIGHLANDS INVESTORS 000505033 A N
C/O NEWMARK KNIGHT FRANK FREDERICK ROSS PO BOX 13797 DENVER CO 80201
84-0727078 LRT 29,496.00 BROWNLEIGH COURT LLC 000436840 A N
1410 GRANT STREET SUITE A201 DENVER CO 80203185
84-0417134 LRT 151,492.44 EXEMPLA ST JOSEPH HOSPITAL 000398348 A
MIDTOWN MEDICAL CENTER C/O CITYWIDE BANKS PO BOX 5265 DENVER CO 80217526
84-0474717 LRT 179,307.32 JEFFERSON CENTER FOR MENTAL 000471847 A N
HEALTH 4851 INDEPENDENCE ST SUITE 200 WHEAT RIDGE CO 80033
;
RUN;
************************************************************************;
If the first line on a page always contain the string "COMPANY INFO" and every page as 10 lines of header, you could do:
data new;
infile "&SASFORUM.\datasets\temp.txt";
input @;
if length(_infile_) > 71 then
if substr(_infile_, 59, 12) = "COMPANY INFO" then line = 0;
line + 1;
if line > 10;
input FED_TAX_ID $12. +3
CODE_1099 $10. +1
@27 DOLLARS_1099 comma12.
@43 VENDOR_NAME $40. +3
VENDOR_NUMBER $10. +4
VENDOR_STATUS $2. +5
PARENT_VENDOR $2. /
VENDOR_ADDRESS_1 $30. +3
VENDOR_ADDRESS_2 $30. +3
VENDOR_ADDRESS_3 $30. +3
VENDOR_CITY $20. +2
STATE $2. +1
ZIP $8.;
run;
PG
avbraga,
You might have to fiddle with the number of "/", but here's the idea. This code gets inserted into your DATA step after the INFILE statement, but before your existing INPUT statement.
input @;
if index(_infile_, 'COMPANY INFO') then input /////////;
It looks like you have to skip a total of 10 lines at the top of each page, so there are 9 slashes. The 10th line gets skipped automatically when the second statement ends.
Good luck.
If the first line on a page always contain the string "COMPANY INFO" and every page as 10 lines of header, you could do:
data new;
infile "&SASFORUM.\datasets\temp.txt";
input @;
if length(_infile_) > 71 then
if substr(_infile_, 59, 12) = "COMPANY INFO" then line = 0;
line + 1;
if line > 10;
input FED_TAX_ID $12. +3
CODE_1099 $10. +1
@27 DOLLARS_1099 comma12.
@43 VENDOR_NAME $40. +3
VENDOR_NUMBER $10. +4
VENDOR_STATUS $2. +5
PARENT_VENDOR $2. /
VENDOR_ADDRESS_1 $30. +3
VENDOR_ADDRESS_2 $30. +3
VENDOR_ADDRESS_3 $30. +3
VENDOR_CITY $20. +2
STATE $2. +1
ZIP $8.;
run;
PG
Thank you, Astounding and PGStats.
PG, you're on to something. The file was read, and it kinda works, I will just have to fiddle with the data boundaries, and take a look at your if statements to see what's going on there.
Thank you. I will keep you guys posted and will put here the actual working code soon.
I appreciate it! Best,
Alex
You guys are rock starts!
PG, this worked great. Many thanks!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.