DATA Step, Macro, Functions and more

Code To Import Non-delimited file when first variable contains values that vary in length

Reply
New Contributor
Posts: 2

Code To Import Non-delimited file when first variable contains values that vary in length

HELP!

 

Attached is a an example of a file that I need inport into SAS. Unfortunately, it has no delimiters and even better, the first data (Revenue Amount) varies in length before the first character of the next data (Priority Line) starts. Is there code I can use in the INPUT statement that can determine when the first character starts in each record?

 

The data in the attached example needs to be placed in columns as follows:

 

Revenue_Amount (varies in each record)

Priority_Line (first character in string and Length is 3)

Revenue_Year (used as text and length is 4)

Revenue_Month (used as text and length is 2)

Legacy_Customer_Numer (text with length of 9)

SAP_Customer_Number (text with length of 10)

 

 

Thanks in advance for your help with this!

 

Rudy

Attachment
Super User
Posts: 6,971

Re: Code To Import Non-delimited file when first variable contains values that vary in length

Since each line is encapsulated in those unhappy double quotes, you can't input in one statement, but have to read the line and dissect it:

data want;
infile '$HOME/sascommunity/SASHelpExample.csv' dsd firstobs=2;
input line :$100.;
length
  revenue_amount 8
  priority_line $3
  revenue_year $4
  revenue_month $2
  legacy_customer_number $9
  sap_customer_number $10
;
revenue_amount = input(scan(line,1,'09'x),12.);
priority_line = scan(line,2,'09'x);
revenue_year = scan(line,3,'09'x);
revenue_month = scan(line,4,'09'x);
legacy_customer_number = scan(line,5,'09'x);
sap_customer_number = scan(line,6,'09'x);
drop line;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
New Contributor
Posts: 2

Re: Code To Import Non-delimited file when first variable contains values that vary in length

Kurt,

 

Thanks for the quick reply! This worked like a charm!

 

Thanks!

 

Rudy

Super User
Super User
Posts: 7,430

Re: Code To Import Non-delimited file when first variable contains values that vary in length

Why do you not just reject the file?  Surely according to your data import agreement and contract with the data provider you will know what you are expecting from the file.  Me, if someone sent me a file called .csv (Comma Separated Variable file) which wasn't in that format, or in fact a proper usable format, I would reject with the reponse of poor data received.

Super User
Posts: 17,936

Re: Code To Import Non-delimited file when first variable contains values that vary in length


RW9 wrote:

Why do you not just reject the file?  Surely according to your data import agreement and contract with the data provider you will know what you are expecting from the file.  Me, if someone sent me a file called .csv (Comma Separated Variable file) which wasn't in that format, or in fact a proper usable format, I would reject with the reponse of poor data received.


This isn't always possible.

I've seen this due to either restrictions on what the survey tool exports or an application system. This is one way they get you with 'free' survey options, if you want a decent data file you need to pay to get a SPSS or SAS version.  Another is, I've worked for social services agencies and doctors. There focus is on patients/clients and data is secondary, that's why they hired me in the first place. Am I really going to ask a social worker to skip an appointment with a client or a doctor to skip time with a patient to fix data? No. I'm going to try and see if I can fix it first before 'rejecting' the file.  And if I had a staff member who did reject a file, they'd likely get reprimanded at minimum. 99% of the time data cleaning is part of the job, and if someone doesn't want to do that part they're not doing their job. 

 

Super User
Posts: 10,548

Re: Code To Import Non-delimited file when first variable contains values that vary in length

[ Edited ]

if your file is NOT delimited do not name it as CSV. Use TXT for an extension. CSV wants to open as if is comma separted and dependig on a number of different factors just plain misbehaves.

 

And except for the stupid " is actually TAB delimited.

Contributor
Posts: 27

Re: Code To Import Non-delimited file when first variable contains values that vary in length

Another option would be to open your file, save it under a different name and perform 2 search and replacements:  replace the double-quotation marks with nothing, and replace all tabs with a space.  To replace the tabs with a space, you would first have to copy a tab character from your data, and then paste it into the search and replace window as the item to search for.  Then do a "Replace All".  After that, this would work:

 

data one;
infile 'C:\MyPath\SASHelpExample.txt' firstobs=2;
input Revenue_Amount Priority_Line $ Revenue_Year Revenue_Month Legacy_Customer_Numer SAP_Customer_Number;
run;
 
Ask a Question
Discussion stats
  • 6 replies
  • 126 views
  • 3 likes
  • 6 in conversation