How to read raw data line by line and store them into a single variable?

N/A
Posts: 0

If I have raw data like the following:

'aaaa' = 'aaaaabbbbbccccddd'
'bbbb' = 'aaabbbcccdddd'

I have over 2000 such lines of data. I need to store each line in a variable so that I can remove the duplicates with a proc sql distinct statement. Or is there a better way to remove duplicates?

thanks
Super Contributor
Posts: 474

Re: How to read raw data line by line and store them into a single variable?

Posted in reply to deleted_user
Two questions here.

First, to read raw data, just use the standard file-reading features of the DATA step.

See the online documentation:

INFILE statement: http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/default/a000146932.htm

INPUT statement: http://support.sas.com/documentation/cdl/en/imlug/59656/HTML/default/langref_sect141.htm

And yes, from my point of view, SELECT DISTINCT or PROC SORT NODUPKEY will be the best way to remove duplicates. For both, you'll have to sort out question 1 first, that is, get the lines into a dataset.
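To make the two options concrete, here is a minimal sketch, not a definitive solution. The file name 'rawdata.txt', the dataset names and the 150-char width are assumptions for illustration only.

```sas
/* Sketch: read each raw line whole, then drop duplicate lines.      */
/* 'rawdata.txt', LINES and the 150-char width are assumptions.      */
data lines;
  infile 'rawdata.txt' truncover;  /* TRUNCOVER: a short line is not completed from the next record */
  input line $char150.;            /* formatted input keeps embedded blanks */
run;

/* Option A: PROC SORT with NODUPKEY keeps one row per distinct LINE */
proc sort data=lines out=lines_nodup nodupkey;
  by line;
run;

/* Option B: PROC SQL with SELECT DISTINCT */
proc sql;
  create table lines_distinct as
  select distinct line
  from lines;
quit;
```

PROC SORT does the sorting itself, so the input dataset does not need to be ordered beforehand.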

Cheers from Portugal.

Daniel Santos @ www.cgd.pt.
N/A
Posts: 0

Re: How to read raw data line by line and store them into a single variable?

Posted in reply to DanielSantos
Hi,

Thanks for the input. The problem I am having right now is storing the entire line of data in one single variable. With the raw data format above, I can only store whatever comes before the first space.

thanks
Super Contributor
Posts: 474

Re: How to read raw data line by line and store them into a single variable?

Posted in reply to deleted_user
Have you tried using the infile buffer variable?

Something like this:

data _null_;
infile myfile.

input; * read one line;
put _infile_; * display the _infile_ buffer;

run;

Cheers from Portugal.

Daniel Santos @ www.cgd.pt.
N/A
Posts: 0

Re: How to read raw data line by line and store them into a single variable?

Posted in reply to DanielSantos
Hi,

Thanks again for the input. I read through some documentation about the infile buffer variable, but I am still not quite sure how it works. In your example especially, is that supposed to be a period after the word myfile? Does put _infile_ display the _infile_ buffer and store it in the variable myfile?

Also, I defined the length of the variable like the following:

input line $ 150.;

This actually reads each line and stores the entire line in the variable line. This method works too, right?

thanks
Super Contributor
Posts: 474

Re: How to read raw data line by line and store them into a single variable?

Posted in reply to deleted_user
Hi cosmid.

You are right, there is not supposed to be a period after myfile; it should be a semicolon. :-)
I am sorry, I mistyped that.

When dealing with complex text parsing, I always find it better to access the automatic _INFILE_ buffer variable. Being a buffer, there's no need to pre-allocate its maximum size (as you should do with a variable), and it will hold exactly the record that was retrieved from the file.
What are the benefits? Say I just want to parse the line and retrieve some 4-char code placed somewhere in the middle. Using _INFILE_, there's no need to pre-allocate a "large-enough" variable to hold the line; I just have to process the _INFILE_ automatic variable and extract what I need from it.
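As a rough illustration of that idea, the sketch below pulls only the text before the equals sign out of each record, using lines shaped like the sample data in the first post. The file name 'rawdata.txt', the dataset name KEYS, the variable name KEY and its 20-char length are all assumptions.

```sas
/* Sketch: extract one piece of each record straight from _INFILE_.  */
/* 'rawdata.txt', KEYS, KEY and the $20 length are assumptions.       */
data keys;
  infile 'rawdata.txt';
  length key $20;     /* only the extracted piece needs a declared length */
  input;              /* load the record into _INFILE_; no variables are read */
  /* take the text before the first '=', trim blanks, strip the quotes */
  key = dequote(strip(scan(_infile_, 1, '=')));
run;
```

Only KEY ends up in the output dataset; the full line never has to be stored in a variable.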

Check Howard Schreier's paper on the _INFILE_ variable:
http://www.nesug.org/Proceedings/nesug01/cc/cc4018bw.pdf

Now, unless each line has exactly 150 chars OR you specified the TRUNCOVER option in the INFILE statement (without it, a shorter line makes the INPUT statement continue onto the next record), I wouldn't use:

input line $ 150.;

Instead, try this:

length LINE $150;
input;
LINE = _infile_;

Here, the first 150 chars of each line are copied to the LINE variable, embedded blanks included.
Of course, if no line has more than 150 chars, no truncation will occur and every line will be stored entirely.
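If you are not sure whether 150 chars is enough, one way to check is the LENGTH= option of the INFILE statement, which puts the length of the current record into a variable. The sketch below copies the _INFILE_ buffer into a 150-char variable and writes a note to the log for any longer record. The file name 'rawdata.txt' and the variable name RECLEN are assumptions.

```sas
/* Sketch: store each line in LINE and flag records longer than 150. */
/* 'rawdata.txt' and RECLEN are assumptions.                          */
data lines;
  infile 'rawdata.txt' length=reclen;  /* RECLEN is set when INPUT executes */
  length line $150;
  input;                               /* read the record into _INFILE_ */
  line = _infile_;                     /* copy at most 150 chars */
  if reclen > 150 then
    put 'WARNING: record ' _n_ 'has ' reclen 'characters and was truncated.';
run;
```

The variable named in LENGTH= is not written to the output dataset, so LINES contains only LINE.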

Hope this helps.

Cheers from Portugal.

Daniel Santos @ www.cgd.pt