BookmarkSubscribeRSS Feed
atijjani
Fluorite | Level 6

I have the following data in a tab delimited .dat file:

 

1   cord   BAD APPLES   bad apples are terrible for your health
2   logo   BAD APPLES   bad apples are terrible for
your health
3   cord   GOOD APPLES   bad apples are terrible for your health
4   brat    BAD APPLES   bad apples are terrible for
your health
5   brat    BAD APPLES   bad apples are terrible for your health
6   cord   BAD MANGOES   bad apples are terrible for your health
7   logo   BAD APPLES   bad apples are terrible for your health
8   cord   BAD APPLES   bad apples are terrible for your health


I want to read this data into SAS with the following variable names:
study_id type fruit_type description

Note that in some cases, an observation extends beyond on line (one record). The main problem i want to solve is how to tell SAS that when it encounters a point where the contents of the variable, "description" extends beyond one line, it should read the next line as part of the description variable.

I will highly appreciate a help on this.

Thank you.

 

9 REPLIES 9
ballardw
Super User

Are there actual line breaks such as carriage return or line feed such as from using an "Enter" key when typing? or is line just wrapping due to line length?

 

Or is this coming from Excel where people were entering data with  the Alt-Enter to make multiline entries?

 

Is there any notable pattern involved?

atijjani
Fluorite | Level 6

This is coming from a log file with no text wrapping. One notable pattern is that when ever the description field extends beyond one line, the next line starts with the string "PCP". The next line should have started with a number. So what SAS does everytime it encounter this line break is it sets the ID (which supposed to be a numerical variable to missing. 

 

Ksharp
Super User

Once you got table WANT,it is easy for you to get final result.

 (Edited)

 

data temp;
 infile 'c:\temp\x.txt' length=len lrecl=32767;
 input x $varying32767. len;
 pid=prxparse('/^\d+\s+/');
 if prxmatch(pid,x) then group+1;
run;

data want;
length want $ 400;
 do i=1 by 1 until(last.group);
  set temp;
  by group;
  want=cats(want,x);
 end;
 keep want;
run;
atijjani
Fluorite | Level 6

Thanks in the context of my data, what will the variables, cats, x,  and want be?

Ksharp
Super User

Get it from varaible WANT.

 

study_id=scan(want,1,'09'x);

fruit_type=scan(want,2,'09'x);

..............

atijjani
Fluorite | Level 6

Another issue is that i dont understand what is the relationship between the data temp, want, and my data. How will i link them?

 

Ksharp
Super User

TEMP contain the data from your txt file and make a group variable.

WANT is aligning these wrong rows and get the final result you need.
sanjay1
Obsidian | Level 7

Try this

 

data temp;
input study_id$ type$ frut_type$ description &$ 100.;
datalines;
cord BAD APPLES bad apples are terrible for your health
logo BAD APPLES bad apples are terrible for your health
cord GOOD APPLES bad apples are terrible for your health
brat BAD APPLES bad apples are terrible for your health
brat BAD APPLES bad apples are terrible for your health
cord BAD MANGOES bad apples are terrible for your health
logo BAD APPLES bad apples are terrible for your health
cord BAD APPLES bad apples are terrible for your health
;
run;

Patrick
Opal | Level 21

The following should do the job. You will have to point your infile statement to your external file and use as delimiter a tab instead of a comma - but else the code should pretty much do the job as posted.

I've used as test condition whether it's a new record of a flowover from the previous record, if there is a digit on position one of the record. You can of course also use some other test if that's not sufficient for your data (eg: is there a TAB in the input string?).

 

data sample;
  infile datalines dlm=',' truncover;

  /* read a new line of data and map against variables */
  input @1 study_id type:$10. fruit_type:$20. description:$100.;

  /* read the next line of data. Hold the input pointer on the current line */
  input @@;

  /* test if new observation or flowover from previous record           */
  /* test condition: first non blank character in row is not a digit    */
  _test= notdigit(substrn(left(_infile_),1,1));

  /* if it's a flowover then concatenate the remainder string to the description variable */
  /*   and then set input pointer to next row of source data */
  if _test then
    do;
      description=catx(' ',description,_infile_);
      input;
    end;

  datalines;
1,cord,BAD APPLES,bad apples are terrible for your health
2,logo,BAD APPLES,bad apples are terrible for
your health
3,cord,GOOD APPLES,bad apples are terrible for your health
4,brat,BAD APPLES,bad apples are terrible for
your health
5,brat,BAD APPLES,bad apples are terrible for your health
6,cord,BAD MANGOES,bad apples are terrible for your health
7,logo,BAD APPLES,bad apples are terrible for your health
8,cord,BAD APPLES,bad apples are terrible for your health
;
run;

input @@;

1. Read a new line of data into the input buffer

2. _infile_ that's an automatic variable which gives us access to the input buffer

3. @@ holds the input pointer on the current line even over the iteration of a data step

    - gets only released by another input statement. That's why this empty input statement is required if the test is true.

 

http://support.sas.com/documentation/cdl/en/lestmtsref/69738/HTML/default/viewer.htm#n0oaql83drile0n...

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 6142 views
  • 0 likes
  • 5 in conversation