DATA Step, Macro, Functions and more

How to read in a text file with special structure?

Reply
Occasional Contributor
Posts: 14

How to read in a text file with special structure?

Hey,

I am trying to read in a text file with some special structure as attached, where 1st column is VALUE, 2nd column is ID and 3rd column and beyond are features (up to 360). Can some one give me some suggestion on how to handle this?

Thanks.

Attachment
Frequent Contributor
Posts: 117

How to read in a text file with special structure?

Posted in reply to flyingsohigh

Hi,

I need some clarification that after 3rd column , are the features you mentioned are like columns or values for these columns..becoz i think in between again ID' s are coming up..

Regards,

vishnu

Super User
Posts: 5,516

How to read in a text file with special structure?

Posted in reply to flyingsohigh

flyingsohigh,

Besides fixing the data, you need to answer another question first.  What are you trying to create?  There are two basic structures to the final result.  Neither is right or wrong, and (if ID / VALUE combinations are unique) either form could be transformed into the other.  But which is more useful for you?

You could try to create 362 variables:  ID, VALUE, and FEATURE_001 - FEATURE_360, where some of the FEATURE variables are missing.

Or you could try to create 4 variables:  ID, VALUE, FEATURE_NUM (1 to 360), FEATURE, where there are up to 360 observations per incoming data line.

What's your choice?

Good luck.

Occasional Contributor
Posts: 14

Re: How to read in a text file with special structure?

Posted in reply to Astounding

I try to create 362 variables as specified in the input file. Thanks.

Super User
Super User
Posts: 7,076

Re: How to read in a text file with special structure?

Posted in reply to flyingsohigh

I think a very similar file structure was discussed on SAS-L in the last year or so.

Here is one method to make a vertical structure.  It throws away the 'qid' tag on the ID variable as it did not look like it adds value based on your sample data.  This works by setting the delimiter to both space and colon so that simple list mode input statements can be used.  It reads in the value and qid and then continues reading varnum/result pairs.  I added an ORDER variable so that you could recreated the original order on the line (if that makes any difference for you). 

data want ;

  infile cards dlm=' :' truncover ;

  input value dummy $ qid @;

  drop dummy ;

  do order=1 by 1 until (varnum=.) ;

    input varnum result @ ;

    if order=1 or varnum ne . then output;

  end;

cards;

2 qid:1 1:3 2:3 3:0 4:0 5:3 6:1 7:1 8:0 9:0 10:1 11:156 12:4 13:0 14:7 15:167 16:6.931275 17:22.076928 136:0

2 qid:1 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:406 12:0 13:5 14:5 15:416 16:6.931275 17:22.076928 18:19.673353 19:22.255383

0 qid:1 1:3 2:0 3:2 4:0 5:3 6:1 7:0 8:0.666667 9:0 10:1 11:146 12:0 13:3 14:7 15:156 16:6.931275 17:22.076928 18:19.673353 134:0 135:0 136:0

2 qid:2 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:287 12:1 13:4 14:7 15:299 16:6.931275 17:22.076928 18:19.673353 19:22.255383

1 qid:3 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:2009 12:2 13:4 14:7 15:2022 16:6.931275 17:22.076928 18:19.673353 19:22.255383 114:-25.436074

1 qid:1 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:935 12:3 13:4 14:7 15:949 16:6.931275 17:22.076928 18:19.673353 19:22.255383 20:6.926551 21:22

run;

Super User
Super User
Posts: 7,076

Re: How to read in a text file with special structure?

Posted in reply to flyingsohigh

To create the horizontal structure (with many missing values!).

data want ;

  infile cards dlm=' :' truncover ;

  input value dummy $ qid @;

  array v v1-v360;

  drop dummy varnum result ;

  do until (varnum=.) ;

    input varnum result @ ;

    if 1 <= varnum <=360 then v(varnum)=result;

    else if varnum ne . then put 'Invalid variable number: ' varnum= result=;

  end;

  output;

  call missing (of v1-v360);

cards;

2 qid:1 1:3 2:3 3:0 4:0 5:3 6:1 7:1 8:0 9:0 10:1 11:156 12:4 13:0 14:7 15:167 16:6.931275 17:22.076928 136:0

2 qid:1 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:406 12:0 13:5 14:5 15:416 16:6.931275 17:22.076928 18:19.673353 19:22.255383

0 qid:1 1:3 2:0 3:2 4:0 5:3 6:1 7:0 8:0.666667 9:0 10:1 11:146 12:0 13:3 14:7 15:156 16:6.931275 17:22.076928 18:19.673353 134:0 135:0 136:0

2 qid:2 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:287 12:1 13:4 14:7 15:299 16:6.931275 17:22.076928 18:19.673353 19:22.255383

1 qid:3 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:2009 12:2 13:4 14:7 15:2022 16:6.931275 17:22.076928 18:19.673353 19:22.255383 114:-25.436074

1 qid:1 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:935 12:3 13:4 14:7 15:949 16:6.931275 17:22.076928 18:19.673353 19:22.255383 20:6.926551 21:22

run;

Super User
Posts: 5,516

Re: How to read in a text file with special structure?

Tom's latest program looks perfect.  A couple of bits and pieces ...

Just to split hairs, you can eliminate the CALL MISSING statement, since all variables will start off missing for each observation.  Once you do that,  you can eliminate the OUTPUT statement as well.

And it wouldn't hurt to review your expectations about being able to uniquely identify observations.  Should the combination of VALUE and RESULT uniquely identify an observation, or are duplicates permitted?

Anyway, you're good to go.

Ask a Question
Discussion stats
  • 6 replies
  • 212 views
  • 0 likes
  • 4 in conversation