BookmarkSubscribeRSS Feed
flyingsohigh
Calcite | Level 5

Hey,

I am trying to read in a text file with some special structure as attached, where 1st column is VALUE, 2nd column is ID and 3rd column and beyond are features (up to 360). Can some one give me some suggestion on how to handle this?

Thanks.

6 REPLIES 6
Vish33
Lapis Lazuli | Level 10

Hi,

I need some clarification that after 3rd column , are the features you mentioned are like columns or values for these columns..becoz i think in between again ID' s are coming up..

Regards,

vishnu

Astounding
PROC Star

flyingsohigh,

Besides fixing the data, you need to answer another question first.  What are you trying to create?  There are two basic structures to the final result.  Neither is right or wrong, and (if ID / VALUE combinations are unique) either form could be transformed into the other.  But which is more useful for you?

You could try to create 362 variables:  ID, VALUE, and FEATURE_001 - FEATURE_360, where some of the FEATURE variables are missing.

Or you could try to create 4 variables:  ID, VALUE, FEATURE_NUM (1 to 360), FEATURE, where there are up to 360 observations per incoming data line.

What's your choice?

Good luck.

flyingsohigh
Calcite | Level 5

I try to create 362 variables as specified in the input file. Thanks.

Tom
Super User Tom
Super User

I think a very similar file structure was discussed on SAS-L in the last year or so.

Here is one method to make a vertical structure.  It throws away the 'qid' tag on the ID variable as it did not look like it adds value based on your sample data.  This works by setting the delimiter to both space and colon so that simple list mode input statements can be used.  It reads in the value and qid and then continues reading varnum/result pairs.  I added an ORDER variable so that you could recreated the original order on the line (if that makes any difference for you). 

data want ;

  infile cards dlm=' :' truncover ;

  input value dummy $ qid @;

  drop dummy ;

  do order=1 by 1 until (varnum=.) ;

    input varnum result @ ;

    if order=1 or varnum ne . then output;

  end;

cards;

2 qid:1 1:3 2:3 3:0 4:0 5:3 6:1 7:1 8:0 9:0 10:1 11:156 12:4 13:0 14:7 15:167 16:6.931275 17:22.076928 136:0

2 qid:1 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:406 12:0 13:5 14:5 15:416 16:6.931275 17:22.076928 18:19.673353 19:22.255383

0 qid:1 1:3 2:0 3:2 4:0 5:3 6:1 7:0 8:0.666667 9:0 10:1 11:146 12:0 13:3 14:7 15:156 16:6.931275 17:22.076928 18:19.673353 134:0 135:0 136:0

2 qid:2 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:287 12:1 13:4 14:7 15:299 16:6.931275 17:22.076928 18:19.673353 19:22.255383

1 qid:3 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:2009 12:2 13:4 14:7 15:2022 16:6.931275 17:22.076928 18:19.673353 19:22.255383 114:-25.436074

1 qid:1 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:935 12:3 13:4 14:7 15:949 16:6.931275 17:22.076928 18:19.673353 19:22.255383 20:6.926551 21:22

run;

Tom
Super User Tom
Super User

To create the horizontal structure (with many missing values!).

data want ;

  infile cards dlm=' :' truncover ;

  input value dummy $ qid @;

  array v v1-v360;

  drop dummy varnum result ;

  do until (varnum=.) ;

    input varnum result @ ;

    if 1 <= varnum <=360 then v(varnum)=result;

    else if varnum ne . then put 'Invalid variable number: ' varnum= result=;

  end;

  output;

  call missing (of v1-v360);

cards;

2 qid:1 1:3 2:3 3:0 4:0 5:3 6:1 7:1 8:0 9:0 10:1 11:156 12:4 13:0 14:7 15:167 16:6.931275 17:22.076928 136:0

2 qid:1 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:406 12:0 13:5 14:5 15:416 16:6.931275 17:22.076928 18:19.673353 19:22.255383

0 qid:1 1:3 2:0 3:2 4:0 5:3 6:1 7:0 8:0.666667 9:0 10:1 11:146 12:0 13:3 14:7 15:156 16:6.931275 17:22.076928 18:19.673353 134:0 135:0 136:0

2 qid:2 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:287 12:1 13:4 14:7 15:299 16:6.931275 17:22.076928 18:19.673353 19:22.255383

1 qid:3 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:2009 12:2 13:4 14:7 15:2022 16:6.931275 17:22.076928 18:19.673353 19:22.255383 114:-25.436074

1 qid:1 1:3 2:0 3:3 4:0 5:3 6:1 7:0 8:1 9:0 10:1 11:935 12:3 13:4 14:7 15:949 16:6.931275 17:22.076928 18:19.673353 19:22.255383 20:6.926551 21:22

run;

Astounding
PROC Star

Tom's latest program looks perfect.  A couple of bits and pieces ...

Just to split hairs, you can eliminate the CALL MISSING statement, since all variables will start off missing for each observation.  Once you do that,  you can eliminate the OUTPUT statement as well.

And it wouldn't hurt to review your expectations about being able to uniquely identify observations.  Should the combination of VALUE and RESULT uniquely identify an observation, or are duplicates permitted?

Anyway, you're good to go.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 951 views
  • 0 likes
  • 4 in conversation