
Help with Data input

Occasional Contributor
Posts: 8

Help with Data input

Can someone please help me read this style of space-delimited text file into SAS 9.2? It's PLINK output (PLINK: Whole genome data analysis toolset).

I've been trying to use PROC IMPORT without luck:

proc import datafile="C:\Users\Adam Bress\Downloads\plink.assoc.linear.txt" out=gwas dbms=dlm replace;
  datarow=2;
  getnames=yes;
run;

I've attached part of the data.

The data look like this. It's a space-delimited text file with about 4 million observations.

CHR          SNP         BP   A1       TEST    NMISS       BETA         STAT            P
   1   rs28659788     713170    G        ADD      156     -3.549      -0.8538       0.3946
   1   rs28659788     713170    G        Age      156  -0.006926      -0.1199       0.9048
   1   rs28659788     713170    G         UV      156     -1.725       -1.369       0.1729
   1   rs28659788     713170    G         c1      156      10.05        2.434      0.01611
   1    rs3094315     742429    T        ADD      160    -0.3026      -0.3673       0.7139
   1    rs3094315     742429    T        Age      160   -0.02911      -0.5245       0.6007
   1    rs3094315     742429    T         UV      160     -1.579       -1.244       0.2153
   1    rs3094315     742429    T         c1      160       10.2        2.459      0.01502
   1    rs3131972     742584    C        ADD      160    -0.5456      -0.6454       0.5196
   1    rs3131972     742584    C        Age      160   -0.02799      -0.5048       0.6144
   1    rs3131972     742584    C         UV      160     -1.514       -1.189       0.2361
   1    rs3131972     742584    C         c1      160      10.18         2.46      0.01501
   1    rs3131969     744045    C        ADD      156    -0.6604      -0.7913         0.43
   1    rs3131969     744045    C        Age      156   -0.02938      -0.5326       0.5951
   1    rs3131969     744045    C         UV      156     -1.203      -0.9396       0.3489
   1    rs3131969     744045    C         c1      156      9.547        2.328      0.02123
   1   rs12562034     758311    A        ADD      157     -1.785       -1.208       0.2289
   1   rs12562034     758311    A        Age      157   -0.02333      -0.4124       0.6807
   1   rs12562034     758311    A         UV      157     -1.495        -1.17        0.244
   1   rs12562034     758311    A         c1      157      10.14        2.438      0.01593
   1   rs12124819     766409    G        ADD      160     -1.021      -0.5403       0.5898
   1   rs12124819     766409    G        Age      160   -0.02723      -0.4906       0.6244
   1   rs12124819     766409    G         UV      160     -1.678       -1.331       0.1852

Super User
Posts: 10,500

Re: Help with Data input

If you have errors in the log, post them.

What do you mean by "without luck"? No data set at all, or contents that are wrong, missing, or unexpected?

If the problem is unexpected variable types in the output, try adding

GuessingRows=32767;


Occasional Contributor
Posts: 8

Re: Help with Data input

Thanks for the reply.

The error I get is below (it's very long).

Thanks in advance for your help.

634  proc import datafile="C:\Users\Adam Bress\Downloads\plink.assoc.linear.txt" out=gwas dbms=dlm replace;

635           datarow=2;

636          getnames=yes;

637  run;

Number of names found is greater than number of variables found.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Name   is not a valid SAS name.

Problems were detected with provided names.  See LOG.

638   /**********************************************************************

639   *   PRODUCT:   SAS

640   *   VERSION:   9.2

641   *   CREATOR:   External File Interface

642   *   DATE:      07MAY13

643   *   DESC:      Generated SAS Datastep Code

644   *   TEMPLATE SOURCE:  (None Specified.)

645   ***********************************************************************/

646      data WORK.GWAS                                    ;

647      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */

648      infile 'C:\Users\Adam Bress\Downloads\plink.assoc.linear.txt' delimiter = ' ' MISSOVER DSD lrecl=32767 firstobs=2 ;

649         informat VAR1 $1. ;

650         informat CHR $1. ;

651         informat VAR3 $1. ;

652         informat VAR4 best32. ;

653         informat VAR5 $1. ;

654         informat VAR6 $1. ;

655         informat VAR7 $10. ;

656         informat VAR8 $12. ;

657         informat VAR9 $1. ;

658         informat VAR10 $1. ;

659         informat VAR11 $1. ;

660         informat SNP best32. ;

661         informat VAR13 best32. ;

662         informat VAR14 $1. ;

663         informat VAR15 $1. ;

664         informat VAR16 $1. ;

665         informat VAR17 $1. ;

666         informat VAR18 $1. ;

667         informat VAR19 $1. ;

668         informat VAR20 $1. ;

669         informat BP $1. ;

670         informat VAR22 $1. ;

671         informat VAR23 $1. ;

672         informat A1 $3. ;

673         informat VAR25 $4. ;

674         informat VAR26 $3. ;

675         informat VAR27 $1. ;

676         informat VAR28 $1. ;

677         informat VAR29 $1. ;

678         informat VAR30 best32. ;

679         informat TEST best32. ;

680         informat VAR32 best32. ;

681         informat VAR33 best32. ;

682         informat VAR34 best32. ;

683         informat NMISS best32. ;

684         informat VAR36 best32. ;

685         informat VAR37 best32. ;

686         informat VAR38 best32. ;

687         informat VAR39 best32. ;

688         informat VAR40 best32. ;

689         informat VAR41 best32. ;

690         informat BETA best32. ;

691         informat VAR43 best32. ;

692         informat VAR44 best32. ;

693         informat VAR45 best32. ;

694         informat VAR46 best32. ;

695         informat VAR47 best32. ;

696         informat VAR48 best32. ;

697         informat VAR49 best32. ;

698         informat VAR50 best32. ;

699         informat STAT best32. ;

700         informat VAR52 best32. ;

701         informat VAR53 best32. ;

702         format VAR1 $1. ;

703         format CHR $1. ;

704         format VAR3 $1. ;

705         format VAR4 best12. ;

706         format VAR5 $1. ;

707         format VAR6 $1. ;

708         format VAR7 $10. ;

709         format VAR8 $12. ;

710         format VAR9 $1. ;

711         format VAR10 $1. ;

712         format VAR11 $1. ;

713         format SNP best12. ;

714         format VAR13 best12. ;

715         format VAR14 $1. ;

716         format VAR15 $1. ;

717         format VAR16 $1. ;

718         format VAR17 $1. ;

719         format VAR18 $1. ;

720         format VAR19 $1. ;

721         format VAR20 $1. ;

722         format BP $1. ;

723         format VAR22 $1. ;

724         format VAR23 $1. ;

725         format A1 $3. ;

726         format VAR25 $4. ;

727         format VAR26 $3. ;

728         format VAR27 $1. ;

729         format VAR28 $1. ;

730         format VAR29 $1. ;

731         format VAR30 best12. ;

732         format TEST best12. ;

733         format VAR32 best12. ;

734         format VAR33 best12. ;

735         format VAR34 best12. ;

736         format NMISS best12. ;

737         format VAR36 best12. ;

738         format VAR37 best12. ;

739         format VAR38 best12. ;

740         format VAR39 best12. ;

741         format VAR40 best12. ;

742         format VAR41 best12. ;

743         format BETA best12. ;

744         format VAR43 best12. ;

745         format VAR44 best12. ;

746         format VAR45 best12. ;

747         format VAR46 best12. ;

748         format VAR47 best12. ;

749         format VAR48 best12. ;

750         format VAR49 best12. ;

751         format VAR50 best12. ;

752         format STAT best12. ;

753         format VAR52 best12. ;

754         format VAR53 best12. ;

755      input

756                  VAR1 $

757                  CHR $

758                  VAR3 $

759                  VAR4

760                  VAR5 $

761                  VAR6 $

762                  VAR7 $

763                  VAR8 $

764                  VAR9 $

765                  VAR10 $

766                  VAR11 $

767                  SNP

768                  VAR13

769                  VAR14 $

770                  VAR15 $

771                  VAR16 $

772                  VAR17 $

773                  VAR18 $

774                  VAR19 $

775                  VAR20 $

776                  BP $

777                  VAR22 $

778                  VAR23 $

779                  A1 $

780                  VAR25 $

781                  VAR26 $

782                  VAR27 $

783                  VAR28 $

784                  VAR29 $

785                  VAR30

786                  TEST

787                  VAR32

788                  VAR33

789                  VAR34

790                  NMISS

791                  VAR36

792                  VAR37

793                  VAR38

794                  VAR39

795                  VAR40

796                  VAR41

797                  BETA

798                  VAR43

799                  VAR44

800                  VAR45

801                  VAR46

802                  VAR47

803                  VAR48

804                  VAR49

805                  VAR50

806                  STAT

807                  VAR52

808                  VAR53

809      ;

810      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */

811      run;

NOTE: The infile 'C:\Users\Adam Bress\Downloads\plink.assoc.linear.txt' is:

      Filename=C:\Users\Adam Bress\Downloads\plink.assoc.linear.txt,

      RECFM=V,LRECL=32767,

      File Size (bytes)=381808598,

      Last Modified=07May2013:15:53:16,

      Create Time=07May2013:15:53:10

NOTE: Invalid data for VAR38 in line 6266 63-64.

NOTE: Invalid data for VAR49 in line 6266 76-77.

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0

6266         1   rs10489133    4573797    0        ADD      160         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=rs10489133 VAR8=  VAR9=  VAR10=  VAR11=4 SNP=. VAR13=. VAR14=  VAR15=0 VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=  VAR23=A A1=  VAR25=  VAR26=  VAR27=  VAR28=  VAR29=1 VAR30=. TEST=. VAR32=. VAR33=.

VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=. VAR49=.

VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=6265

NOTE: Invalid data for VAR38 in line 6267 63-64.

NOTE: Invalid data for VAR49 in line 6267 76-77.

6267         1   rs10489133    4573797    0        Age      160         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=rs10489133 VAR8=  VAR9=  VAR10=  VAR11=4 SNP=. VAR13=. VAR14=  VAR15=0 VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=  VAR23=A A1=  VAR25=  VAR26=  VAR27=  VAR28=  VAR29=1 VAR30=. TEST=. VAR32=. VAR33=.

VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=. VAR49=.

VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=6266

NOTE: Invalid data for VAR39 in line 6268 63-64.

NOTE: Invalid data for VAR50 in line 6268 76-77.

6268         1   rs10489133    4573797    0         UV      160         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=rs10489133 VAR8=  VAR9=  VAR10=  VAR11=4 SNP=. VAR13=. VAR14=  VAR15=0 VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=  VAR23=  A1=UV VAR25=  VAR26=  VAR27=  VAR28=  VAR29=  VAR30=160 TEST=. VAR32=.

VAR33=. VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=.

VAR49=. VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=6267

NOTE: Invalid data for VAR39 in line 6269 63-64.

NOTE: Invalid data for VAR50 in line 6269 76-77.

6269         1   rs10489133    4573797    0         c1      160         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=rs10489133 VAR8=  VAR9=  VAR10=  VAR11=4 SNP=. VAR13=. VAR14=  VAR15=0 VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=  VAR23=  A1=c1 VAR25=  VAR26=  VAR27=  VAR28=  VAR29=  VAR30=160 TEST=. VAR32=.

VAR33=. VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=.

VAR49=. VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=6268

NOTE: Invalid data for VAR38 in line 10186 63-64.

NOTE: Invalid data for VAR49 in line 10186 76-77.

10186        1   rs12065517    6588123    0        ADD      156         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=rs12065517 VAR8=  VAR9=  VAR10=  VAR11=6 SNP=. VAR13=. VAR14=  VAR15=0 VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=  VAR23=A A1=  VAR25=  VAR26=  VAR27=  VAR28=  VAR29=1 VAR30=. TEST=. VAR32=. VAR33=.

VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=. VAR49=.

VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=10185

NOTE: Invalid data for VAR38 in line 10187 63-64.

NOTE: Invalid data for VAR49 in line 10187 76-77.

10187        1   rs12065517    6588123    0        Age      156         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=rs12065517 VAR8=  VAR9=  VAR10=  VAR11=6 SNP=. VAR13=. VAR14=  VAR15=0 VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=  VAR23=A A1=  VAR25=  VAR26=  VAR27=  VAR28=  VAR29=1 VAR30=. TEST=. VAR32=. VAR33=.

VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=. VAR49=.

VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=10186

NOTE: Invalid data for VAR39 in line 10188 63-64.

NOTE: Invalid data for VAR50 in line 10188 76-77.

10188        1   rs12065517    6588123    0         UV      156         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=rs12065517 VAR8=  VAR9=  VAR10=  VAR11=6 SNP=. VAR13=. VAR14=  VAR15=0 VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=  VAR23=  A1=UV VAR25=  VAR26=  VAR27=  VAR28=  VAR29=  VAR30=156 TEST=. VAR32=.

VAR33=. VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=.

VAR49=. VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=10187

NOTE: Invalid data for VAR39 in line 10189 63-64.

NOTE: Invalid data for VAR50 in line 10189 76-77.

10189        1   rs12065517    6588123    0         c1      156         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=rs12065517 VAR8=  VAR9=  VAR10=  VAR11=6 SNP=. VAR13=. VAR14=  VAR15=0 VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=  VAR23=  A1=c1 VAR25=  VAR26=  VAR27=  VAR28=  VAR29=  VAR30=156 TEST=. VAR32=.

VAR33=. VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=.

VAR49=. VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=10188

NOTE: Invalid data for VAR37 in line 20342 63-64.

NOTE: Invalid data for VAR48 in line 20342 76-77.

20342        1   rs17039265   12835088    0        ADD      160         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=rs17039265 VAR8=  VAR9=  VAR10=1 VAR11=  SNP=. VAR13=. VAR14=0 VAR15=  VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=A VAR23=  A1=  VAR25=  VAR26=  VAR27=  VAR28=1 VAR29=  VAR30=. TEST=. VAR32=. VAR33=.

VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=. VAR49=.

VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=20341

NOTE: Invalid data for VAR37 in line 20343 63-64.

NOTE: Invalid data for VAR48 in line 20343 76-77.

20343        1   rs17039265   12835088    0        Age      160         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=rs17039265 VAR8=  VAR9=  VAR10=1 VAR11=  SNP=. VAR13=. VAR14=0 VAR15=  VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=A VAR23=  A1=  VAR25=  VAR26=  VAR27=  VAR28=1 VAR29=  VAR30=. TEST=. VAR32=. VAR33=.

VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=. VAR49=.

VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=20342

NOTE: Invalid data for VAR38 in line 20344 63-64.

NOTE: Invalid data for VAR49 in line 20344 76-77.

20344        1   rs17039265   12835088    0         UV      160         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=rs17039265 VAR8=  VAR9=  VAR10=1 VAR11=  SNP=. VAR13=. VAR14=0 VAR15=  VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=  VAR23=U A1=  VAR25=  VAR26=  VAR27=  VAR28=  VAR29=1 VAR30=. TEST=. VAR32=. VAR33=.

VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=. VAR49=.

VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=20343

NOTE: Invalid data for VAR38 in line 20345 63-64.

NOTE: Invalid data for VAR49 in line 20345 76-77.

20345        1   rs17039265   12835088    0         c1      160         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=rs17039265 VAR8=  VAR9=  VAR10=1 VAR11=  SNP=. VAR13=. VAR14=0 VAR15=  VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=  VAR23=c A1=  VAR25=  VAR26=  VAR27=  VAR28=  VAR29=1 VAR30=. TEST=. VAR32=. VAR33=.

VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=. VAR49=.

VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=20344

NOTE: Invalid data for SNP in line 23674 13-17.

23674        1        rs549   15419412    A        ADD      159   -0.05494      -0.0698       0.9444 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=  VAR8=  VAR9=  VAR10=  VAR11=  SNP=. VAR13=. VAR14=  VAR15=1 VAR16=  VAR17=

VAR18=  VAR19=A VAR20=  BP=  VAR22=  VAR23=  A1=  VAR25=  VAR26=  VAR27=A VAR28=  VAR29=  VAR30=. TEST=. VAR32=. VAR33=159

VAR34=. NMISS=. VAR36=-0.05494 VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=-0.0698 VAR43=. VAR44=. VAR45=. VAR46=. VAR47=.

VAR48=. VAR49=0.9444 VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=23673

NOTE: Invalid data for SNP in line 23675 13-17.

23675        1        rs549   15419412    A        Age      159   -0.03094      -0.5468       0.5853 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=  VAR8=  VAR9=  VAR10=  VAR11=  SNP=. VAR13=. VAR14=  VAR15=1 VAR16=  VAR17=

VAR18=  VAR19=A VAR20=  BP=  VAR22=  VAR23=  A1=  VAR25=  VAR26=  VAR27=A VAR28=  VAR29=  VAR30=. TEST=. VAR32=. VAR33=159

VAR34=. NMISS=. VAR36=-0.03094 VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=-0.5468 VAR43=. VAR44=. VAR45=. VAR46=. VAR47=.

VAR48=. VAR49=0.5853 VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=23674

NOTE: Invalid data for SNP in line 23676 13-17.

23676        1        rs549   15419412    A         UV      159     -1.662       -1.304       0.1943 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=  VAR8=  VAR9=  VAR10=  VAR11=  SNP=. VAR13=. VAR14=  VAR15=1 VAR16=  VAR17=

VAR18=  VAR19=A VAR20=  BP=  VAR22=  VAR23=  A1=  VAR25=  VAR26=  VAR27=  VAR28=U VAR29=  VAR30=. TEST=. VAR32=. VAR33=.

VAR34=159 NMISS=. VAR36=. VAR37=. VAR38=. VAR39=-1.662 VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=-1.304 VAR47=.

VAR48=. VAR49=. VAR50=. STAT=. VAR52=. VAR53=0.1943 _ERROR_=1 _N_=23675

NOTE: Invalid data for SNP in line 23677 13-17.

23677        1        rs549   15419412    A         c1      159      10.13        2.434      0.01609 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=  VAR8=  VAR9=  VAR10=  VAR11=  SNP=. VAR13=. VAR14=  VAR15=1 VAR16=  VAR17=

VAR18=  VAR19=A VAR20=  BP=  VAR22=  VAR23=  A1=  VAR25=  VAR26=  VAR27=  VAR28=c VAR29=  VAR30=. TEST=. VAR32=. VAR33=.

VAR34=159 NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=10.13 VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=2.434

VAR49=. VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=23676

NOTE: Invalid data for VAR38 in line 31730 63-64.

NOTE: Invalid data for VAR49 in line 31730 76-77.

31730        1    rs2236772   20177372    0        ADD      160         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=  VAR8=rs2236772 VAR9=  VAR10=  VAR11=2 SNP=. VAR13=. VAR14=  VAR15=0 VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=  VAR23=A A1=  VAR25=  VAR26=  VAR27=  VAR28=  VAR29=1 VAR30=. TEST=. VAR32=. VAR33=.

VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=. VAR49=.

VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=31729

NOTE: Invalid data for VAR38 in line 31731 63-64.

NOTE: Invalid data for VAR49 in line 31731 76-77.

31731        1    rs2236772   20177372    0        Age      160         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=  VAR8=rs2236772 VAR9=  VAR10=  VAR11=2 SNP=. VAR13=. VAR14=  VAR15=0 VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=  VAR23=A A1=  VAR25=  VAR26=  VAR27=  VAR28=  VAR29=1 VAR30=. TEST=. VAR32=. VAR33=.

VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=. VAR49=.

VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=31730

NOTE: Invalid data for VAR39 in line 31732 63-64.

NOTE: Invalid data for VAR50 in line 31732 76-77.

31732        1    rs2236772   20177372    0         UV      160         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=  VAR8=rs2236772 VAR9=  VAR10=  VAR11=2 SNP=. VAR13=. VAR14=  VAR15=0 VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=  VAR23=  A1=UV VAR25=  VAR26=  VAR27=  VAR28=  VAR29=  VAR30=160 TEST=. VAR32=.

VAR33=. VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=.

VAR49=. VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=31731

NOTE: Invalid data for VAR39 in line 31733 63-64.

NOTE: Invalid data for VAR50 in line 31733 76-77.

WARNING: Limit set by ERRORS= option reached.  Further errors of this type will not be printed.

31733        1    rs2236772   20177372    0         c1      160         NA           NA           NA 90

VAR1=  CHR=  VAR3=  VAR4=1 VAR5=  VAR6=  VAR7=  VAR8=rs2236772 VAR9=  VAR10=  VAR11=2 SNP=. VAR13=. VAR14=  VAR15=0 VAR16=

VAR17=  VAR18=  VAR19=  VAR20=  BP=  VAR22=  VAR23=  A1=c1 VAR25=  VAR26=  VAR27=  VAR28=  VAR29=  VAR30=160 TEST=. VAR32=.

VAR33=. VAR34=. NMISS=. VAR36=. VAR37=. VAR38=. VAR39=. VAR40=. VAR41=. BETA=. VAR43=. VAR44=. VAR45=. VAR46=. VAR47=. VAR48=.

VAR49=. VAR50=. STAT=. VAR52=. VAR53=. _ERROR_=1 _N_=31732

NOTE: 4150092 records were read from the infile 'C:\Users\Adam Bress\Downloads\plink.assoc.linear.txt'.

      The minimum record length was 90.

      The maximum record length was 91.

NOTE: The data set WORK.GWAS has 4150092 observations and 53 variables.

NOTE: DATA statement used (Total process time):

      real time           14.57 seconds

      cpu time            12.74 seconds

Super User
Posts: 10,500

Re: Help with Data input

Here's a clue: your variable name SNP appears in the INFORMAT list and INPUT statement as the 12th variable, and BP nine variables after that. I think the column-heading line contains something other than the text displayed in your example, possibly a bunch of tabs or null characters.

When I looked at your first example file, it was not space delimited.

I would be tempted to copy the first line of the data file into a plain text editor and see what it looks like. Another option is the SAS FSLIST tool, which shows where things appear by column. Null characters may not be visible in it, but it shows the column positions of your column headings relative to the data better than some other software.
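To check for tabs without leaving SAS, a DATA _NULL_ step with the LIST statement writes a record to the log and shows unprintable characters in hex (a tab appears as 09). This is a minimal sketch, assuming a hypothetical temporary fileref DEMO that holds a tab-separated header line; point the INFILE statement at your own file instead.

```sas
/* Hypothetical sample: a header line whose fields are separated by tabs */
filename demo temp;

data _null_;
  file demo;
  length line $ 40;
  line = 'CHR' || '09'x || 'SNP' || '09'x || 'BP';   /* tab-separated header */
  put line;
run;

/* Dump the first record to the log and count its tab characters */
data _null_;
  infile demo obs=1;
  input;                              /* read the whole record into _INFILE_ */
  list;                               /* unprintable bytes are shown in hex */
  ntabs = countc(_infile_, '09'x);    /* number of tab characters on the line */
  put 'Tabs found on line 1: ' ntabs;
  call symputx('ntabs', ntabs);
run;
```

If the count is nonzero for the real file, the values are tab-separated, and a blank delimiter will not split them correctly.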

Occasional Contributor
Posts: 8

Re: Help with Data input

Thanks, Ballard. I have attached the data set to my first post. I'm trying to learn how to deal with the strange space delimiter in the data file.

Do you have a recommendation?

Super User
Posts: 10,500

Re: Help with Data input

Try importing it as tab-delimited. I opened the file in Word, and it shows tabs separating the values.
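A minimal sketch of a tab-delimited import, assuming a small hypothetical sample written to a temporary fileref DEMO rather than the real PLINK file. GUESSINGROWS= (suggested earlier in the thread) makes PROC IMPORT scan more rows before it decides whether each column is numeric or character.

```sas
/* Hypothetical sample: a header plus two tab-separated records */
filename demo temp;

data _null_;
  file demo;
  length line $ 80;
  line = catx('09'x, 'CHR', 'SNP', 'BP', 'P');               put line;
  line = catx('09'x, '1', 'rs28659788', '713170', '0.3946'); put line;
  line = catx('09'x, '1', 'rs3094315', '742429', '0.7139');  put line;
run;

/* DBMS=TAB tells PROC IMPORT that fields are separated by tab characters */
proc import datafile=demo out=work.gwas dbms=tab replace;
  getnames=yes;
  datarow=2;
  guessingrows=32767;   /* scan more rows before choosing variable types */
run;
```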

Occasional Contributor
Posts: 8

Re: Help with Data input

Thanks, Ballard. I tried importing it as tab-delimited and got this error. Any suggestions for getting this to work?

915  proc import datafile="C:\Users\Adam Bress\Downloads\plink.assoc.linear" out=gwas dbms=tab replace;

916           datarow=2;

917          getnames=yes;

918  run;

Name  CHR          SNP         BP   A1       TEST    NMISS       BETA         STAT            P  truncated to

_CHR__________SNP_________BP___A.

Problems were detected with provided names.  See LOG.

919   /**********************************************************************

920   *   PRODUCT:   SAS

921   *   VERSION:   9.2

922   *   CREATOR:   External File Interface

923   *   DATE:      07MAY13

924   *   DESC:      Generated SAS Datastep Code

925   *   TEMPLATE SOURCE:  (None Specified.)

926   ***********************************************************************/

927      data WORK.GWAS                                    ;

928      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */

929      infile 'C:\Users\Adam Bress\Downloads\plink.assoc.linear' delimiter='09'x MISSOVER DSD lrecl=32767 firstobs=2 ;

930         informat _CHR__________SNP_________BP___A $87. ;

931         format _CHR__________SNP_________BP___A $87. ;

932      input

933                  _CHR__________SNP_________BP___A $

934      ;

935      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */

936      run;

NOTE: The infile 'C:\Users\Adam Bress\Downloads\plink.assoc.linear' is:

      Filename=C:\Users\Adam Bress\Downloads\plink.assoc.linear,

      RECFM=V,LRECL=32767,

      File Size (bytes)=381808598,

      Last Modified=08May2013:10:34:12,

      Create Time=07May2013:11:41:15

NOTE: 4150092 records were read from the infile 'C:\Users\Adam Bress\Downloads\plink.assoc.linear'.

      The minimum record length was 90.

      The maximum record length was 91.

NOTE: The data set WORK.GWAS has 4150092 observations and 1 variables.

NOTE: DATA statement used (Total process time):

      real time           2.92 seconds

      cpu time            2.55 seconds

4150092 rows created in WORK.GWAS                                 from C:\Users\Adam Bress\Downloads\plink.assoc.linear.

Occasional Contributor
Posts: 8

Re: Help with Data input

I think I might have to do something like this, but I'm specifying the variable locations wrong:

Data manhattan;
INFILE "C:\Users\Adam Bress\Downloads\plink.assoc.linear";
INPUT chr 4    snp 8-17 bp 23-28 a1 33  test 42-44 nmiss 51-53 beta 56-66 stat 71-77 p 84-91;
RUN;

Super User
Posts: 10,500

Re: Help with Data input

I didn't have any problem reading the example file with:

proc import datafile="D:\plink.assoc.linear" out=gwas dbms=tab replace;
  datarow=2;
  getnames=yes;
run;

But I get 1047617 rows from the example file, where your log shows 4150092.

If you try to write your own INPUT statement, you'll need to add delimiter='09'x at least. Your data isn't in fixed columns: TEST, BETA, STAT and P all vary in length and number of significant digits.

Which OS are you running? I'm running SAS 9.2.3 under Win7 and could read the file with the above syntax. We may be getting some sort of file conversion through the forum, though.
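A minimal sketch of a hand-written DATA step for this layout, assuming a tab-separated file like the sample in this thread. The fileref DEMO and its contents are hypothetical stand-ins for the real path, and the character lengths are guesses. The ?? modifier reads PLINK's "NA" results as missing values and suppresses the "Invalid data" notes seen in the log; note that it will also hide genuinely bad values.

```sas
/* Hypothetical sample: header, one normal record, one record with NA results */
filename demo temp;

data _null_;
  file demo;
  length line $ 200;
  line = catx('09'x, 'CHR','SNP','BP','A1','TEST','NMISS','BETA','STAT','P');
  put line;
  line = catx('09'x, '1','rs28659788','713170','G','ADD','156','-3.549','-0.8538','0.3946');
  put line;
  line = catx('09'x, '1','rs10489133','4573797','0','ADD','160','NA','NA','NA');
  put line;
run;

data gwas;
  /* DLM='09'x: tab-delimited; DSD: consecutive tabs mean a missing value;
     TRUNCOVER: short lines do not spill into the next record;
     FIRSTOBS=2: skip the header line */
  infile demo dlm='09'x dsd truncover firstobs=2;
  input chr
        snp   :$20.
        bp
        a1    :$2.
        test  :$20.
        nmiss
        beta  ??      /* ?? reads "NA" as missing without a log note */
        stat  ??
        p     ?? ;
run;
```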

Super User
Posts: 6,500

Re: Help with Data input

The file you posted in the second ZIP file (is there a difference between the two?) is a normal tab-delimited DOS file (CR and LF as the line terminators) with 9 columns.

The first line has the variable names.

The next 7 lines have data.

The remaining 1047610 lines are totally empty except for the tabs.

I can read it easily with PROC IMPORT.

172  options generic;

173  filename xx '~/plink.assoc.linear' termstr=CRLF ;

174  data _null_;

175    infile xx obs=9 ;

176    input;

177    list;

178  run;

NOTE: The infile XX is:

      (system-specific pathname),

      (system-specific file attributes)

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0

1   CHAR  CHR.SNP.BP.A1.TEST.NMISS.BETA.STAT.P 36

    ZONE  445054504504305455044455044540554505

    NUMR  38293E0920911945349ED933925419341490

2   CHAR  1.rs28659788.713170.G.ADD.156.-3.549.-0.8538.0.3946 51

    ZONE  307733333333033333304044403330232333023233330323333

    NUMR  192328659788971317097914491569D3E5499D0E853890E3946

3   CHAR  1.rs28659788.713170.G.Age.156.-0.006926.-0.1199.0.9048 54

    ZONE  307733333333033333304046603330232333333023233330323333

    NUMR  192328659788971317097917591569D0E0069269D0E119990E9048

4   CHAR  1.rs28659788.713170.G.UV.156.-1.725.-1.369.0.1729 49

    ZONE  3077333333330333333040550333023233302323330323333

    NUMR  19232865978897131709795691569D1E7259D1E36990E1729

5   CHAR  1.rs28659788.713170.G.c1.156.10.05.2.434.0.01611 48

    ZONE  307733333333033333304063033303323303233303233333

    NUMR  1923286597889713170979319156910E0592E43490E01611

6   CHAR  1.rs3094315.742429.T.ADD.160.-0.3026.-0.3673.0.7139 51

    ZONE  307733333330333333050444033302323333023233330323333

    NUMR  19233094315974242994914491609D0E30269D0E367390E7139

7   CHAR  1.rs3094315.742429.T.Age.160.-0.02911.-0.5245.0.6007 52

    ZONE  3077333333303333330504660333023233333023233330323333

    NUMR  19233094315974242994917591609D0E029119D0E524590E6007

8   CHAR  1.rs3094315.742429.T.UV.160.-1.579.-1.244.0.2153 48

    ZONE  307733333330333333050550333023233302323330323333

    NUMR  1923309431597424299495691609D1E5799D1E24490E2153

9   CHAR  ........ 8

    ZONE  00000000

    NUMR  99999999

NOTE: 9 records were read from the infile (system-specific pathname).

      The minimum record length was 8.

      The maximum record length was 54.

NOTE: DATA statement used (Total process time):

      real time           0.02 seconds

      cpu time            0.00 seconds

179  options nogeneric;
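Because the trailing lines hold nothing but tabs, a DATA step would otherwise turn each of them into an observation with every value missing. A minimal sketch of one way to skip them, assuming the same nine-column layout; the sample file is hypothetical and uses REPEAT() to mimic the eight-tab lines shown in the LIST output above. Stripping '0D'x as well is an assumption meant to cover stray CR characters when a DOS file is read without TERMSTR=CRLF.

```sas
/* Hypothetical sample: header, one record, then two lines of nothing but tabs */
filename demo temp;

data _null_;
  file demo;
  length line $ 200;
  line = catx('09'x, 'CHR','SNP','BP','A1','TEST','NMISS','BETA','STAT','P');
  put line;
  line = catx('09'x, '1','rs3094315','742429','T','UV','160','-1.579','-1.244','0.2153');
  put line;
  line = repeat('09'x, 7);    /* REPEAT returns 1 + 7 copies: 8 tabs */
  put line;
  put line;
run;

data gwas;
  infile demo dlm='09'x dsd truncover firstobs=2;
  input @;                                           /* read and hold the record */
  if compress(_infile_, '090D'x) = ' ' then delete;  /* only tabs/CR: skip it */
  input chr snp :$20. bp a1 :$2. test :$20. nmiss beta ?? stat ?? p ??;
run;
```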
