BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ming
Calcite | Level 5

Hi,

I am having trouble reading a file containing non-consistent data in wide format into long format.

for example, the data looks like this:

subject, sex, time1, time2, time3,..., intensity1, intensity2, intensity3....

1,m,1,45,5,2,5,3

2,m,3,6,5,5,6,3

3,m,6,65,5,2,20,3,23,63

4,m,9,6,5,43,5,3

5,m,3,195,5,2,5,32,34,78,12,42

6,m,6,92,5,232,5,23

so some subjects may have 3 times and 3 intensities and other may have 4 times/intensities or 5 times/intensities...

If I read the file as

data id sex$ time1-5, intensity1-5;

then transform into long format by

data long; set wide;

  array t(5) time1-time5;

  array i(5) intensity1-intensity5;

  do j=1 to 5;

     time=t(j);

     intensity= i(j);

     output;

  end;


however for those subjects that have <5 times/intensities, SAS will continue to read the next subject's data and the whole table screws up.

is there a way I can read the data directly into long format? something like...

data long;

   infle 'xxx';

   input id sex$ @@;

     - do a loop that reads the remaining variables

     - I can change the variables into time1 intensity1 time2 intensity2 time3 intensity3....

    

     time=

     intensity=

     output;

thanks for any suggestions!

ming

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Assuming that there are no missing values for "time" (=2 consecutive commas) below code should work - else it would need a little tweak. It's a variation of what already posted.

data sample(drop=_:);

  attrib id informat=$10. sex informat=$1.;

  array _t_i (20) 8.;

  call missing(of _all_);

  infile datalines truncover dsd dlm=',';

  input id sex _t_i

  •   _isize=countc(_infile_,',')-1 ;

      do _ind=1 to _isize by 2;

          time=_t_i(_ind);

          intensity=_t_i(_ind+1);

          output;

      end;

      datalines;

    1,m,1,45,5,2,5,3

    2,m,3,6,5,5,6,3

    3,m,6,65,5,2,20,3,23,63

    4,m,9,6,5,43,5,3

    5,m,3,195,5,2,5,32,34,78,12,42

    6,m,6,92,5,232,5,23

    ;

    run;

    Message was edited by: Patrick Matter Stole the "isize" approach from Jaap, so now the code should also work for cases where there is a missing (=2 consecutive commas)

    View solution in original post

    4 REPLIES 4
    Loko
    Barite | Level 11

    Hello,

    The first data step determines the number of the variables (times / intensity) which the data set will contain,

    while the second step imports the values within the data set.

    data _null_;

    input @5 tmins & $30.;
    x=(length(strip(compress(tmins,,'d')))+1)/2;
    if _N_=1 then call symputx('nrvar',x);
    else if x gt input(symget('nrvar'),best12.) then call symputx('nrvar',x);;
    datalines;
    1,m,1,45,5,2,5,3
    2,m,3,6,5,5,6,3
    3,m,6,65,5,2,20,3,23,63
    4,m,9,6,5,43,5,3
    5,m,3,195,5,2,5,32,34,78,12,42
    6,m,6,92,5,232,5,23
    ;
    run;

    data have (drop=rowvar i tmins);
    infile datalines dsd;
    *VARIABLES;
    input subject sex $ tmins & $30. @ ;
    array times{&nrvar};
    array intens{&nrvar};
    *Computation;
    rowvar=length(strip(compress(tmins,,'d')))+1;*number of commas in order to determin the loop;
    do i=1 to rowvar/2;
    times{i}=input(scan(tmins,i,","),best12.);*in order to avoid the note within the log;
    intens{i}=input(scan(tmins,i+rowvar/2,","),best12.);
    end;
    datalines;
    1,m,1,45,5,2,5,3
    2,m,3,6,5,5,6,3
    3,m,6,65,5,2,20,3,23,63
    4,m,9,6,5,43,5,3
    5,m,3,195,5,2,5,32,34,78,12,42
    6,m,6,92,5,232,5,23
    ;
    run;

    jakarman
    Barite | Level 11

    Just one datastep needed. See commentlines

    data long (keep=id sex time intensity) ;
    length sex $ 1 ;
    array timen(5) timen1-timen5;
    array inten(5) intensity1-intensity5;
    infile cards missover DSD DLM=',';   /*just to be sure of options */
    input id sex @ ;
    isize=(countw(_infile_)-2)/2 ;       /* _infile is the current record - size of the array in words*/
    do i=1 to isize ; input timen @; end;
    do i=1 to isize ; input inten @; end;
    put id sex isize ; /* debugging shows processing */

    do i=1 to isize ; /* output long in array size */
       time=timen ;
       intensity=inten ;
       output ;
    end;

    datalines;
    1,m,1,45,5,2,5,3
    2,m,3,6,5,5,6,3
    3,m,6,65,5,2,20,3,23,63
    4,m,9,6,5,43,5,3
    5,m,3,195,5,2,5,32,34,78,12,42
    6,m,6,92,5,232,5,23
    run;

    ---->-- ja karman --<-----
    Patrick
    Opal | Level 21

    Assuming that there are no missing values for "time" (=2 consecutive commas) below code should work - else it would need a little tweak. It's a variation of what already posted.

    data sample(drop=_:);

      attrib id informat=$10. sex informat=$1.;

      array _t_i (20) 8.;

      call missing(of _all_);

      infile datalines truncover dsd dlm=',';

      input id sex _t_i

  •   _isize=countc(_infile_,',')-1 ;

      do _ind=1 to _isize by 2;

          time=_t_i(_ind);

          intensity=_t_i(_ind+1);

          output;

      end;

      datalines;

    1,m,1,45,5,2,5,3

    2,m,3,6,5,5,6,3

    3,m,6,65,5,2,20,3,23,63

    4,m,9,6,5,43,5,3

    5,m,3,195,5,2,5,32,34,78,12,42

    6,m,6,92,5,232,5,23

    ;

    run;

    Message was edited by: Patrick Matter Stole the "isize" approach from Jaap, so now the code should also work for cases where there is a missing (=2 consecutive commas)

    Ksharp
    Super User
    data sample(drop=n i);
    infile cards;
    input;
    subject=scan(_infile_,1,',','m');
    sex=scan(_infile_,2,',','m');
    n=divide(countc(_infile_,',')-1,2);
    do i=3 to n+2;
     time=scan(_infile_,i,',','m');
     intensity=scan(_infile_,i+n,',','m');
     output;
    end;
    datalines;
    1,m,1,45,5,2,5,3
    2,m,3,6,5,5,6,3
    3,m,6,65,5,2,20,3,23,63
    4,m,9,6,5,43,5,3
    5,m,3,195,5,2,5,32,34,78,12,42
    6,m,6,92,5,232,5,23
    ;
    run;
    
    

    Xia Keshan

    sas-innovate-2024.png

    Today is the last day to save with the early bird rate! Register today for just $695 - $100 off the standard rate.

     

    Plus, pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

     

    Register now!

    What is Bayesian Analysis?

    Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 4 replies
    • 833 views
    • 6 likes
    • 5 in conversation