Help using Base SAS procedures

reading file (w/ non-consistent data) into long format

Accepted Solution Solved
Reply
Contributor
Posts: 30
Accepted Solution

reading file (w/ non-consistent data) into long format

Hi,

I am having trouble reading a file containing non-consistent data in wide format into long format.

for example, the data looks like this:

subject, sex, time1, time2, time3,..., intensity1, intensity2, intensity3....

1,m,1,45,5,2,5,3

2,m,3,6,5,5,6,3

3,m,6,65,5,2,20,3,23,63

4,m,9,6,5,43,5,3

5,m,3,195,5,2,5,32,34,78,12,42

6,m,6,92,5,232,5,23

so some subjects may have 3 times and 3 intensities and other may have 4 times/intensities or 5 times/intensities...

If I read the file as

data id sex$ time1-5, intensity1-5;

then transform into long format by

data long; set wide;

  array t(5) time1-time5;

  array i(5) intensity1-intensity5;

  do j=1 to 5;

     time=t(j);

     intensity= i(j);

     output;

  end;


however for those subjects that have <5 times/intensities, SAS will continue to read the next subject's data and the whole table screws up.

is there a way I can read the data directly into long format? something like...

data long;

   infle 'xxx';

   input id sex$ @@;

     - do a loop that reads the remaining variables

     - I can change the variables into time1 intensity1 time2 intensity2 time3 intensity3....

    

     time=

     intensity=

     output;

thanks for any suggestions!

ming


Accepted Solutions
Solution
‎08-20-2014 08:00 AM
Respected Advisor
Posts: 4,013

Re: reading file (w/ non-consistent data) into long format

Assuming that there are no missing values for "time" (=2 consecutive commas) below code should work - else it would need a little tweak. It's a variation of what already posted.

data sample(drop=_Smiley Happy;

  attrib id informat=$10. sex informat=$1.;

  array _t_i (20) 8.;

  call missing(of _all_);

  infile datalines truncover dsd dlm=',';

  input id sex _t_i

  •   _isize=countc(_infile_,',')-1 ;

      do _ind=1 to _isize by 2;

          time=_t_i(_ind);

          intensity=_t_i(_ind+1);

          output;

      end;

      datalines;

    1,m,1,45,5,2,5,3

    2,m,3,6,5,5,6,3

    3,m,6,65,5,2,20,3,23,63

    4,m,9,6,5,43,5,3

    5,m,3,195,5,2,5,32,34,78,12,42

    6,m,6,92,5,232,5,23

    ;

    run;

    Message was edited by: Patrick Matter Stole the "isize" approach from Jaap, so now the code should also work for cases where there is a missing (=2 consecutive commas)

    View solution in original post


    All Replies
    Super Contributor
    Posts: 306

    Re: reading file (w/ non-consistent data) into long format

    Hello,

    The first data step determines the number of the variables (times / intensity) which the data set will contain,

    while the second step imports the values within the data set.

    data _null_;

    input @5 tmins & $30.;
    x=(length(strip(compress(tmins,,'d')))+1)/2;
    if _N_=1 then call symputx('nrvar',x);
    else if x gt input(symget('nrvar'),best12.) then call symputx('nrvar',x);;
    datalines;
    1,m,1,45,5,2,5,3
    2,m,3,6,5,5,6,3
    3,m,6,65,5,2,20,3,23,63
    4,m,9,6,5,43,5,3
    5,m,3,195,5,2,5,32,34,78,12,42
    6,m,6,92,5,232,5,23
    ;
    run;

    data have (drop=rowvar i tmins);
    infile datalines dsd;
    *VARIABLES;
    input subject sex $ tmins & $30. @ ;
    array times{&nrvar};
    array intens{&nrvar};
    *Computation;
    rowvar=length(strip(compress(tmins,,'d')))+1;*number of commas in order to determin the loop;
    do i=1 to rowvar/2;
    times{i}=input(scan(tmins,i,","),best12.);*in order to avoid the note within the log;
    intens{i}=input(scan(tmins,i+rowvar/2,","),best12.);
    end;
    datalines;
    1,m,1,45,5,2,5,3
    2,m,3,6,5,5,6,3
    3,m,6,65,5,2,20,3,23,63
    4,m,9,6,5,43,5,3
    5,m,3,195,5,2,5,32,34,78,12,42
    6,m,6,92,5,232,5,23
    ;
    run;

    Valued Guide
    Posts: 3,208

    Re: reading file (w/ non-consistent data) into long format

    Just one datastep needed. See commentlines

    data long (keep=id sex time intensity) ;
    length sex $ 1 ;
    array timen(5) timen1-timen5;
    array inten(5) intensity1-intensity5;
    infile cards missover DSD DLM=',';   /*just to be sure of options */
    input id sex @ ;
    isize=(countw(_infile_)-2)/2 ;       /* _infile is the current record - size of the array in words*/
    do i=1 to isize ; input timen @; end;
    do i=1 to isize ; input inten @; end;
    put id sex isize ; /* debugging shows processing */

    do i=1 to isize ; /* output long in array size */
       time=timen ;
       intensity=inten ;
       output ;
    end;

    datalines;
    1,m,1,45,5,2,5,3
    2,m,3,6,5,5,6,3
    3,m,6,65,5,2,20,3,23,63
    4,m,9,6,5,43,5,3
    5,m,3,195,5,2,5,32,34,78,12,42
    6,m,6,92,5,232,5,23
    run;

    ---->-- ja karman --<-----
    Solution
    ‎08-20-2014 08:00 AM
    Respected Advisor
    Posts: 4,013

    Re: reading file (w/ non-consistent data) into long format

    Assuming that there are no missing values for "time" (=2 consecutive commas) below code should work - else it would need a little tweak. It's a variation of what already posted.

    data sample(drop=_Smiley Happy;

      attrib id informat=$10. sex informat=$1.;

      array _t_i (20) 8.;

      call missing(of _all_);

      infile datalines truncover dsd dlm=',';

      input id sex _t_i

  •   _isize=countc(_infile_,',')-1 ;

      do _ind=1 to _isize by 2;

          time=_t_i(_ind);

          intensity=_t_i(_ind+1);

          output;

      end;

      datalines;

    1,m,1,45,5,2,5,3

    2,m,3,6,5,5,6,3

    3,m,6,65,5,2,20,3,23,63

    4,m,9,6,5,43,5,3

    5,m,3,195,5,2,5,32,34,78,12,42

    6,m,6,92,5,232,5,23

    ;

    run;

    Message was edited by: Patrick Matter Stole the "isize" approach from Jaap, so now the code should also work for cases where there is a missing (=2 consecutive commas)

    Super User
    Posts: 9,775

    Re: reading file (w/ non-consistent data) into long format

    data sample(drop=n i);
    infile cards;
    input;
    subject=scan(_infile_,1,',','m');
    sex=scan(_infile_,2,',','m');
    n=divide(countc(_infile_,',')-1,2);
    do i=3 to n+2;
     time=scan(_infile_,i,',','m');
     intensity=scan(_infile_,i+n,',','m');
     output;
    end;
    datalines;
    1,m,1,45,5,2,5,3
    2,m,3,6,5,5,6,3
    3,m,6,65,5,2,20,3,23,63
    4,m,9,6,5,43,5,3
    5,m,3,195,5,2,5,32,34,78,12,42
    6,m,6,92,5,232,5,23
    ;
    run;
    
    

    Xia Keshan

    🔒 This topic is solved and locked.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 4 replies
    • 238 views
    • 6 likes
    • 5 in conversation