Solved: Re: reading file (w/ non-consistent data) into long format

Ming · Posted 08-19-2014 06:57 PM

Hi,

I am having trouble reading a file containing non-consistent data in wide format into long format.

for example, the data looks like this:

subject, sex, time1, time2, time3,..., intensity1, intensity2, intensity3....

1,m,1,45,5,2,5,3

2,m,3,6,5,5,6,3

3,m,6,65,5,2,20,3,23,63

4,m,9,6,5,43,5,3

5,m,3,195,5,2,5,32,34,78,12,42

6,m,6,92,5,232,5,23

so some subjects may have 3 times and 3 intensities and other may have 4 times/intensities or 5 times/intensities...

If I read the file as

data id sex$ time1-5, intensity1-5;

then transform into long format by

data long; set wide;

array t(5) time1-time5;

array i(5) intensity1-intensity5;

do j=1 to 5;

time=t(j);

intensity= i(j);

output;

end;

however for those subjects that have <5 times/intensities, SAS will continue to read the next subject's data and the whole table screws up.

is there a way I can read the data directly into long format? something like...

data long;

infle 'xxx';

input id sex$ @@;

- do a loop that reads the remaining variables

- I can change the variables into time1 intensity1 time2 intensity2 time3 intensity3....

time=

intensity=

output;

thanks for any suggestions!

ming

Patrick · Posted 08-20-2014 08:00 AM

Assuming that there are no missing values for "time" (=2 consecutive commas) below code should work - else it would need a little tweak. It's a variation of what already posted.

data sample(drop=_:);

attrib id informat=$10. sex informat=$1.;

array _t_i (20) 8.;

call missing(of _all_);

infile datalines truncover dsd dlm=',';

input id sex _t_i

;

_isize=countc(_infile_,',')-1 ;

do _ind=1 to _isize by 2;

time=_t_i(_ind);

intensity=_t_i(_ind+1);

output;

end;

datalines;

1,m,1,45,5,2,5,3

2,m,3,6,5,5,6,3

3,m,6,65,5,2,20,3,23,63

4,m,9,6,5,43,5,3

5,m,3,195,5,2,5,32,34,78,12,42

6,m,6,92,5,232,5,23

;

run;

Message was edited by: Patrick Matter Stole the "isize" approach from Jaap, so now the code should also work for cases where there is a missing (=2 consecutive commas)

View solution in original post

Loko · Posted 08-20-2014 04:45 AM

Hello,

The first data step determines the number of the variables (times / intensity) which the data set will contain,

while the second step imports the values within the data set.

data _null_;

input @5 tmins & $30.;
x=(length(strip(compress(tmins,,'d')))+1)/2;
if _N_=1 then call symputx('nrvar',x);
else if x gt input(symget('nrvar'),best12.) then call symputx('nrvar',x);;
datalines;
1,m,1,45,5,2,5,3
2,m,3,6,5,5,6,3
3,m,6,65,5,2,20,3,23,63
4,m,9,6,5,43,5,3
5,m,3,195,5,2,5,32,34,78,12,42
6,m,6,92,5,232,5,23
;
run;

data have (drop=rowvar i tmins);
infile datalines dsd;
*VARIABLES;
input subject sex $ tmins & $30. @ ;
array times{&nrvar};
array intens{&nrvar};
*Computation;
rowvar=length(strip(compress(tmins,,'d')))+1;*number of commas in order to determin the loop;
do i=1 to rowvar/2;
times{i}=input(scan(tmins,i,","),best12.);*in order to avoid the note within the log;
intens{i}=input(scan(tmins,i+rowvar/2,","),best12.);
end;
datalines;
1,m,1,45,5,2,5,3
2,m,3,6,5,5,6,3
3,m,6,65,5,2,20,3,23,63
4,m,9,6,5,43,5,3
5,m,3,195,5,2,5,32,34,78,12,42
6,m,6,92,5,232,5,23
;
run;

jakarman · Posted 08-20-2014 06:13 AM

Just one datastep needed. See commentlines

data long (keep=id sex time intensity) ;
length sex $ 1 ;
array timen(5) timen1-timen5;
array inten(5) intensity1-intensity5;
infile cards missover DSD DLM=','; /*just to be sure of options */
input id sex @ ;
isize=(countw(_infile_)-2)/2 ; /* _infile is the current record - size of the array in words*/
do i=1 to isize ; input timen @; end;
do i=1 to isize ; input inten @; end;
put id sex isize ; /* debugging shows processing */

do i=1 to isize ; /* output long in array size */
   time=timen ;
   intensity=inten ;
   output ;
end;

datalines;
1,m,1,45,5,2,5,3
2,m,3,6,5,5,6,3
3,m,6,65,5,2,20,3,23,63
4,m,9,6,5,43,5,3
5,m,3,195,5,2,5,32,34,78,12,42
6,m,6,92,5,232,5,23
run;

---->-- ja karman --<-----

Patrick · Posted 08-20-2014 08:00 AM

Assuming that there are no missing values for "time" (=2 consecutive commas) below code should work - else it would need a little tweak. It's a variation of what already posted.

data sample(drop=_:);

attrib id informat=$10. sex informat=$1.;

array _t_i (20) 8.;

call missing(of _all_);

infile datalines truncover dsd dlm=',';

input id sex _t_i

;

_isize=countc(_infile_,',')-1 ;

do _ind=1 to _isize by 2;

time=_t_i(_ind);

intensity=_t_i(_ind+1);

output;

end;

datalines;

1,m,1,45,5,2,5,3

2,m,3,6,5,5,6,3

3,m,6,65,5,2,20,3,23,63

4,m,9,6,5,43,5,3

5,m,3,195,5,2,5,32,34,78,12,42

6,m,6,92,5,232,5,23

;

run;

Message was edited by: Patrick Matter Stole the "isize" approach from Jaap, so now the code should also work for cases where there is a missing (=2 consecutive commas)

Ksharp · Posted 08-20-2014 10:16 AM

data sample(drop=n i);
infile cards;
input;
subject=scan(_infile_,1,',','m');
sex=scan(_infile_,2,',','m');
n=divide(countc(_infile_,',')-1,2);
do i=3 to n+2;
 time=scan(_infile_,i,',','m');
 intensity=scan(_infile_,i+n,',','m');
 output;
end;
datalines;
1,m,1,45,5,2,5,3
2,m,3,6,5,5,6,3
3,m,6,65,5,2,20,3,23,63
4,m,9,6,5,43,5,3
5,m,3,195,5,2,5,32,34,78,12,42
6,m,6,92,5,232,5,23
;
run;

Xia Keshan

reading file (w/ non-consistent data) into long format

Re: reading file (w/ non-consistent data) into long format

Re: reading file (w/ non-consistent data) into long format

Re: reading file (w/ non-consistent data) into long format

Re: reading file (w/ non-consistent data) into long format

Re: reading file (w/ non-consistent data) into long format

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!