Programming the statistical procedures from SAS

Splitting a row into multiple rows based on the data below?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 14
Accepted Solution

Splitting a row into multiple rows based on the data below?

Hello,

Please am new here. I am having a problem with my data set. I have been able to arrange it in a row format given that I have multiple failure time.

current data.

patid    start_time     f ailed_time     time1      time2            time3        ...         time20

1          1/20/1981     1/20/2009     3/4/1999    4/8/2000     8/18/2003   ...         .

2           3/15/2000     5/1/2012      2/3/2002    3/9/2008        .            ....         .

NB. All id do not have time for all time points. Just that when a time point for an id ends it ends all other subsequent time points for such id has . till time20.

The result I am expecting is: single row into multiple  follo

patid date0             date1

1      1/20/1981       3/4/1999

1       3/4/1999       4/8/2000

1      4/8/2000       8/18/2003

1     8/18/2003      1/20/2009

2   3/15/2000        2/3/2002

2   2/3/2002         3/9/2008

2   3/9/2008        5/1/2012.

ie

patid     date0            date1

1        start_time         time1

1          time1              time2

  1         time2              time3

1           time3             failed_time

2          start_time       time1

2          time1              time2

2          time2             failed_time

Thanks in anticipation of a swift response.


Accepted Solutions
Solution
‎08-05-2012 12:49 PM
Esteemed Advisor
Posts: 7,052

Re: Splitting a row into multiple rows based on the data below?

I'm confused.  The variables shown are not the same ones shown in your proc contents' result.

One possibility, other than the above discrepancy, is that the file was imported as mixed variable type.  If some of the start_dates are actually dates shown as numbers but in a character field, the following might work:

data have;

  infile cards truncover;

  input pat_id start_time $14. (fail_time time1-time13 time14__e) (:mmddyy10.);

  format fail_time time1--time14__e mmddyy10.;

  cards;

1  7690           1/20/2009     3/4/1999  4/8/2000  8/18/2003

2  14684          5/1/2012      2/3/2002  3/9/2008

3  1/20/1981      1/20/2009     3/4/1999  4/8/2000  8/18/2003

4  3/15/2000      5/1/2012      2/3/2002  3/9/2008

;

data want (keep=pat_id date0 date1);

  set have;

  if input(left(start_time), ?? mmddyy10.) gt 0 then

   start_t=input(left(start_time),mmddyy10.);

  else start_t=input(start_time,5.);

  format date0 date1 mmddyy10.;

  array t start_t time1--time14__e;

      do i=1 to dim(t) while (not missing(t(i)));

          date0=t(i);

          date1=ifn(missing(t(i+1)),fail_time,t(i+1));

          output;

       end;

run;

Otherwise, we would really have to see a couple of the actual records.

View solution in original post


All Replies
Respected Advisor
Posts: 3,124

Re: Splitting a row into multiple rows based on the data below?

Try this one.

data have;

infile cards truncover;

input id (start_time fail_time time1-time20) (:mmddyy10.);

format start_time fail_time time1-time20 mmddyy10.;

cards;

1          1/20/1981     1/20/2009     3/4/1999    4/8/2000     8/18/2003 

2           3/15/2000     5/1/2012      2/3/2002    3/9/2008

;

data want (keep=id date0 date1);

set have;

format date0 date1 mmddyy10.;

array t start_time time1-time20;

      do i=1 to dim(t) while (not missing(t(i)));

          date0=t(i);

          date1=ifn(missing(t(i+1)),fail_time,t(i+1));

          output;

       end;

run;

proc print;run;

Haikuo

Occasional Contributor
Posts: 14

Re: Splitting a row into multiple rows based on the data below?

Hello Haiuko/Linli,

I guess the code was okay. But at my end I got an error message the following message;

1124    array _time(*) start_time time: fail_time;

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

1125    do i=1 to dim(_time);

1126    date0=_time(i);

1127    if date0 ne . then output;

1128    end;

1129    format date0 mmddyy10.;

                     ---------

                     48

ERROR 48-59: The format $MMDDYY was not found or could not be loaded.

1130    run;

I guess the problem is with the start_time. I merged several files to obtain the data am supposed to "arrange using the code you gave me. I noticed that the start_time is in number format. I tried formatting it using mmddyy10 but it does seem to work. I have to export the data file to excel where I converted it to number. Because it appear to be in a format I can't tell (it appears in excel with the little green mark in front), i converted it in numeric format and then changed it into the date format 23/03/1981. Re-imported it into sas but still appear in the numeric form. I then ran the code you sent but got the above error message. Sas is still new to me, am just trying to get my way round it.

While am at it trying to find out what the problem is, should you have an idea of what may have gone wrong please I would appreciate.Meanwhile, I didn't use the input file part, because my data was store in excel file so I just imported the whole file.

Esteemed Advisor
Posts: 7,052

Re: Splitting a row into multiple rows based on the data below?

run the following on your dataset and post the results back here:

proc contents data=the_name_of_your_SAS_dataset;

run;

proc print data=the_name_of_your_SAS_dataset (obs=5);

run;

Occasional Contributor
Posts: 14

Re: Splitting a row into multiple rows based on the data below?

Hello Arthur,

I don't have the authorization to share that. The sample I posted above was a sample I carved out from what my data set looks like. If you could help with the above sample that would be great.

Thanks.

Respected Advisor
Posts: 3,124

Re: Splitting a row into multiple rows based on the data below?

Since we kinda know how you data look like, now you need at least to tell us the type of those variables involved, character or numeric? You don't need to post every thing that proc content spit out, just info on type should do.

Haikuo

Respected Advisor
Posts: 4,606

Re: Splitting a row into multiple rows based on the data below?

... then only give the column names and type with the result of :

proc sql;

select name, type from dictionary.columns

where upcase(memname)="YOUR_DATASET_NAME_UPPERCASE" and libname="WORK";

quit;

PG

PG
Occasional Contributor
Posts: 14

Re: Splitting a row into multiple rows based on the data below?

Hello PG/Haiuto,

please is the info requested.

Alphabetic List of Variables and Attributes

               #    Variable      Type    Len    Format       Informat    Label

               3    Fail_time     Num       8    MMDDYY10.                Fail_time

               1    Pat_ID        Num       8    BEST12.                  Pat_ID

               2    start_time    Char     14    $14.         $14.        start_time

               4    time1         Num       8    MMDDYY10.                time1

               5    time2         Num       8    MMDDYY10.                time2

               6    time3         Num       8    MMDDYY10.                time3

               7    time4         Num       8    MMDDYY10.                time4

               8    time5         Num       8    MMDDYY10.                time5

               9    time6         Num       8    MMDDYY10.                time6

              10    time7         Num       8    MMDDYY10.                time7

              11    time8         Num       8    MMDDYY10.                time8

              12    time9         Num       8    MMDDYY10.                time9

              13    time10        Num       8    MMDDYY10.                time10

              14    time11        Num       8    MMDDYY10.                time11

              15    time12        Num       8    MMDDYY10.                time12

              16    time13        Num       8    MMDDYY10.                time13

              17    time14__e     Num       8    MMDDYY10.                time14

Respected Advisor
Posts: 3,124

Re: Splitting a row into multiple rows based on the data below?

Ok, it seems that 'start_time' is the culprit. You need somehow convert it to numeric. Here is an approach based on my first response, please note that 'start_time' is $14., then it is converted to 'start_t' as mmddyy10., the latter goes into array().

data have;

infile cards truncover;

input id start_time :$14. (fail_time time1-time20) (:mmddyy10.);

format fail_time time1-time20 mmddyy10.;

cards;

1          1/20/1981     1/20/2009     3/4/1999    4/8/2000     8/18/2003

2           3/15/2000     5/1/2012      2/3/2002    3/9/2008

;

data want (keep=id date0 date1);

set have;

start_t=input(left(start_time),mmddyy10.);

format date0 date1 mmddyy10.;

array t start_t time1-time20;

      do i=1 to dim(t) while (not missing(t(i)));

          date0=t(i);

          date1=ifn(missing(t(i+1)),fail_time,t(i+1));

          output;

       end;

run;

proc print;run;

BTW, you don't need to refer my name in your reply, while if you do, please refer it correctly, thanks.

Haikuo

Esteemed Advisor
Posts: 7,052

Re: Splitting a row into multiple rows based on the data below?

This is just a restatement of Haikuo's suggested code, but with variable names that match those shown on the proc contents results that you posted:

data have;

  infile cards truncover;

  input pat_id start_time $14. (fail_time time1-time13 time14__e) (:mmddyy10.);

  format fail_time time1--time14__e mmddyy10.;

  cards;

1  1/20/1981      1/20/2009     3/4/1999  4/8/2000  8/18/2003

2  3/15/2000      5/1/2012      2/3/2002  3/9/2008

;

data want (keep=pat_id date0 date1);

  set have;

  start_t=input(left(start_time),mmddyy10.);

  format date0 date1 mmddyy10.;

  array t start_t time1--time14__e;

      do i=1 to dim(t) while (not missing(t(i)));

          date0=t(i);

          date1=ifn(missing(t(i+1)),fail_time,t(i+1));

          output;

       end;

run;

Occasional Contributor
Posts: 14

Re: Splitting a row into multiple rows based on the data below?

It ran but it dropped the first 53 observations. with the following error message for all the observations dropped

NOTE: Invalid argument to function INPUT at line 1857 column 9.

PatID=1 start_time=28764 Fail_time=06/06/2009 time1=. time2=. time3=. time4=. time5=. time6=.

time7=. time8=. time9=. time10=. time11=. time12=. time13=. time14=. start_t=. date0=. date1=.

_I_=. time15=. time16=. time17=. time18=. time19=. time20=. i=1 _ERROR_=1 _N_=1.

please how do i fix this?

Solution
‎08-05-2012 12:49 PM
Esteemed Advisor
Posts: 7,052

Re: Splitting a row into multiple rows based on the data below?

I'm confused.  The variables shown are not the same ones shown in your proc contents' result.

One possibility, other than the above discrepancy, is that the file was imported as mixed variable type.  If some of the start_dates are actually dates shown as numbers but in a character field, the following might work:

data have;

  infile cards truncover;

  input pat_id start_time $14. (fail_time time1-time13 time14__e) (:mmddyy10.);

  format fail_time time1--time14__e mmddyy10.;

  cards;

1  7690           1/20/2009     3/4/1999  4/8/2000  8/18/2003

2  14684          5/1/2012      2/3/2002  3/9/2008

3  1/20/1981      1/20/2009     3/4/1999  4/8/2000  8/18/2003

4  3/15/2000      5/1/2012      2/3/2002  3/9/2008

;

data want (keep=pat_id date0 date1);

  set have;

  if input(left(start_time), ?? mmddyy10.) gt 0 then

   start_t=input(left(start_time),mmddyy10.);

  else start_t=input(start_time,5.);

  format date0 date1 mmddyy10.;

  array t start_t time1--time14__e;

      do i=1 to dim(t) while (not missing(t(i)));

          date0=t(i);

          date1=ifn(missing(t(i+1)),fail_time,t(i+1));

          output;

       end;

run;

Otherwise, we would really have to see a couple of the actual records.

Occasional Contributor
Posts: 14

Re: Splitting a row into multiple rows based on the data below?

Hello Arthur,

It works but I noticed that a date that was supposed to be 1978 was output as 2038 ie 60 years were added to the "start_time date". Don't know why. If it is going to be too much trouble then I will have to run with it. The worse case scenario is I will have to correct the date manually. Though I have more than 60,000 patid.

Thanks a lot.

Esteemed Advisor
Posts: 7,052

Re: Splitting a row into multiple rows based on the data below?

That might confirm my original thought that the data had been imported from Excel.

You can get the correct result by running the code as:

data want (keep=pat_id date0 date1);

  set have;

  if input(left(start_time), ?? mmddyy10.) gt 0 then

   start_t=input(left(start_time),mmddyy10.);

  else start_t=input(start_time,5.)-21916;

  format date0 date1 mmddyy10.;

  array t start_t time1--time14__e;

      do i=1 to dim(t) while (not missing(t(i)));

          date0=t(i);

          date1=ifn(missing(t(i+1)),fail_time,t(i+1));

          output;

       end;

run;

The correction, of subtracting 21916 from the numeric dates, is explained in the following paper:

http://www2.sas.com/proceedings/sugi29/068-29.pdf

Occasional Contributor
Posts: 14

Re: Splitting a row into multiple rows based on the data below?

Hello Arthur,

that's perfect. I am really grateful for you help and time. also my thanks goes to Haikuo (pardon me I have to mention your name again) , Linlin and PG for your assistance as well. Thanks all and do have a great evening.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 25 replies
  • 2708 views
  • 6 likes
  • 7 in conversation