Hello,
Please am new here. I am having a problem with my data set. I have been able to arrange it in a row format given that I have multiple failure time.
current data.
patid start_time f ailed_time time1 time2 time3 ... time20
1 1/20/1981 1/20/2009 3/4/1999 4/8/2000 8/18/2003 ... .
2 3/15/2000 5/1/2012 2/3/2002 3/9/2008 . .... .
NB. All id do not have time for all time points. Just that when a time point for an id ends it ends all other subsequent time points for such id has . till time20.
The result I am expecting is: single row into multiple follo
patid date0 date1
1 1/20/1981 3/4/1999
1 3/4/1999 4/8/2000
1 4/8/2000 8/18/2003
1 8/18/2003 1/20/2009
2 3/15/2000 2/3/2002
2 2/3/2002 3/9/2008
2 3/9/2008 5/1/2012.
ie
patid date0 date1
1 start_time time1
1 time1 time2
1 time2 time3
1 time3 failed_time
2 start_time time1
2 time1 time2
2 time2 failed_time
Thanks in anticipation of a swift response.
I'm confused. The variables shown are not the same ones shown in your proc contents' result.
One possibility, other than the above discrepancy, is that the file was imported as mixed variable type. If some of the start_dates are actually dates shown as numbers but in a character field, the following might work:
data have;
infile cards truncover;
input pat_id start_time $14. (fail_time time1-time13 time14__e) (:mmddyy10.);
format fail_time time1--time14__e mmddyy10.;
cards;
1 7690 1/20/2009 3/4/1999 4/8/2000 8/18/2003
2 14684 5/1/2012 2/3/2002 3/9/2008
3 1/20/1981 1/20/2009 3/4/1999 4/8/2000 8/18/2003
4 3/15/2000 5/1/2012 2/3/2002 3/9/2008
;
data want (keep=pat_id date0 date1);
set have;
if input(left(start_time), ?? mmddyy10.) gt 0 then
start_t=input(left(start_time),mmddyy10.);
else start_t=input(start_time,5.);
format date0 date1 mmddyy10.;
array t start_t time1--time14__e;
do i=1 to dim(t) while (not missing(t(i)));
date0=t(i);
date1=ifn(missing(t(i+1)),fail_time,t(i+1));
output;
end;
run;
Otherwise, we would really have to see a couple of the actual records.
Try this one.
data have;
infile cards truncover;
input id (start_time fail_time time1-time20) (:mmddyy10.);
format start_time fail_time time1-time20 mmddyy10.;
cards;
1 1/20/1981 1/20/2009 3/4/1999 4/8/2000 8/18/2003
2 3/15/2000 5/1/2012 2/3/2002 3/9/2008
;
data want (keep=id date0 date1);
set have;
format date0 date1 mmddyy10.;
array t start_time time1-time20;
do i=1 to dim(t) while (not missing(t(i)));
date0=t(i);
date1=ifn(missing(t(i+1)),fail_time,t(i+1));
output;
end;
run;
proc print;run;
Haikuo
Hello Haiuko/Linli,
I guess the code was okay. But at my end I got an error message the following message;
1124 array _time(*) start_time time: fail_time;
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
1125 do i=1 to dim(_time);
1126 date0=_time(i);
1127 if date0 ne . then output;
1128 end;
1129 format date0 mmddyy10.;
---------
48
ERROR 48-59: The format $MMDDYY was not found or could not be loaded.
1130 run;
I guess the problem is with the start_time. I merged several files to obtain the data am supposed to "arrange using the code you gave me. I noticed that the start_time is in number format. I tried formatting it using mmddyy10 but it does seem to work. I have to export the data file to excel where I converted it to number. Because it appear to be in a format I can't tell (it appears in excel with the little green mark in front), i converted it in numeric format and then changed it into the date format 23/03/1981. Re-imported it into sas but still appear in the numeric form. I then ran the code you sent but got the above error message. Sas is still new to me, am just trying to get my way round it.
While am at it trying to find out what the problem is, should you have an idea of what may have gone wrong please I would appreciate.Meanwhile, I didn't use the input file part, because my data was store in excel file so I just imported the whole file.
run the following on your dataset and post the results back here:
proc contents data=the_name_of_your_SAS_dataset;
run;
proc print data=the_name_of_your_SAS_dataset (obs=5);
run;
Hello Arthur,
I don't have the authorization to share that. The sample I posted above was a sample I carved out from what my data set looks like. If you could help with the above sample that would be great.
Thanks.
Since we kinda know how you data look like, now you need at least to tell us the type of those variables involved, character or numeric? You don't need to post every thing that proc content spit out, just info on type should do.
Haikuo
... then only give the column names and type with the result of :
proc sql;
select name, type from dictionary.columns
where upcase(memname)="YOUR_DATASET_NAME_UPPERCASE" and libname="WORK";
quit;
PG
Hello PG/Haiuto,
please is the info requested.
Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat Label
3 Fail_time Num 8 MMDDYY10. Fail_time
1 Pat_ID Num 8 BEST12. Pat_ID
2 start_time Char 14 $14. $14. start_time
4 time1 Num 8 MMDDYY10. time1
5 time2 Num 8 MMDDYY10. time2
6 time3 Num 8 MMDDYY10. time3
7 time4 Num 8 MMDDYY10. time4
8 time5 Num 8 MMDDYY10. time5
9 time6 Num 8 MMDDYY10. time6
10 time7 Num 8 MMDDYY10. time7
11 time8 Num 8 MMDDYY10. time8
12 time9 Num 8 MMDDYY10. time9
13 time10 Num 8 MMDDYY10. time10
14 time11 Num 8 MMDDYY10. time11
15 time12 Num 8 MMDDYY10. time12
16 time13 Num 8 MMDDYY10. time13
17 time14__e Num 8 MMDDYY10. time14
Ok, it seems that 'start_time' is the culprit. You need somehow convert it to numeric. Here is an approach based on my first response, please note that 'start_time' is $14., then it is converted to 'start_t' as mmddyy10., the latter goes into array().
data have;
infile cards truncover;
input id start_time :$14. (fail_time time1-time20) (:mmddyy10.);
format fail_time time1-time20 mmddyy10.;
cards;
1 1/20/1981 1/20/2009 3/4/1999 4/8/2000 8/18/2003
2 3/15/2000 5/1/2012 2/3/2002 3/9/2008
;
data want (keep=id date0 date1);
set have;
start_t=input(left(start_time),mmddyy10.);
format date0 date1 mmddyy10.;
array t start_t time1-time20;
do i=1 to dim(t) while (not missing(t(i)));
date0=t(i);
date1=ifn(missing(t(i+1)),fail_time,t(i+1));
output;
end;
run;
proc print;run;
BTW, you don't need to refer my name in your reply, while if you do, please refer it correctly, thanks.
Haikuo
This is just a restatement of Haikuo's suggested code, but with variable names that match those shown on the proc contents results that you posted:
data have;
infile cards truncover;
input pat_id start_time $14. (fail_time time1-time13 time14__e) (:mmddyy10.);
format fail_time time1--time14__e mmddyy10.;
cards;
1 1/20/1981 1/20/2009 3/4/1999 4/8/2000 8/18/2003
2 3/15/2000 5/1/2012 2/3/2002 3/9/2008
;
data want (keep=pat_id date0 date1);
set have;
start_t=input(left(start_time),mmddyy10.);
format date0 date1 mmddyy10.;
array t start_t time1--time14__e;
do i=1 to dim(t) while (not missing(t(i)));
date0=t(i);
date1=ifn(missing(t(i+1)),fail_time,t(i+1));
output;
end;
run;
It ran but it dropped the first 53 observations. with the following error message for all the observations dropped
NOTE: Invalid argument to function INPUT at line 1857 column 9.
PatID=1 start_time=28764 Fail_time=06/06/2009 time1=. time2=. time3=. time4=. time5=. time6=.
time7=. time8=. time9=. time10=. time11=. time12=. time13=. time14=. start_t=. date0=. date1=.
_I_=. time15=. time16=. time17=. time18=. time19=. time20=. i=1 _ERROR_=1 _N_=1.
please how do i fix this?
I'm confused. The variables shown are not the same ones shown in your proc contents' result.
One possibility, other than the above discrepancy, is that the file was imported as mixed variable type. If some of the start_dates are actually dates shown as numbers but in a character field, the following might work:
data have;
infile cards truncover;
input pat_id start_time $14. (fail_time time1-time13 time14__e) (:mmddyy10.);
format fail_time time1--time14__e mmddyy10.;
cards;
1 7690 1/20/2009 3/4/1999 4/8/2000 8/18/2003
2 14684 5/1/2012 2/3/2002 3/9/2008
3 1/20/1981 1/20/2009 3/4/1999 4/8/2000 8/18/2003
4 3/15/2000 5/1/2012 2/3/2002 3/9/2008
;
data want (keep=pat_id date0 date1);
set have;
if input(left(start_time), ?? mmddyy10.) gt 0 then
start_t=input(left(start_time),mmddyy10.);
else start_t=input(start_time,5.);
format date0 date1 mmddyy10.;
array t start_t time1--time14__e;
do i=1 to dim(t) while (not missing(t(i)));
date0=t(i);
date1=ifn(missing(t(i+1)),fail_time,t(i+1));
output;
end;
run;
Otherwise, we would really have to see a couple of the actual records.
Hello Arthur,
It works but I noticed that a date that was supposed to be 1978 was output as 2038 ie 60 years were added to the "start_time date". Don't know why. If it is going to be too much trouble then I will have to run with it. The worse case scenario is I will have to correct the date manually. Though I have more than 60,000 patid.
Thanks a lot.
That might confirm my original thought that the data had been imported from Excel.
You can get the correct result by running the code as:
data want (keep=pat_id date0 date1);
set have;
if input(left(start_time), ?? mmddyy10.) gt 0 then
start_t=input(left(start_time),mmddyy10.);
else start_t=input(start_time,5.)-21916;
format date0 date1 mmddyy10.;
array t start_t time1--time14__e;
do i=1 to dim(t) while (not missing(t(i)));
date0=t(i);
date1=ifn(missing(t(i+1)),fail_time,t(i+1));
output;
end;
run;
The correction, of subtracting 21916 from the numeric dates, is explained in the following paper:
Hello Arthur,
that's perfect. I am really grateful for you help and time. also my thanks goes to Haikuo (pardon me I have to mention your name again) , Linlin and PG for your assistance as well. Thanks all and do have a great evening.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.