Hello Arthur,
Trust you had a great weekend. Please am back with my troublesome data. During data cleansing some patid were dropped and the new dataset has the infor below. I ran same code you gave me last weekend and I realized that the output started from patid 167 and dropped patid less than 167. I dont know why. I also observed that a start_time that was supposed to be 1994 was output as "2003" even after I removed the 21916 in the code. Please any idea what the problem might be and how do I fix it permanently because of over I will be running this code on similar data set?
below is the data description and code.
# Variable Type Len Format Label
3 Fail_time Num 8 MMDDYY10. Fail_time
1 Pat_ID Num 8 BEST12. Pat_ID
2 start_time Num 8 MMDDYY10. start_time
4 time1 Num 8 MMDDYY10. time1
5 time2 Num 8 MMDDYY10. time2
6 time3 Num 8 MMDDYY10. time3
7 time4 Num 8 MMDDYY10. time4
8 time5 Num 8 MMDDYY10. time5
9 time6 Num 8 MMDDYY10. time6
10 time7 Num 8 MMDDYY10. time7
11 time8 Num 8 MMDDYY10. time8
12 time9 Num 8 MMDDYY10. time9
13 time10 Num 8 MMDDYY10. time10
14 time11 Num 8 MMDDYY10. time11
15 time12 Num 8 MMDDYY10. time12
16 time13 Num 8 MMDDYY10. time13
17 time14 Num 8 MMDDYY10. time14
The SAS System 20:15 Tuesday, August 7, 2012 2
The CONTENTS Procedure
Sort Information
Sortedby Pat_id
Validated YES
Character Set ANSI
The SAS System 20:15 Tuesday, August 7, 2012 3
Obs PAT_ID start_time Fail_time time1 time2 time3
1 10 11/19/2001 08/01/2011 . . .
2 11 11/19/2001 08/01/2011 . . .
3 12 11/19/2001 08/01/2011 12/01/2006 06/17/2009 02/24/2011
4 13 11/19/2001 08/01/2011 . . .
5 14 11/19/2001 08/01/2011 . . .
Obs time4 time5 time6 time7 time8 time9
1 . . . . . .
2 . . . . . .
3 . . . . . .
4 . . . . . .
5 . . . . . .
Obs time10 time11 time12 time13 time14
1 . . . . .
2 . . . . .
3 . . . . .
4 . . . . .
5 . . . . .
data data4 (keep=pat_id date0 date1);
set dataset;
if input(left(start_time), ?? mmddyy10.) gt 0 then
start_t=input(left(start_time),mmddyy10.);
else start_t=input(start_time,5.)-21916;
format date0 date1 mmddyy10.;
array t start_t time1-time14;
do i=1 to dim(t) while (not missing(t(i)));
date0=t(i);
date1=ifn(missing(t(i+1)),fail_time,t(i+1));
output;
end;
run;
Without seeing the original data, and how it was imported, all any of us can do is guess!
My guess would be that some columns were imported as dates but, due to having character formats, were set to missing. Did you use the mixed=yes option when you imported the file?
It is also possible that some of the cells contained datetime values rather than dates and they, too, may have come across as character rather than numeric.
I used the GETNAMES=YES option
You indicated that the first x rows end up with missing values after running the code. I would suggest doing a proc print including, say, the first five or ten of those records directly from the file created right after the import.
What kind of values show up for the fields that are ending up missing after running the new code?
Thanks Arthur, I have been able to sort it out. Using the previous code without the correcting for the date format. I formatted the dates before importing it into sas and when I ran the code it came out just fine.
Hi All,
I have a similar need and would require some help with the below requirement .
I have :
ID TmpShoppingCart_ID StoreSKU_ID QuantityΞΞ Enabled 26 34 448 2 True 27 34 3465 4 True 28 34 3468 1 True
Want:
ID TmpShoppingCart_ID StoreSKU_ID QuantityΞΞ Enabled 26 34 448 1 True 26 34 448 1 True 27 34 3465 1 True 27 34 3465 1 True 27 34 3465 1 True 27 34 3465 1 True 28 34 3468 1 True
Appreciate the any help in this regard.
Hi,
I have similar requirement in SQL. Can anyone please suggest how can I get the same solution using SQL?
Thanks in advance!
It ran but it dropped the first 53 observations. with the following error message for all the observations dropped
NOTE: Invalid argument to function INPUT at line 1857 column 9.
PatID=1 start_time=28764 Fail_time=06/06/2009 time1=. time2=. time3=. time4=. time5=. time6=.
time7=. time8=. time9=. time10=. time11=. time12=. time13=. time14=. start_t=. date0=. date1=.
_I_=. time15=. time16=. time17=. time18=. time19=. time20=. i=1 _ERROR_=1 _N_=1.
Hello Arthur,
I think I misunderstood you. I have posted the result above. Hope to read from you soon. Thanks in advance.
data have;
informat start_time failed_time time1 time2 time3 mmddyy10.;
format start_time failed_time time1 time2 time3 mmddyy10.;
input patid start_time failed_time time1 time2 time3 ;
cards;
1 1/20/1981 1/20/2009 3/4/1999 4/8/2000 8/18/2003
2 3/15/2000 5/1/2012 2/3/2002 3/9/2008 .
;
data temp(keep=patid date0);
set have;
array _time(*) start_time time: failed_time;
do i=1 to dim(_time);
date0=_time(i);
if date0 ne . then output;
end;
format date0 mmddyy10.;
run;
data want(drop=nid);
set temp;
set temp(firstobs=2 rename=(date0=date1 patid=nid));
if patid=nid;
proc print;run;
Hello Linlin;
I read up some doc and used the OPTIONS nofmterr; in reading in my data file. After which I ran
the codes below;
OPTIONS nofmterr;
PROC IMPORT OUT= WORK.dataset1 DATAFILE= "C:\mydata.xls"
DBMS=xls REPLACE;
SHEET="dataset1";
GETNAMES=YES;
RUN;
data a;
set dataset1;
start_time=input(start_time,$5.);
format start_time $mmddyy10.;
run;
proc sort data=a;
by patid;
run;
data temp(keep=patid date0 date1);
set dataset_test2;
array _time(*) start_time time: fail_time;
do i=1 to dim(_time);
date0=_time(i);
if date0 ne . then output;
end;
format date0 mmddyy10.;
run;
data want(drop=nid);
set temp;
set temp(firstobs=2 rename=(date0=date1 patid=nid));
if patid=nid;
proc print;run;
It seems to run well but I noticed that it omitted the last observations (line) for each and it also drops patid with just start_time and fail_time (guess i failed to mention that some patid have just the start_time and fail_time and no value in the time1-time14 period ). ie
I got some a result like this;
patid date0 date1
1 3/4/1999 4/8/2000
1 4/8/2000 8/18/2003
1 8/18/2003 1/20/2009
2 2/3/2002 3/9/2008
2 3/9/2008 5/1/2012.
instead of
patid date0 date1
1 1/20/1981 3/4/1999 (this line was omitted in the output)
1 3/4/1999 4/8/2000
1 4/8/2000 8/18/2003
1 8/18/2003 1/20/2009
2 3/15/2000 2/3/2002 (this line was also omitted in the output.)
2 2/3/2002 3/9/2008
2 3/9/2008 5/1/2012.
for all observations the first start_ time and time1 row was omitted (in the temp file when date0 was created the start_time for each patid was completely omitted). I guess it has to do with the time format.
Please what can I do to get it right?
Thanks in advance.
@ Kaiuto,
I tried yours as well but I still got the error message about the array being of different variable as posted earlier.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.