BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
woodcook
Calcite | Level 5

Hello Arthur,

Trust you had a great weekend. Please am back with my troublesome data. During data cleansing some patid were dropped and the new dataset has the infor below. I ran same code you gave me last weekend and I realized that the output started from patid 167 and dropped patid less than 167. I dont know why. I also observed that a start_time that was supposed to be 1994 was output as "2003" even after I removed the 21916 in the code. Please any idea what the problem might be and how do I fix it permanently because of over I will be running this code on similar data set?

below is the data description and code.

#    Variable      Type    Len    Format       Label

                     3    Fail_time     Num       8    MMDDYY10.    Fail_time

                     1    Pat_ID        Num       8    BEST12.      Pat_ID

                     2    start_time    Num       8    MMDDYY10.    start_time

                     4    time1         Num       8    MMDDYY10.    time1

                     5    time2         Num       8    MMDDYY10.    time2

                     6    time3         Num       8    MMDDYY10.    time3

                     7    time4         Num       8    MMDDYY10.    time4

                     8    time5         Num       8    MMDDYY10.    time5

                     9    time6         Num       8    MMDDYY10.    time6

                    10    time7         Num       8    MMDDYY10.    time7

                    11    time8         Num       8    MMDDYY10.    time8

                    12    time9         Num       8    MMDDYY10.    time9

                    13    time10        Num       8    MMDDYY10.    time10

                    14    time11        Num       8    MMDDYY10.    time11

                    15    time12        Num       8    MMDDYY10.    time12

                    16    time13        Num       8    MMDDYY10.    time13

                    17    time14        Num       8    MMDDYY10.    time14

                                          The SAS System          20:15 Tuesday, August 7, 2012   2

                                      The CONTENTS Procedure

                                         Sort Information

                                       Sortedby       Pat_id

                                       Validated      YES

                                       Character Set  ANSI

                                          The SAS System          20:15 Tuesday, August 7, 2012   3

      Obs          PAT_ID    start_time     Fail_time         time1         time2         time3

        1              10    11/19/2001    08/01/2011             .             .             .

        2              11    11/19/2001    08/01/2011             .             .             .

        3              12    11/19/2001    08/01/2011    12/01/2006    06/17/2009    02/24/2011

        4              13    11/19/2001    08/01/2011             .             .             .

        5              14    11/19/2001    08/01/2011             .             .             .

      Obs         time4         time5         time6         time7         time8         time9

        1             .             .             .             .             .             .

        2             .             .             .             .             .             .

        3             .             .             .             .             .             .

        4             .             .             .             .             .             .

        5             .             .             .             .             .             .

      Obs        time10        time11        time12        time13        time14

        1             .             .             .             .             .

        2             .             .             .             .             .

        3             .             .             .             .             .

        4             .             .             .             .             .

        5             .             .             .             .             .

data data4 (keep=pat_id date0 date1);

  set dataset;

  if input(left(start_time), ?? mmddyy10.) gt 0 then

   start_t=input(left(start_time),mmddyy10.);

  else start_t=input(start_time,5.)-21916;

  format date0 date1 mmddyy10.;

  array t start_t time1-time14;

      do i=1 to dim(t) while (not missing(t(i)));

          date0=t(i);

          date1=ifn(missing(t(i+1)),fail_time,t(i+1));

          output;

       end;

run;

art297
Opal | Level 21

Without seeing the original data, and how it was imported, all any of us can do is guess!

My guess would be that some columns were imported as dates but, due to having character formats, were set to missing.  Did you use the mixed=yes option when you imported the file?

It is also possible that some of the cells contained datetime values rather than dates and they, too, may have come across as character rather than numeric.

woodcook
Calcite | Level 5

I used the GETNAMES=YES option

art297
Opal | Level 21

You indicated that the first x rows end up with missing values after running the code.  I would suggest doing a proc print including, say, the first five or ten of those records directly from the file created right after the import.

What kind of values show up for the fields that are ending up missing after running the new code?

woodcook
Calcite | Level 5

Thanks Arthur, I have been able to sort it out. Using the previous code without the correcting for the date format. I formatted the dates before importing it into sas and when I ran the code it came out just fine.

NM1
Calcite | Level 5 NM1
Calcite | Level 5

Hi All,

 

I have a similar need and would require some help with the below requirement .

I have :

ID TmpShoppingCart_ID StoreSKU_ID QuantityΞΞ Enabled 26 34                 448         2          True 27 34                 3465        4          True 28 34                 3468        1          True 

 

Want:

 

ID TmpShoppingCart_ID StoreSKU_ID QuantityΞΞ Enabled 26 34                 448         1          True 26 34                 448         1          True 27 34                 3465        1          True 27 34                 3465        1          True 27 34                 3465        1          True 27 34                 3465        1          True 28 34                 3468        1          True 

 

Appreciate the any help in this regard.

Ravs
Calcite | Level 5

Hi,

I have similar requirement in SQL. Can anyone please suggest how can I get the same solution using SQL?

Thanks in advance!

woodcook
Calcite | Level 5

It ran but it dropped the first 53 observations. with the following error message for all the observations dropped

NOTE: Invalid argument to function INPUT at line 1857 column 9.

PatID=1 start_time=28764 Fail_time=06/06/2009 time1=. time2=. time3=. time4=. time5=. time6=.

time7=. time8=. time9=. time10=. time11=. time12=. time13=. time14=. start_t=. date0=. date1=.

_I_=. time15=. time16=. time17=. time18=. time19=. time20=. i=1 _ERROR_=1 _N_=1.

woodcook
Calcite | Level 5

Hello Arthur,

I think I misunderstood you. I have posted the result above. Hope to read from you soon. Thanks in advance.

Linlin
Lapis Lazuli | Level 10

data have;

informat start_time failed_time     time1      time2            time3 mmddyy10.;

format start_time failed_time     time1      time2            time3 mmddyy10.;

input patid    start_time     failed_time     time1      time2            time3 ;

cards;

1          1/20/1981     1/20/2009     3/4/1999    4/8/2000     8/18/2003  

2           3/15/2000     5/1/2012      2/3/2002    3/9/2008        . 

;

data temp(keep=patid date0);

  set have;

  array _time(*) start_time time: failed_time;

  do i=1 to dim(_time);

  date0=_time(i);

  if date0 ne . then output;

  end;

  format date0 mmddyy10.;

  run;

  data want(drop=nid);

   set temp;

   set temp(firstobs=2 rename=(date0=date1 patid=nid));

  if patid=nid;

proc print;run;

woodcook
Calcite | Level 5

Hello Linlin;

I read up some doc and used the OPTIONS nofmterr; in reading in my data file. After which I ran

the codes below;

OPTIONS nofmterr;

PROC IMPORT OUT= WORK.dataset1 DATAFILE= "C:\mydata.xls"

            DBMS=xls REPLACE;

     SHEET="dataset1";

     GETNAMES=YES;

RUN;

data a;

set dataset1;

      start_time=input(start_time,$5.);

   format start_time $mmddyy10.;

      run;

proc sort data=a;

by patid;

run;

data temp(keep=patid date0 date1);

  set dataset_test2;

  array _time(*) start_time time: fail_time;

  do i=1 to dim(_time);

  date0=_time(i);

  if date0 ne . then output;

    end;

  format date0 mmddyy10.;

  run;

  data want(drop=nid);

   set temp;

   set temp(firstobs=2 rename=(date0=date1 patid=nid));

  if patid=nid;

proc print;run;

It seems to run well but I noticed that it omitted the last observations (line) for each and it also drops patid with just start_time and fail_time (guess i failed to mention that some patid have just the start_time and fail_time and no value in the time1-time14 period ). ie

I got some a result like this;

patid date0             date1

1       3/4/1999       4/8/2000

1      4/8/2000       8/18/2003

1     8/18/2003      1/20/2009

2   2/3/2002         3/9/2008

2   3/9/2008        5/1/2012.

instead of

patid date0             date1

1      1/20/1981       3/4/1999 (this line was omitted in the output)

1       3/4/1999       4/8/2000

1      4/8/2000       8/18/2003

1     8/18/2003      1/20/2009

2   3/15/2000        2/3/2002 (this line was also omitted in the output.)

2   2/3/2002         3/9/2008

2   3/9/2008        5/1/2012.

for all observations the first start_ time and time1 row was omitted (in the temp file when date0 was created the start_time for each patid was completely omitted). I guess it has to do with the time format.

Please what can I do to get it right?

Thanks in advance.

@ Kaiuto,

I tried yours as well but I still got the error message about the array being of different variable as posted earlier.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 25 replies
  • 11136 views
  • 6 likes
  • 7 in conversation