PaulaC
Fluorite | Level 6

I have a dataset with a diagnosis date variable, stored as a character variable. I would like to use it to calculate the duration since diagnosis (randomization date minus diagnosis date). Unfortunately, the variable is stored in three different ways across subjects: about 40 subjects have only a year, 50 subjects have just a month and year, and the remainder have a day, month, and year. I want to split the dataset by the way the diagnosis date is reported and then impute the missing parts of each date. After the imputation, I plan to merge all the data back together.

 

The data are currently stored as yyyy (40 subjects), mmmyyyy (50 subjects), and ddmmmyyyy (the remainder).

 

I am expecting to have three datasets after the split.

 

I have used the following code in SAS 9.4 and it did not work (error message also included):

113 data diagfix1;
114 set disease;
115 if diagdate=diagdate year4.;
                                   ------
                                  388
                                  201
                                  76
ERROR 388-185: Expecting an arithmetic operator.

ERROR 201-322: The option is not recognized and will be ignored.

ERROR 76-322: Syntax error, statement will be ignored.

116 run;

 

Any help on how to split this data based on the date format would be appreciated.

 

Thanks.

 

art297
Opal | Level 21

It would help to see some example data. It appears that all 3 types are in the same variable. Are they SAS dates or character?

 

Art, CEO, AnalystFinder.com

 

PaulaC
Fluorite | Level 6
The variable diagdate is a character variable (as mentioned in the original post). Some examples of the data are as follows:
1963
1964
1970
1972
198
1980
1981
APR1993
DEC1991
FEB1995
JUL1980
MAR1993
MAY1995
NOV1973
NOV1994
OCT1979
01APR1993
01APR1997
01AUG1991

Thanks.
Shmuel
Garnet | Level 18

You can use the following code:

    len = length(strip(date_var));
   select (len);
      when (4) date = mdy(01,01,input(date_var,4.));
      when (7) date = input('01'||date_var, date9.);
      when (9) date = input(date_var,date9.);
      otherwise put 'Check obs ' _N_ date_var=;
  end;

Then continue with DATE as a SAS date variable.

 

PaulaC
Fluorite | Level 6
Thank you. I am trying this now. Would you mind explaining what this code is doing so that I will know for the future? Will I be getting three separate datasets?
Shmuel
Garnet | Level 18

In

len = length(strip(date_var));

I'm counting the number of characters in the input date variable, ignoring leading and trailing blanks.

If the variable contains only a year, its length is 4.

If it contains a month and year, its length is 7.

If it's a full date in the form ddmmmyyyy, its length is 9.

 

For each kind of input I fill in the missing part: day=01 and, if needed, month=JAN.

Finally, I convert the result to a SAS date variable.
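
To make the explanation above concrete, here is a minimal, self-contained sketch of the same idea. The dataset name demo and the four test values are made up for illustration, and I strip the value everywhere it is read, which goes a bit beyond the snippet posted earlier.

```sas
/* Sketch: classify each value by its stripped length and build a SAS date. */
/* DEMO and the test values below are hypothetical.                         */
data demo;
  length date_var $12;
  input date_var $;
  format date date9.;
  select (length(strip(date_var)));
    /* year only: impute 01JAN */
    when (4) date = mdy(1, 1, input(strip(date_var), 4.));
    /* month and year: impute day 01 */
    when (7) date = input('01' || strip(date_var), date9.);
    /* full date: read as is */
    when (9) date = input(strip(date_var), date9.);
    /* anything else (for example the 3-character value) is flagged in the log */
    otherwise put 'Check obs ' _N_ date_var=;
  end;
  datalines;
1972
APR1993
01AUG1991
198
;
run;
```

This produces a single dataset with a numeric DATE column. Values that fall through to OTHERWISE keep a missing DATE and are reported in the log, so they can be reviewed by hand.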

PaulaC
Fluorite | Level 6
I have a question for you regarding the length of 7 putting the character "01" since only the month and year were provided. I ended up with a diagnosis date of 2019 and a duration of -22, but not sure why. When I made it the character "01" a numeric 01, the value became missing. Do you have any explanations for this?
Shmuel
Garnet | Level 18
 when (7) date = input('01'||date_var, date9.);

The informat date9. accepts input as DDMMMYYYY,

where DD is the day of the month (as a number).

I entered 01 in order to have a valid date format (that is, the 1st day of the month).

 

date_var is a character field, therefore I concatenate '01' as a character string.

If you concatenate a numeric 01 instead, SAS converts it to character automatically and pads it with leading blanks, so date9. reads only blanks and returns a missing value.

Have you checked your log? The automatic conversion is reported there as a note.
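
As a rough illustration of the difference between the two concatenations, the sketch below can be run on its own. The variable names (as_char, as_num and so on) are made up for the example. When a numeric value is used with ||, SAS converts it to character using the BEST12. format, so the digit ends up right-aligned behind eleven blanks, and the log shows a note about numeric values being converted to character values.

```sas
/* Sketch: character '01' versus numeric 1 in a concatenation. */
/* All variable names here are hypothetical.                   */
data _null_;
  date_var = 'APR1993';
  as_char  = '01' || date_var;        /* '01APR1993'                         */
  as_num   = 1 || date_var;           /* numeric 1 is converted with BEST12. */
  len_char = length(as_char);         /* length of the character version     */
  len_num  = length(as_num);          /* longer, because of the padding      */
  d_char   = input(as_char, date9.);  /* reads a complete date               */
  d_num    = input(as_num, date9.);   /* first 9 characters are all blanks   */
  put len_char= len_num= d_char= date9. d_num=;
run;
```

Comparing the two lengths in the log shows the extra blanks that push the real date text out of the 9 characters that date9. reads.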

 

 

PaulaC
Fluorite | Level 6
Thank you for the explanation. I don't recall seeing anything in the log, but I will check again. Doesn't the input command change the variable from character to numeric?
PaulaC
Fluorite | Level 6
I just checked the log and it did not give me any error messages. When I looked at the diagnosis date that was created, for 50 of the patients it is giving me a diagnosis date of 01mmm2019. Why 2019? I end up with a negative time from diagnosis. The years provided in the date variable range from 1973-1995. I am not sure why the code gives a year of 2019 for these year ranges.
Shmuel
Garnet | Level 18

It's difficult to know why you got the year 2019.

Please post your full code again, plus a sample of the input rows that are giving you trouble.

PaulaC
Fluorite | Level 6

Please find the output with the 2019 diagnosis date as well as the negative diagnosis time in the attached excel document.  Other than the diagdate and diagtime, the other variables are the input variables with mock data.  The code I used is below.

data merge;
merge bchist demog;
by subject;
where strip(trt) ne "001";
len = length(strip(date02));
select (len);
when (4) diagdate = mdy(06,15,input(date02,4.));
when (7) diagdate = input("15"||date02, date9.);
when (9) diagdate = input(date02,date9.);
otherwise put 'Check obs ' _N_ date02=;
end;
format diagdate date9.;
diagtime=intck('year',diagdate,daterand);
run;

 

Please let me know if you need any further information.  Thanks for your help.

art297
Opal | Level 21

@PaulaC: You have leading spaces in your data. The following should correct for that:

 

data year mmyyyy mdy;
  input date02 $12.;
  format diagdate date9.;
  date02=strip(date02);
  select (length(strip(date02)));
      when (4) do;
                 diagdate = mdy(06,05,input(date02,4.));
                 output year;
               end;
      when (7) do;
                 diagdate = input('15'||date02, date9.);
                 output mmyyyy;
               end;
      when (9) do;
                 diagdate = input(date02,date9.);
                 output mdy;
               end;
      otherwise put 'Check obs ' _N_ date02=;
  end;
  cards;
1963
1964
1970
1972
198
1980
1981
APR1993
DEC1991
FEB1995
JUL1980
MAR1993
MAY1995
NOV1973
NOV1994
OCT1979
01APR1993
01APR1997
01AUG1991
  OCT1991
  OCT1991
  OCT1991
  OCT1991
  NOV1979
  NOV1979
  NOV1979
  NOV1979
  NOV1979
  NOV1979
  JAN1993
  JAN1993
  JAN1993
  JAN1993
  JUN1980
  JUN1980
  JUN1980
  JUN1980
  MAR1995
  MAR1995
  MAR1995
  MAR1995
  MAR1995
  MAR1995
  MAR1995
 DEC1973
 DEC1973
 DEC1973
 DEC1973
 DEC1973
 DEC1973
 DEC1973
 MAY1993
 MAY1993
 MAY1993
 MAY1993
 MAY1993
  MAR1995
  MAR1995
  MAR1995
  MAR1995
  MAR1995
  MAR1995
  MAR1995
  MAR1995
  MAR1995
  OCT1994
  OCT1994
  OCT1994
  OCT1994
;

HTH,

Art, CEO, AnalystFinder.com

 

Shmuel
Garnet | Level 18

Pay attention:

 

 @art297 added the following line to the code, just before the select statement:

date02=strip(date02);

 You can also use the compress() or left() function instead of strip().
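
To see why that one line matters, here is a small sketch using one of the posted values with its two leading blanks. The variable names len_raw, len_strip, bad and good are made up for the example. The stripped length is 7, so the value lands in the when (7) branch, but the unstripped value pushes the year out of the 9 characters that date9. reads. With the default YEARCUTOFF setting (an assumption about your session), the remaining two-digit year 19 is what would be read as 2019.

```sas
/* Sketch: effect of leading blanks on the month-and-year branch. */
/* Variable names are hypothetical.                               */
data _null_;
  length date02 $12;
  date02    = '  OCT1991';                      /* two leading blanks, as in the posted data */
  len_raw   = length(date02);                   /* leading blanks are counted                */
  len_strip = length(strip(date02));            /* leading blanks are ignored                */
  bad  = input('15' || date02, date9.);         /* informat sees '15  OCT19'                 */
  good = input('15' || strip(date02), date9.);  /* informat sees '15OCT1991'                 */
  put len_raw= len_strip= bad= date9. good= date9.;
run;
```

Stripping the value once, before the select statement, keeps the length check and the concatenation working on the same text.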

 

 

PaulaC
Fluorite | Level 6
Thank you for your help.
