BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
robm
Quartz | Level 8

how do I extract year out of the below data ...sbstr gives this error

ERROR: Function SUBSTR requires a character expression as argument 1.

D_O_ADMN_DT
25JAN1932:00:00:00.000
17NOV1971:00:00:00.000
29OCT1975:00:00:00.000
29OCT1975:00:00:00.000
04MAR1932:00:00:00.000


2     proc sql ;
3    SELECT substr(c.D_O_ADMN_DT,6,4) from sasdata.casedata c;
ERROR: Function SUBSTR requires a character expression as argument 1.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Working way too hard there.

input(tranwrd(p.P_D_O_LAST_UPDATE," ","0"),MMDDYY8.) format yymmdd10. as patLastUpd

,input(tranwrd(p.P_D_O_DEATH      ," ","0"),MMDDYY8.) format yymmdd10. as DO_DEATH

,input(tranwrd(c.C_D_O_ADMN       ," ","0"),MMDDYY8.) format yymmdd10. as d_o_admt


View solution in original post

22 REPLIES 22
Patrick
Opal | Level 21

A SAS datetime value is stored in a numeric variable and represented as the number of seconds since 1/1/1960. You then apply a SAS format to make this number human readable. The format as such does not change the internal value but how it's printed.

Below code illustrates the principle:

data have;
dt=datetime();
run;

proc sql;
select
  dt format=best16.,
  dt format=datetime21.,
  dt format=dtyear4.,

  /* applies the format, then stores the resulting string in the new variable "year" */
  put(dt,dtyear4.) as year
from have
;
quit;

robm
Quartz | Level 8

shouldn't this work then?

4     proc sql ;

5    SELECT year(c.D_O_ADMN_DT) from sasdata.casedata c;

NOTE: Writing HTML Body file: sashtml.htm

NOTE: Invalid argument to function YEAR. Missing values may be generated

CTorres
Quartz | Level 8

Try this:

proc sql ;

SELECT year(datepart(c.D_O_ADMN_DT)) from sasdata.casedata c;

Regards,

robm
Quartz | Level 8

thanks CTorres

year(datepart(c.D_O_ADMN_DT))>=&startDate. and year(datepart(c.D_O_ADMN_DT))<=&endDate. then 'Y'

seems to give

ERROR: Character expression requires a character format.

ERROR: Character expression requires a character format.

ERROR: Character expression requires a character format.

967

968    /* Mark case records where the patient has a case between the start/endDates */

969   proc sql;

970       create table a.&ir._tab01 as

971       select p.P_ACTIVE_CLINIC,

972           p.P_PATIENT_NUMBER, c.C_CASE_NUMBER, p.P_D_O_LAST_UPDATE format yymmdd10. as patLastUpd,

973           p.P_D_O_DEATH format yymmdd10., c.C_D_O_ADMN format yymmdd10.,

974           p.P_POSTAL_CODE, c.C_POSTAL_CODE, c.C_RES_PROV_CODE, p.P_PROVINCE format = $provid.,

975           c.C_RESIDENCE_CODE,

976           substr(c.C_PATIENT_NUMBER,6,2) || substr(c.C_PATIENT_NUMBER,4,2) ||

976 ! substr(c.C_PATIENT_NUMBER,2,2) || substr(c.C_PATIENT_NUMBER,1,1) as sortseq,

977

978           case

979           when year(datepart(c.D_O_ADMN_DT))>=&startDate. and

979 ! year(datepart(c.D_O_ADMN_DT))<=&endDate. then 'Y'

980           else ' '

981           end as insideRange

982

983       from sasdata.casedata c, sasdata.patient p

984       where c.C_PATIENT_NUMBER = p.P_PATIENT_NUMBER

985         and c.C_ADMISSION_TYPE = '12'

986   ;

ERROR: Character expression requires a character format.

ERROR: Character expression requires a character format.

ERROR: Character expression requires a character format.

987   run;

hey that's great thanks Torres

CTorres
Quartz | Level 8

Three comments:

1. For your program to work correctly the macrovariables &startDate and &endDate should contain a 4 digit year, not a full date.

2. I think the expression "year(datepart(c.D_O_ADMN_DT))>=&startDate. and year(datepart(c.D_O_ADMN_DT))<=&endDate. then 'Y'"  is not the one causing the three errors.

3. I would guess the formats you are applying to the variables "p.P_D_O_LAST_UPDATE format yymmdd10. as patLastUpd, p.P_D_O_DEATH format yymmdd10., c.C_D_O_ADMN format yymmdd10.," are those causing the errors. Check if they are character.

Good luck

robm
Quartz | Level 8

Hi C Smiley Happy

yup &startDate and &endDate are a 4 digit year # assigned like this

%let startDate = 2007;

%let endDate = 2009 ;

here is what I get from

proc sql;

select p.P_D_O_LAST_UPDATE  from sasdata.casedata c, sasdata.patient p

where c.C_PATIENT_NUMBER = p.P_PATIENT_NUMBER

   and c.C_ADMISSION_TYPE ^= '12'

;

so it looks like a date

199511 8
199511 8
199511 8
199511 8
199511 2
199511 2
CTorres
Quartz | Level 8

Yes, they look like a date but Character or Numeric?

Try this to get the attributes of the variable .P_D_O_LAST_UPDATE :

proc sql ;

  select name, type, length, label, format, informat

  from sashelp.vcolumn

  where libname = 'SASDATA'

  and memname = 'PATIENT'

  and name = 'P_D_O_LAST_UPDATE'

  ;

quit;

Regards,

robm
Quartz | Level 8

looks like its a string ....

P_D_O_LAST_UPDATE char 8 P_D_O_LAST_UPDATE $8. $8.

robm
Quartz | Level 8

ok here is what I ended up doing since it was an oddly formatted string... date '1966 6 7' is how my birthday would be formatted so i had to  take apart the string substitute ' ' with '0' and then cat them

back together into 1966/06/07 which I could then put into SAS date format and then reformat to yymmdd10 (<-thats an extra step I know but its working) there hopefully is a simpler way of converting this goofy data but thats how I fixed it.

input(trim(tranwrd(substr(p.P_D_O_LAST_UPDATE,5,2)," ","0") || "/" || tranwrd(substr(p.P_D_O_LAST_UPDATE,7,2)," ","0")  || "/" || substr(p.P_D_O_LAST_UPDATE,1,4)),MMDDYY10.) format yymmdd10. as patLastUpd,

input(trim(tranwrd(substr(p.P_D_O_DEATH,5,2)," ","0") || "/" || tranwrd(substr(p.P_D_O_DEATH,7,2)," ","0")  || "/" || substr(p.P_D_O_DEATH,1,4)),MMDDYY10.) format yymmdd10. as DO_DEATH,

input(trim(tranwrd(substr(c.C_D_O_ADMN,5,2)," ","0") || "/" || tranwrd(substr(c.C_D_O_ADMN,7,2)," ","0")  || "/" || substr(c.C_D_O_ADMN,1,4)),MMDDYY10.) format yymmdd10. as d_o_admt,

Tom
Super User Tom
Super User

Working way too hard there.

input(tranwrd(p.P_D_O_LAST_UPDATE," ","0"),MMDDYY8.) format yymmdd10. as patLastUpd

,input(tranwrd(p.P_D_O_DEATH      ," ","0"),MMDDYY8.) format yymmdd10. as DO_DEATH

,input(tranwrd(c.C_D_O_ADMN       ," ","0"),MMDDYY8.) format yymmdd10. as d_o_admt


robm
Quartz | Level 8

cool thanks that will be more compact for sure Smiley Happy

Peter_C
Rhodochrosite | Level 12

When you see

year(datepart(c.D_O_ADMN_DT))>=&startDate. and

year(datepart(c.D_O_ADMN_DT))<=&endDate. then 'Y'

seems to give

ERROR: Character expression requires a character format.

try handling the year as a string:

put(c.D_O_ADMN_DT, dtyear4.)>="&startDate." and

put(c.D_O_ADMN_DT, dtyear4.)<="&endDate." then 'Y'

of course, you could half the process time for the put() conversion by changing your WHEN clause to:

when put(c.D_O_ADMN_DT, dtyear4.)

    between "&startYEAR." and "&endYEAR." then 'Y'

this depends on &startYEAR and &endYEAR being 4digit year strings and column D_O_ADMN_DT being a standard SAS numeric datetime value

  Message was edited by: Peter Crawford reviewing this - more questions surface ----->

why would you SUBSTR( patient_NUMBER     (if that is a numeric number ) ?

if that is a number, you first need to convert it to a string with the usual

PUT( ) substr( PUT( PATIENT_NUMBER, {suitable format}

However, what format should be used?

Have all patient_number the same number of digits - should there be leading zeroes or should the conversion suppress leading zeroes?

Is there already a special format for patient_number?

robm
Quartz | Level 8

now i get

ERROR: Character expression requires a character format.

ERROR: Character expression requires a character format.

ERROR: Character expression requires a character format.

ERROR: Expression using greater than or equal (>=) has components that are of different data types.

ERROR: Expression using less than or equal (<=) has components that are of different data types.

Tom
Super User Tom
Super User

The problem is not with your datetime variable D_O_ADMN_DT, bu with the character variables P_D_O_LAST_UPDATE, P_D_O_DEATH and C_D_O_ADMN.  The error message appears three times because you are attempting to attach the numeric format YYMMDD10. to these three character variables.

You could leave them as character and remove the "format yymmdd10." parts of the query and the errors will go away.

If you want to convert these character strings into actual dates then you will need to do a little work as SAS will not like the embedded blanks that you showed in the example values for P_D_O_LAST_UPDATE.  To convert them to actual dates could do something like

input(translate(p.P_D_O_LAST_UPDATE,'0',' '),yymmdd8.) format yymmdd10. as patLastUpd,

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 29486 views
  • 8 likes
  • 7 in conversation