Hi - I hope all is well & thank you in advance for your guidance!
Looking at Transfer Delays, we are using time variables. Sharing just 2 pieces of code that I thought were relevant.
data name2;
set name1 ;
IF MISSING(ED_HOSPITAL_ARRIVAL_TIME ) OR MISSING(ED_DISCHARGE_TIME)THEN RIGHT_HOURS = .M;
ELSE IF ED_HOSPITAL_ARRIVAL_TIME = ED_DISCHARGE_TIME THEN RIGHT_HOURS = 0;
ELSE RIGHT_HOURS= mod(24+(ED_DISCHARGE_TIME-ED_HOSPITAL_ARRIVAL_TIME)/3600,24);
LABEL RIGHT_HOURS = "Hours (custom): Minutes Elapsed from ED_HOSPITAL_ARRIVAL_TIME to ED_DISCHARGE_TIME";
Run;
Note that Right_hours is numeric and in a frequency it looks like this:
RIGHT_HOURS | Frequency |
M | 2 |
1.1 | 1 |
1.5333333333 | 1 |
1.6666666667 | 1 |
1.7333333333 | 1 |
2.8166666667 | 1 |
2.9333333333 | 1 |
)
Data name3;
set name2 ;
Length New_time $10.;
if RIGHT_HOURS = .M or RIGHT_HOURS = ' ' then New_Time = 'Missing';
if '0.0000'<=RIGHT_HOURS <'2.0000' then New_Time = 'LT2' ;
if '2.0000'<=RIGHT_HOURS then New_Time = '> 2+';
Run;
(Note:
LT2 (less than 2 hours)
> 2 + is greater than 2 hours. )
I need assistance with the next step, which is to calculate median delay by age. I know how to calculate the median, but I am stuck because if I use the New_Time variable, it will not work since it’s character.
How should I approach this analysis from your point of view.
Please let me know if I need to provide additional information.
Thank you!
Linda
If you want to calculate a MEDIAN you need a number.
So convert the character strings in RIGHT_HOURS into a number.
num_hours=input(right_hours,32.);
Also you should only compare the character variable to character values, not numeric values like .M.
So your first IF needs to be something like this instead:
if RIGHT_HOURS in ('.M' 'M' ' ') then New_Time = 'Missing';
If you want to calculate a MEDIAN you need a number.
So convert the character strings in RIGHT_HOURS into a number.
num_hours=input(right_hours,32.);
Also you should only compare the character variable to character values, not numeric values like .M.
So your first IF needs to be something like this instead:
if RIGHT_HOURS in ('.M' 'M' ' ') then New_Time = 'Missing';
Agreed Right_Hours is already numeric, so you could do just this:
proc means data = name2 median;
var Right_Hours;
run;
You said you wanted age also included but how - by age band or something else?
You topic sentence said you have a character variable. Your code is assuming RIGHT_HOURS is character because it is comparing the value to character strings.
If that is not your question the please restate it.
Also pay close attention to the NOTEs that the SAS data step will generate when it has to convert numbers to character strings or the reverse.
Not sure what you mean by "decimal". SAS datasets only store two types of variables. Fixed length character strings and binary floating point numbers.
This type of recoding:
if RIGHT_HOURS = .M or RIGHT_HOURS = ' ' then New_Time = 'Missing'; if '0.0000'<=RIGHT_HOURS <'2.0000' then New_Time = 'LT2' ; if '2.0000'<=RIGHT_HOURS then New_Time = '> 2+';
IF right_hours is numeric (the .M as a special missing makes be believe that it is) often is not needed.
When you want to process groups of values of a variable then a custom format is often preferable as it doesn't need to modify any data set, which may take time, and can be quite flexible by having multiple formats or modifying the format definition.
An example you might be able to run:
proc format; value right_hours 0 -< 2 = 'LT2' 2 - high = '>2+' .,.M = 'Missing' ; run; proc freq data=name2; table right_hours /missing; format right_hours right_hours. ; run;
If your name2 data set has trillions of records then the data storage space and time to run to create the data set name3 could be prohibitive. Custom formats like this will be honored by analysis and reporting procedures for just about any grouping purpose and usually for graphing as well with some caveats.
Advantages to formats include:
typically easier coding than multiple If/then/else statements
easier to understand once familiar
ease of modification
with numeric values the formatted values will appear in value order
reduces the number of needed variables when one is based solely on another single variable
one format can be applied to multiple variables that need the same rules
lists of specific values may be much easier to code
and for advanced purposes a lookup data set may be used to create the formats
Consider a data set with person ages. I used to have to report on 5-year based age groups, 10-year based age groups, ages like 18 to 25, 26 to 40, 41 to 65 and 65+ (and a few other groups based on specific projects). I did not need a specific variable for each but had separate formats used with which ever age variable was appropriate for the data set such as age at marriage, treatment, enrollment or departure from program.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and save with the early bird rate—just $795!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.