Hello,
I am unable to produce the output shown below.
I tried to use an array for my variables because in the real data I have R_Seizure_1 to R_Seizure_20.
Writing R_Seizure_1=. or R_Seizure_2=. or R_Seizure_3=. all the way to R_Seizure_20 is not efficient.
I want SAS to ignore the missing values.
data have;
input ID $6. First_Ischemic First_Hemorrhagic R_Seizure_1 R_Seizure_2 R_Seizure_3 R_Seizure_4 ;
format First_Ischemic First_Hemorrhagic R_Seizure_1 R_Seizure_2 R_Seizure_3 R_Seizure_4 date9.;
informat First_Ischemic First_Hemorrhagic R_Seizure_1 R_Seizure_2 R_Seizure_3 R_Seizure_4 date9.;
datalines;
011396 23SEP2004 10FEB2004 . . . 11FEB2020
034627 01DEC2009 30NOV2009 . 10FEB2010 . .
011427 11SEP2010 09AUG2010 10SEP2010 03FEB2012 . .
012666 . 18SEP2006 20JUN2002 . . .
023434 . 18OCT2002 21JUN2003 . . .
020485 15JUL2019 . . . 15AUG2009 25JUL2010
032462 13AUG2014 . 12AUG2014 20JUN2002 . .
011386 23SEP2004 10FEB2020 . . . 12AUG2015
;
run;
proc sort data=have; by id; run;
data want;
set have;
by id;
array vars R_Seizure_1--R_Seizure_4;
if R_Seizure_1--R_Seizure_4=. or First_Ischemic=. or First_Hemorrhagic=. then n='N/A';
if R_Seizure_1--R_Seizure_4 ne . then do;
if R_Seizure_1--R_Seizure_4 > First_Ischemic or First_Hemorrhagic then n=1;
else n=0;
end;
run;
Expected output
ID | First_Ischemic | First_Hemorrhagic | R_Seizure_1 | R_Seizure_2 | R_Seizure_3 | R_Seizure_4 | n |
11386 | 23SEP2004 | 10FEB2020 | . | . | . | 12AUG2015 | no |
11396 | 23SEP2004 | 10FEB2004 | . | . | . | 11FEB2020 | yes |
11427 | 11SEP2010 | 09AUG2010 | 10SEP2010 | 03FEB2012 | . | . | yes |
12666 | . | 18SEP2006 | 20JUN2002 | . | . | . | no |
20485 | 15JUL2019 | . | . | . | 15AUG2009 | 25JUL2010 | no |
23434 | . | 18OCT2002 | 21JUN2003 | . | . | . | yes |
32462 | 13AUG2014 | . | 12AUG2014 | 20JUN2002 | . | . | no |
34627 | 01DEC2009 | 30NOV2009 | . | 10FEB2010 | . | . | yes |
Replace this
array vars R_Seizure_1--R_Seizure_4;
if R_Seizure_1--R_Seizure_4=. or First_Ischemic=. or First_Hemorrhagic=. then n='N/A';
if R_Seizure_1--R_Seizure_4 ne . then do;
if R_Seizure_1--R_Seizure_4 > First_Ischemic or First_Hemorrhagic then n=1;
else n=0;
with this
array vars R_Seizure_1--R_Seizure_4;
if n(of vars{*})=0 and First_Ischemic=. or First_Hemorrhagic=. then n='N/A';
if n(of vars{*})=4 then do;
/* if R_Seizure_1--R_Seizure_4 > First_Ischemic or First_Hemorrhagic then n=1; else n=0; */
Above, one line is commented out because I cannot figure out what you are trying to do (and in fact, I am guessing about the other lines as well). I'm pretty sure this has been requested from you in the past: please DESCRIBE what you are doing in words. DESCRIBE what each line is supposed to be doing. Do NOT make us guess what your incorrect code is supposed to be doing. Do this for every post in the future, as well as for this one.
@PaigeMiller Sorry. I wrote an extensive message but I made a little cut/paste mistake and I could not undo all the messages on this community note.
What I want is to find when
R_Seizure_1 or R_Seizure_2 or R_Seizure_3 or R_Seizure_4 is greater than the First_Ischemic or First_Hemorrhagic date, and make a variable called N to be 'Yes'.
In the line you comment out, I want the N to indicate if any seizure date comes before the first stroke dates (ischemic or hemorrhagic) if this is true then seizure is a post_stroke (n=yes) but if seizure date comes after either stroke dates then the post_stroke (n=no).
@Tom I want the default nmiss(of R_Seizure_1--R_Seizure_4). If I had SAS treat a record as missing when ANY seizure date is missing, I would lose a lot of records. Only when ALL of the seizure dates are missing should SAS consider the output missing.
@ballardw I want N to be a character variable
@CathyVI wrote:
@PaigeMiller Sorry. I wrote an extensive message but I made a little cut/paste mistake and I could not undo all the messages on this community note.
What I want is to find when
R_Seizure_1 or R_Seizure_2 or R_Seizure_3 or R_Seizure_4 is greater than the First_Ischemic or First_Hemorrhagic date, and make a variable called N to be 'Yes'.
In the line you comment out, I want the N to indicate if any seizure date comes before the first stroke dates (ischemic or hemorrhagic) if this is true then seizure is a post_stroke (n=yes) but if seizure date comes after either stroke dates then the post_stroke (n=no).
You contradict yourself.
In the first sentence, you say you want "Yes" when any seizure comes after a stroke, but then you say at the end of the second sentence that it should be "No". Please make up your mind.
@Kurt_Bremser This is a typo...
In the line you comment out, I want the N to indicate if any seizure date comes before the first stroke dates (ischemic or hemorrhagic) if this is true then seizure is a post_stroke (n=yes) but if seizure date DOES NOT come after either stroke dates then the post_stroke (n=no).
@CathyVI wrote:
@Kurt_Bremser This is a typo...
In the line you comment out, I want the N to indicate if any seizure date comes before the first stroke dates (ischemic or hemorrhagic) if this is true then seizure is a post_stroke (n=yes) but if seizure date DOES NOT come after either stroke dates then the post_stroke (n=no).
If it is before, you want yes, but if it is not after, you want no.
Note: "before" and "not after" indicate the same time frame in English.
Depending on whether or not you @CathyVI are correct or @Kurt_Bremser is correct, you want some variation of this command (note: no arrays needed)
if max(of r_seizure_1-r_seizure_4) < min(first_ischemic, first_hemorrhagic) then n=1;
If that's not exactly what you want, then your homework assignment is to figure out what variation of the above code is correct for you.
This is not valid syntax:
if R_Seizure_1--R_Seizure_4=.
The = comparison operator can only compare 2 values.
Please explain in words what test you are trying to perform.
Do you want the result to be TRUE when ALL of the values are missing?
n(of R_Seizure_1--R_Seizure_4) = 0
Or when ANY of the values are missing?
nmiss(of R_Seizure_1--R_Seizure_4) > 0
Or even simpler just:
nmiss(of R_Seizure_1--R_Seizure_4)
since in boolean expressions SAS will treat 0 (or missing) as FALSE and any other number as TRUE.
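To make the difference between the two tests concrete, here is a minimal sketch on one made-up row. The dataset name demo and the flag names all_missing and any_missing are assumptions for illustration only; the row has a single seizure date, like ID 011396 in the sample data. A numbered range list (single dash) is used here so the result does not depend on variable order.

```sas
data demo;
  /* Hypothetical row: only the fourth seizure date is present */
  format R_Seizure_1-R_Seizure_4 date9.;
  R_Seizure_1 = .;
  R_Seizure_2 = .;
  R_Seizure_3 = .;
  R_Seizure_4 = '11FEB2020'd;

  /* N() counts non-missing arguments, NMISS() counts missing ones */
  all_missing = (n(of R_Seizure_1-R_Seizure_4) = 0);
  any_missing = (nmiss(of R_Seizure_1-R_Seizure_4) > 0);

  /* Write both flags to the log */
  put all_missing= any_missing=;
run;
```

With one date present and three missing, N() returns 1 and NMISS() returns 3, so all_missing is 0 and any_missing is 1.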
Is the variable N supposed to be numeric or character? The code you have written creates it as character, because its first use is n='N/A'; The later statements then try to assign numeric values, which triggers automatic numeric-to-character conversion.
If you want to display 'N/A' for a numeric variable, assign a special missing value and create a custom format to display it.
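A minimal sketch of that approach follows. The format name postfmt, the dataset demo_flag, the variable names seizure, stroke and flag, and the rule for when the flag is not applicable are all assumptions for illustration, not taken from the original post.

```sas
proc format;
  /* .N is a special missing value; the label only affects display */
  value postfmt 1='Yes' 0='No' .N='N/A';
run;

data demo_flag;
  input seizure :date9. stroke :date9.;
  /* Hypothetical rule: not applicable when either date is missing */
  if missing(seizure) or missing(stroke) then flag = .N;
  else flag = (seizure > stroke);
  format seizure stroke date9. flag postfmt.;
datalines;
11FEB2020 10FEB2004
. 18SEP2006
20JUN2002 18SEP2006
;

proc print data=demo_flag;
run;
```

The flag stays numeric, so it can still be used in comparisons and summaries, while reports show Yes, N/A and No for the three rows.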
The code below populates the variable FLG with values that match your variable N in the sample data.
Please add additional rows for any use cases where you believe the code below won't return what you want. Given the ongoing discussion, I believe sample data with the desired outcome that covers all your cases will get us to your solution faster.
data have;
input ID $ First_Ischemic :date9. First_Hemorrhagic :date9. R_Seizure_1 :date9. R_Seizure_2 :date9. R_Seizure_3 :date9. R_Seizure_4 :date9. n $;
format First_Ischemic First_Hemorrhagic R_Seizure_1 R_Seizure_2 R_Seizure_3 R_Seizure_4 date9.;
datalines;
11386 23SEP2004 10FEB2020 . . . 12AUG2015 no
11396 23SEP2004 10FEB2004 . . . 11FEB2020 yes
11427 11SEP2010 09AUG2010 10SEP2010 03FEB2012 . . yes
12666 . 18SEP2006 20JUN2002 . . . no
20485 15JUL2019 . . . 15AUG2009 25JUL2010 no
23434 . 18OCT2002 21JUN2003 . . . yes
32462 13AUG2014 . 12AUG2014 20JUN2002 . . no
34627 01DEC2009 30NOV2009 . 10FEB2010 . . yes
;
run;
proc format;
value yesno (default=3)
1='yes'
0='no'
other='n/a'
;
run;
data want;
set have;
length flg 3;
format flg yesno.;
flg = max(of R_Seizure_:) > max(First_Ischemic,First_Hemorrhagic);
run;
proc print data=want;
run;
An array does not really help with this problem since there is no need to loop.
Use the OF keyword to pass a variable list when calling a function that takes a flexible number of arguments.
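As a small illustration, the sketch below passes the same three variables to MAX() in three ways. The dataset name demo_of, the array name s, the result variable names, and the plain numbers standing in for dates are assumptions for the example.

```sas
data demo_of;
  /* Three hypothetical values; the second one is missing */
  array s{3} R_Seizure_1-R_Seizure_3 (10 . 30);

  latest_range  = max(of R_Seizure_1-R_Seizure_3); /* numbered range list */
  latest_prefix = max(of R_Seizure_:);             /* name prefix list */
  latest_array  = max(of s{*});                    /* all array elements */

  /* Write the three results to the log */
  put latest_range= latest_prefix= latest_array=;
run;
```

All three calls see the same values, and MAX() ignores the missing one, so each result is 30. The array appears only to show the of s{*} form; the variable lists work without it.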
First let's add a few more examples to handle the missing values issues.
data have;
input ID $6. First_Ischemic First_Hemorrhagic R_Seizure_1-R_Seizure_3 ;
format First_Ischemic First_Hemorrhagic R_Seizure_1-R_Seizure_3 date9.;
informat First_Ischemic First_Hemorrhagic R_Seizure_1-R_Seizure_3 date.;
datalines;
011386 23SEP2004 10FEB2020 . . 12AUG2015
011396 23SEP2004 10FEB2004 . . 11FEB2020
011427 11SEP2010 09AUG2010 10SEP2010 03FEB2012 .
012666 . 18SEP2006 20JUN2002 . .
020485 15JUL2019 . . 15AUG2009 25JUL2010
023434 . 18OCT2002 21JUN2003 . .
032462 13AUG2014 . 12AUG2014 20JUN2002 .
034627 01DEC2009 30NOV2009 . 10FEB2010 .
555555 01JAN2024 . . . .
666666 . . 01JAN2024 . .
777777 . . . . .
888888 01JAN2024 . 01JAN2023 01MAY2024 .
;
Now let's create some flags to indicate whether there are any seizure dates or stroke dates by using the N() function. We can then use MIN and/or MAX to check whether ANY of the seizure dates were before the first stroke, or whether ANY of the seizure dates were after the first stroke. You could also test whether ALL of the seizure dates are after the first stroke.
proc format;
value ynu 0='No' 1='Yes' .='N/A';
run;
data want;
set have;
Any_Seizure = 0<N(of R_Seizure_1-R_Seizure_3);
Any_Stroke = 0<N(of First_Ischemic First_Hemorrhagic);
if Any_seizure and Any_Stroke then do;
Any_pre = min(of R_Seizure_1-R_Seizure_3) < min(of First_Ischemic First_Hemorrhagic);
All_pre = max(of R_Seizure_1-R_Seizure_3) < min(of First_Ischemic First_Hemorrhagic);
Any_post = max(of R_Seizure_1-R_Seizure_3) > min(of First_Ischemic First_Hemorrhagic);
All_post = min(of R_Seizure_1-R_Seizure_3) > min(of First_Ischemic First_Hemorrhagic);
end;
format any_: all_: ynu.;
run;
@Tom @PaigeMiller @Patrick @Kurt_Bremser @Astounding
THANK YOU ALL!!! I appreciate all your comments and guidance. All of you are right. If I could pick multiple solutions I would pick them all, but SAS only allows one solution, so I picked the one I found simplest to understand. I am still learning SAS and am not an expert yet. Thank you all again.