Have:
- 4 variables for CD4 count (CD4_1, CD4_2, CD4_3, CD4_4)
- 4 variables for the date when the CD4 count was measured (CD4_date1, CD4_date2, CD4_date3, CD4_date4)
- 1 variable specifying the start date of the study period (start_sp)
- 1 variable specifying the end date of the study period (end_sp)
Want to create 3 new dichotomous variables:
- CD4_1000, indicates if the individual had a CD4 count less than 1000, with the date of the test falling between the start and end of the study period (start_sp and end_sp)
- CD4_500, indicates if the individual had a CD4 count less than 500, with the date of the test falling between the start and end of the study period (start_sp and end_sp)
- CD4_350, indicates if the individual had a CD4 count less than 350, with the date of the test falling between the start and end of the study period (start_sp and end_sp)
My question is how to create these 3 variables without a long run of if-then statements. The code below creates the 3 variables, but it only uses the test counts, not the dates:
array CD4 [*] CD4Count;
CD4_350 = 0;
CD4_500 = 0;
CD4_1000 = 0;
do i = 1 to dim(CD4);
if CD4[i] < 350 then CD4_350 = 1;
if CD4[i] < 500 then CD4_500 = 1;
if CD4[i] < 1000 then CD4_1000 = 1;
end;
How can I modify this to include the requirement that the date of the test had to occur within the study period?
Some example input data, along with the output you want for it, would be helpful.
Your code as shown is only going to process 1 variable comparison, that of CD4Count.
You may have meant
array cd cd4_1 - cd4_4;
so that the array elements are those four count variables.
A second array would be needed to hold the matching dates:
array d cd4_date1 - cd4_date4;
Your if statements would look something like
if cd[i] < (value) and (startdate le d[i] le enddate) then do ...
If a test dated exactly on the start or end date should not count as "within period", use lt (or <) instead of le.
HOWEVER, you have a logic problem: if any one count (say cd4_4) is < 350, or is missing (missing is less than any number in SAS), then cd4_350, cd4_500 and cd4_1000 will all be true.
I am not sure what you want cd4_350 to mean, for instance, since the answer may well change with each CD4 count variable.
Suppose cd4_1=200, cd4_2=600, cd4_3=900, and cd4_4=1200. Then the first count will set cd4_350, cd4_500 and cd4_1000 all to 1.
After that there won't be any change in the cd4_350/500/1000 variables. This may be what you want if the interpretation is "at some time within the study period, at least one of the CD4 counts was less than XXXX".
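Putting those pieces together, here is a minimal sketch under the "at least one test within the study period" interpretation. The dataset names have and want, the id variable, and the sample rows are made up for illustration, and skipping missing counts is my assumption about what is wanted.

```sas
/* Hypothetical sample data: names and values are made up for illustration */
data have;
  infile datalines truncover;
  input id start_sp :date9. end_sp :date9.
        CD4_1 CD4_date1 :date9.
        CD4_2 CD4_date2 :date9.
        CD4_3 CD4_date3 :date9.
        CD4_4 CD4_date4 :date9.;
  format start_sp end_sp CD4_date1-CD4_date4 date9.;
  datalines;
1 01JAN2010 31DEC2010 200 15MAR2010 600 15JUN2010 900 15SEP2010 1200 15DEC2010
2 01JAN2010 31DEC2010 300 15MAR2009 800 15JUN2010 . 15SEP2010 1200 15DEC2010
;
run;

data want;
  set have;
  array cd {*} CD4_1-CD4_4;          /* the four counts */
  array d  {*} CD4_date1-CD4_date4;  /* the matching test dates */

  /* start every flag at 0; a flag is only ever switched on */
  CD4_350  = 0;
  CD4_500  = 0;
  CD4_1000 = 0;

  do i = 1 to dim(cd);
    /* assumption: ignore missing counts, which would otherwise compare
       as less than any threshold; keep only tests dated inside the
       study period (le includes the boundary dates) */
    if not missing(cd[i]) and (start_sp le d[i] le end_sp) then do;
      if cd[i] < 350  then CD4_350  = 1;
      if cd[i] < 500  then CD4_500  = 1;
      if cd[i] < 1000 then CD4_1000 = 1;
    end;
  end;
  drop i;
run;
```

With these two rows, id 1 should end up with all three flags set to 1 (the 200 count falls inside the period). For id 2, the 300 count is dated before start_sp and the missing count is skipped, so only CD4_1000 is set, by the 800 count.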
Like this?
do I = 1 to dim(CD4);
CD4_350 =( . < CD4[I] <= 350) * ( START_SP<CD4_DATE[I]<END_SP) * CD4_DATE[I];
CD4_500 =(350 < CD4[I] <= 500) * ( START_SP<CD4_DATE[I]<END_SP) * CD4_DATE[I];
CD4_1000=(500 < CD4[I] <= 1000) * ( START_SP<CD4_DATE[I]<END_SP) * CD4_DATE[I];
end;
This will store the test date, provided the value is within the range and the date is within the date boundaries too.
Otherwise it will store zero.
The elements in the parentheses are tests and resolve to either 0 or 1.
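To see that last point in isolation, here is a small sketch with made-up values (the variable names count, testdate, in_range, in_period and flag are only for illustration). Each parenthesized comparison evaluates to 1 when true and 0 when false, so the product is 1 only when both conditions hold.

```sas
/* Made-up values, purely to show that comparisons evaluate to 0 or 1 */
data _null_;
  start_sp = '01JAN2010'd;
  end_sp   = '31DEC2010'd;
  count    = 300;
  testdate = '15JUN2010'd;

  in_range  = (. < count < 350);                 /* count is non-missing and below 350 */
  in_period = (start_sp <= testdate <= end_sp);  /* test date inside the study period  */
  flag      = in_range * in_period;              /* 1 only if both tests are true      */

  put in_range= in_period= flag=;
run;
```

The log line should read in_range=1 in_period=1 flag=1. One thing to keep in mind: a plain assignment inside a DO loop is overwritten on every pass, so the variables end up reflecting only the last test. If a result from an earlier test should persist, something along the lines of CD4_350 = max(CD4_350, expression) would be one way to carry it forward.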