Hello community,
I hope you can help me with my programming.
I am looking to define time period for a diagnosis via proc sql.
A participant can be registered several times during the follow-up period, but I want them only to appear in the time period if it is the first time they are assigned the diagnosis (variable: in_date = date the diagnosis is assigned).
I am working with the following time periods:
- before one year after first visit (only participants who is registered with the diagnosis before one year after first visit)
- between first and last visit (only participants with the first registration between first and last visit)
- after last visit (only participants with the first registration of diagnosis after last visit)
- any time point (should include everybody)
I have written the following proc sql, but the problem with this code is, that the participant can potentially appear in all of the groups, because he/she is registered several times during the whole follow-up period. How to specify that it has to be the first time, the diagnosis is assigned?
proc sql;
create table diagnosis as
select distinct a.id, b.in_date, b.diagnosis, b.first_visit, b.last_visit,
case when substr(b.diagnosis,2,4) in ("XX", "XX", "XX) and .<b.first_visit+365>=b.in_date then 1 else 0 end as diagnosis_1year_after_visit,
case when substr(b.diagnosis,2,4) in ("XX", "XX", "XX) and .<b.last_visit>b.in_date then 1 else 0 end as diagnosis_after_last_visit,
case when substr(b.diagnosis,2,4) in ("XX", "XX", "XX) and b.first_visit<b.in_date<last_visit then 1 else 0 end as diagnosis_between_visits,
case when substr(b.diagnosis,2,4) in ("XX", "XX", "XX) then 1 else 0 end as diagnosis_ever
from test1 as b
left join id as a
on a.id=b.id
group by iid
order by id;quit
Which of those variables are supposed to be dates? Of those please show us what type, numeric or character, and the current SAS format that is applied. One suspects that you may not actually have date values so things like First_visit+365 may be returning basically nonsense values and comparisons with other stuff makes it worse.
https://communities.sas.com/t5/SAS-Communities-Library/Working-with-Dates-and-Times-in-SAS-Tutorial/... has a PDF with much information about dates.
The next thing is that "first time", which one would suspect is intended to represent something about order of processing is typically an obnoxious thing to do with SQL as it does not have any internal sense of record order processing. If you want "first" by time it is best to 1) use a sort by id variables and the variable that contains the date you want to use for determining order of processing and 2) Use a data step which processes records sequentially in an expected fashion. A BY statement allows you to determine when a new group starts and do things conditionally, such as RETAIN a value as the "first".
Example (not involving dates by the concept may help). You should have the SASHELP.Class data set available to run this code.
proc sort data=sashelp.class out=work.class; by age sex name; run; data example; set work.class; by age sex; length firstagename firstagesexname $ 8; retain firstagename firstagesexname; if first.age then firstagename=name; if first.sex then firstagesexname=name; run;
The BY statement allows testing values of Age and Sex to see if they are the first of a group of values using the First. (and a corresponding Last. automatic variables). These variables are numeric, 1/0 valued which SAS uses as True/False.
The above shows how to keep values of a variable using the Retain statement to set up variables not already in the data set to be kept across iterations of the data step code.
It does not look like you have date values. Only the value for FIRST_VISIT looks like it might be a valid date value.
165 data have; 166 input number :comma.; 167 put number=comma12. number :date9. ; 168 cards; number=21,019 19JUL2017 number=100,118 11FEB2234 number=150,821 07DEC2372
My guess was
100118 = 10 Jan 2018
21019 = 2 Oct 2019
150821 = 15 Aug 2021
but as simple numbers. Which is why I asked about type and format.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.