BookmarkSubscribeRSS Feed
lone0708
Fluorite | Level 6

Hello community, 

I hope you can help me with my programming.

I am looking to define time period for a diagnosis via proc sql. 

A participant can be registered several times during the follow-up period, but I want them only to appear in the time period if it is the first time they are assigned the diagnosis  (variable: in_date = date the diagnosis is assigned). 

I am working with the following time periods:

 

- before one year after first visit (only participants who is registered with the diagnosis before one year after first visit)

- between first and last visit (only participants with the first registration between first and last visit)

- after last visit (only participants with the first registration of diagnosis after last visit)

- any time point (should include everybody)

 

I have written the following proc sql, but the problem with this code is, that the participant can potentially appear in all of the groups, because he/she is registered several times during the whole follow-up period. How to specify that it has to be the first time, the diagnosis is assigned?

proc sql;

create table diagnosis as

select  distinct a.id, b.in_date, b.diagnosis, b.first_visit, b.last_visit, 

case when substr(b.diagnosis,2,4) in ("XX", "XX", "XX) and .<b.first_visit+365>=b.in_date then 1 else 0 end as diagnosis_1year_after_visit,

 

case when substr(b.diagnosis,2,4) in ("XX", "XX", "XX) and .<b.last_visit>b.in_date then 1 else 0 end as diagnosis_after_last_visit,

 

case when substr(b.diagnosis,2,4) in ("XX", "XX", "XX) and b.first_visit<b.in_date<last_visit then 1 else 0 end as diagnosis_between_visits,

 

case when  substr(b.diagnosis,2,4) in ("XX", "XX", "XX) then 1 else 0 end as diagnosis_ever

 

from test1 as b

left join id as a 

on a.id=b.id

group by iid

order by id;quit

fred12_0-1639168588499.png

 

4 REPLIES 4
ballardw
Super User

Which of those variables are supposed to be dates? Of those please show us what type, numeric or character, and the current SAS format that is applied. One suspects that you may not actually have date values so things like First_visit+365 may be returning basically nonsense values and comparisons with other stuff makes it worse.

 

https://communities.sas.com/t5/SAS-Communities-Library/Working-with-Dates-and-Times-in-SAS-Tutorial/... has a PDF with much information about dates.

 

The next thing is that "first time", which one would suspect is intended to represent something about order of processing is typically an obnoxious thing to do with SQL as it does not have any internal sense of record order processing. If you want "first" by time it is best to 1) use a sort by id variables and the variable that contains the date you want to use for determining order of processing and 2) Use a data step which processes records sequentially in an expected fashion. A BY statement allows you to determine when a new  group starts and do things conditionally, such as RETAIN a value as the "first".

 

Example (not involving dates by the concept may help). You should have the SASHELP.Class data set available to run this code.

proc sort data=sashelp.class out=work.class;
   by age sex name;
run;

data example;
   set work.class;
   by age sex;
   length firstagename firstagesexname $ 8;
   retain firstagename firstagesexname;
   if first.age then firstagename=name;
   if first.sex then firstagesexname=name;
run;

The BY statement allows testing values of Age and Sex to see if they are the first of a group of values using the First. (and a corresponding Last. automatic variables). These variables are numeric, 1/0 valued which SAS uses as True/False.

The above shows how to keep values of a variable using the Retain statement to set up variables not already in the data set to be kept across iterations of the data step code.

lone0708
Fluorite | Level 6
Thanks for your answer and sorry about the confusion with dates. In my real dataset it is a sas date ddmmyy date9. format.
The code as it is now works fine, only problem is that it counts the participant can appear in several groups, i want the participant only to be in the time period were he/she was first registered.
Tom
Super User Tom
Super User

It does not look like you have date values. Only the value for FIRST_VISIT looks like it might be a valid date value.

165   data have;
166     input number :comma.;
167     put number=comma12. number :date9. ;
168   cards;

number=21,019 19JUL2017
number=100,118 11FEB2234
number=150,821 07DEC2372
ballardw
Super User

My guess was

100118 = 10 Jan 2018

21019  =  2 Oct 2019

150821 = 15 Aug 2021

 

but as simple numbers. Which is why I asked about type and format.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1083 views
  • 0 likes
  • 3 in conversation