Hi,
Below is longitudinal diabetes (y/n) data, I want to search for the first time a 0 or 1 appears so I can call that a baseline value. And also search for when a 1 appears so I know if the person developed diabetes. There are corresponding days2-days19 variables that are the f/u time (days from visit1) for each diabx field. So for example, obs=1242, I want to say baseline diabetes=0 occurred at the 3rd visit, and that a diabetes diagnosis was recorded at visit7, so total f/u time from baseline to diabetes diagnosis would be days7-days3. For now, we aren't worrying about the missing data between visits. I hope I explained this correctly. Thanks for any help!! Also, I don't use proc SQL very much so if it can be in a data step that would be awesome, otherwise if that's too complicated, SQL works
Obs | diab1 | diab2 | diab3 | diab4 | diab5 | diab6 | diab7 | diab8 | diab9 | diab10 | diab11 | diab12 | diab13 | diab14 | diab15 | diab16 | diab17 | diab18 | diab19 |
528 | . | . | . | 0 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
670 | . | . | . | . | . | . | . | 0 | 0 | 0 | . | 0 | 0 | . | 0 | . | 0 | 0 | . |
802 | . | . | . | . | 1 | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
920 | . | . | . | . | . | . | 0 | . | . | . | . | . | . | . | . | . | . | . | . |
1148 | . | . | . | 0 | 0 | 0 | . | . | . | . | . | . | . | . | . | . | . | . | . |
1242 | . | . | 0 | 0 | 0 | . | 1 | 1 | . | . | . | . | . | . | . | . | . | . | . |
1327 | . | . | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | . | 0 | . | 1 | . | . |
1898 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
1981 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
2122 | . | . | . | 0 | 0 | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
2124 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
2127 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
2146 | . | . | . | . | . | . | . | 0 | 0 | 0 | 0 | 0 | . | 1 | 0 | 1 | 1 | . | . |
2936 | . | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | . | . | . | . | . | . |
3059 | . | . | . | . | . | . | 1 | . | . | . | . | . | . | . | . | . | . | . | . |
3101 | . | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | . | . | . | . | . | . |
3192 | . | . | . | . | . | . | 0 | 0 | 0 | 0 | . | 0 | 0 | 0 | . | . | 0 | . | . |
3485 | . | . | . | . | . | . | . | 0 | 0 | . | . | . | . | . | . | . | . | . | . |
3760 | . | 0 | . | 0 | 1 | 1 | 1 | 1 | 1 | . | . | . | . | . | . | . | . | . | . |
data have; infile datalines truncover; input obs diab1-diab8 date1-date8; array db diab1-diab8; array dd date1-date8; first0 = whichn(0,of db(*)); first1 = whichn(1,of db(*)); if first0 then datefirst0 = dd[first0]; if first1 then datefirst1 = dd[first1]; datalines; 1 . . 0 1 . . . . . . 2200 2923 2 . 0 . . 0 0 1 0 . 1496 3673 4429 5112 5962 3 . . 0 0 . . 0 1 . . 2195 2921 . . 5110 5803 4 . . 0 . . . . 0 . . 2200 . . . . 5784 5 . 0 0 1 1 1 1 1 . 1479 2200 5000 . . . . 6 . 0 0 0 0 0 . 0 . . 2168 2955 3702 . 5090 5769 7 . 0 0 0 1 1 1 1 . 1599 2199 2927 3634 4369 5118 5824 8 . . . 0 0 . 1 0 . . . 2912 3710 . 5117 5901 run;
I don't think that your date and diab variable align properly but this is an approach to finding which has the first 0 or 1 for the diab and the associated "date".
Use the WHICHN() function.
It would make your life a lot easier if you worked with normalised data (data which goes down rather than across) rather than transposed data (data which goes across rather than down). It just makes most programming tasks far easier. As for your logic, not following it well, but maybe something on the lines of:
data want; set have; array diab{19}; array days{19}; do i=1 to 19; if diab{i} in 0,1 then do; ...; end; run;
So array both sets of data which allows you to refer to groups easily.
Anyways, for more information post test data in the form of a datastep so we can run it, and show what the output should look like, not going to either type that in nor try guessing what the output should look like.
data have;
set want;
input obs diab1-diab8 date2-date8;
datalines;
1 . . 0 1 . . . . . . 2200 2923
2 . 0 . . 0 0 1 0 . 1496 3673 4429 5112 5962
3 . . 0 0 . . 0 1 . . 2195 2921 . . 5110 5803
4 . . 0 . . . . 0 . . 2200 . . . . 5784
5 . 0 0 1 1 1 1 1 . 1479 2200 5000 . . . .
6 . 0 0 0 0 0 . 0 . . 2168 2955 3702 . 5090 5769
7 . 0 0 0 1 1 1 1 . 1599 2199 2927 3634 4369 5118 5824
8 . . . 0 0 . 1 0 . . . 2912 3710 . 5117 5901
run;
so I'm using a huge dataset, it's all in wide format. These are people who never had an official visit or diagnosis at baseline (diab1/date1) I'm was hoping to find a quick way to find out the first time (datex ) a person has a diabx of 0/1 so I can call that their baseline diagnosis. Then, if a person has a baseline diagnosis of 0 (no diabetes), and they develop diabetes afterwards, I want to find out at what time (datex ) that happened, so I can calculate their follow-up time (time to diagnosis) hope that helps. The whichn function may work?
Yes, you can use arrays as I show in my previous post - am finishing for the day so can't elaborate. Why is the data not utilising a standard data structure, SDTM for instance. It would make your life so much easier to learn to work with normalised data, e.g.:
... DIAGDT DIAGYN ...
Then its simply a matter of sorting and filtering first date where DIAGYN=Y to get baseline. You can then merge this date back to the original data, and create a change value from baseline with a simple minus, a days since by another minus etc. It really is so much easier to work with.
data have; infile datalines truncover; input obs diab1-diab8 date1-date8; array db diab1-diab8; array dd date1-date8; first0 = whichn(0,of db(*)); first1 = whichn(1,of db(*)); if first0 then datefirst0 = dd[first0]; if first1 then datefirst1 = dd[first1]; datalines; 1 . . 0 1 . . . . . . 2200 2923 2 . 0 . . 0 0 1 0 . 1496 3673 4429 5112 5962 3 . . 0 0 . . 0 1 . . 2195 2921 . . 5110 5803 4 . . 0 . . . . 0 . . 2200 . . . . 5784 5 . 0 0 1 1 1 1 1 . 1479 2200 5000 . . . . 6 . 0 0 0 0 0 . 0 . . 2168 2955 3702 . 5090 5769 7 . 0 0 0 1 1 1 1 . 1599 2199 2927 3634 4369 5118 5824 8 . . . 0 0 . 1 0 . . . 2912 3710 . 5117 5901 run;
I don't think that your date and diab variable align properly but this is an approach to finding which has the first 0 or 1 for the diab and the associated "date".
thanks this works great!!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.