Solved: Re: Scanning variable columns in search of specific values for f/u tim...

jsmall · Posted 04-03-2018 10:29 AM

Hi,

Below is longitudinal diabetes (y/n) data, I want to search for the first time a 0 or 1 appears so I can call that a baseline value. And also search for when a 1 appears so I know if the person developed diabetes. There are corresponding days2-days19 variables that are the f/u time (days from visit1) for each diabx field. So for example, obs=1242, I want to say baseline diabetes=0 occurred at the 3rd visit, and that a diabetes diagnosis was recorded at visit7, so total f/u time from baseline to diabetes diagnosis would be days7-days3. For now, we aren't worrying about the missing data between visits. I hope I explained this correctly. Thanks for any help!! Also, I don't use proc SQL very much so if it can be in a data step that would be awesome, otherwise if that's too complicated, SQL works

Obs	diab1	diab2	diab3	diab4	diab5	diab6	diab7	diab8	diab9	diab10	diab11	diab12	diab13	diab14	diab15	diab16	diab17	diab18	diab19
528	.	.	.	0	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.
670	.	.	.	.	.	.	.	0	0	0	.	0	0	.	0	.	0	0	.
802	.	.	.	.	1	.	.	.	.	.	.	.	.	.	.	.	.	.	.
920	.	.	.	.	.	.	0	.	.	.	.	.	.	.	.	.	.	.	.
1148	.	.	.	0	0	0	.	.	.	.	.	.	.	.	.	.	.	.	.
1242	.	.	0	0	0	.	1	1	.	.	.	.	.	.	.	.	.	.	.
1327	.	.	0	0	0	0	0	0	0	0	0	0	0	.	0	.	1	.	.
1898	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.
1981	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.
2122	.	.	.	0	0	.	.	.	.	.	.	.	.	.	.	.	.	.	.
2124	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.
2127	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.
2146	.	.	.	.	.	.	.	0	0	0	0	0	.	1	0	1	1	.	.
2936	.	0	0	0	0	0	1	1	1	1	1	1	1	.	.	.	.	.	.
3059	.	.	.	.	.	.	1	.	.	.	.	.	.	.	.	.	.	.	.
3101	.	0	0	0	0	0	0	1	1	1	1	1	1	.	.	.	.	.	.
3192	.	.	.	.	.	.	0	0	0	0	.	0	0	0	.	.	0	.	.
3485	.	.	.	.	.	.	.	0	0	.	.	.	.	.	.	.	.	.	.
3760	.	0	.	0	1	1	1	1	1	.	.	.	.	.	.	.	.	.	.

ballardw · Posted 04-03-2018 12:20 PM

data have;
   infile datalines truncover;
   input obs  diab1-diab8 date1-date8;
   array db diab1-diab8;
   array dd date1-date8;
   first0 = whichn(0,of db(*));
   first1 = whichn(1,of db(*));
   if first0 then datefirst0 = dd[first0];
   if first1 then datefirst1 = dd[first1];
datalines;
1 . . 0 1 . . . . . . 2200 2923
2 . 0 . . 0 0 1 0 . 1496 3673 4429 5112 5962
3 . . 0 0 . . 0 1 . . 2195 2921 . . 5110 5803
4 . . 0 . . . . 0 . . 2200 . . . . 5784
5 . 0 0 1 1 1 1 1 . 1479 2200 5000 . . . .
6 . 0 0 0 0 0 . 0 . . 2168 2955 3702 . 5090 5769
7 . 0 0 0 1 1 1 1 . 1599 2199 2927 3634 4369 5118 5824
8 . . . 0 0 . 1 0 . . . 2912 3710 . 5117 5901
run;

I don't think that your date and diab variable align properly but this is an approach to finding which has the first 0 or 1 for the diab and the associated "date".

View solution in original post

Reeza · Posted 04-03-2018 10:46 AM

Use the WHICHN() function.

RW9 · Posted 04-03-2018 10:48 AM

It would make your life a lot easier if you worked with normalised data (data which goes down rather than across) rather than transposed data (data which goes across rather than down). It just makes most programming tasks far easier. As for your logic, not following it well, but maybe something on the lines of:

data want;
  set have;
  array diab{19};
  array days{19};
  do i=1 to 19;
    if diab{i} in 0,1 then do;
    ...;
  end;
run;

So array both sets of data which allows you to refer to groups easily.

Anyways, for more information post test data in the form of a datastep so we can run it, and show what the output should look like, not going to either type that in nor try guessing what the output should look like.

jsmall · Posted 04-03-2018 11:34 AM

data have;
set want;
input obs diab1-diab8 date2-date8;
datalines;
1 . . 0 1 . . . . . . 2200 2923
2 . 0 . . 0 0 1 0 . 1496 3673 4429 5112 5962
3 . . 0 0 . . 0 1 . . 2195 2921 . . 5110 5803
4 . . 0 . . . . 0 . . 2200 . . . . 5784
5 . 0 0 1 1 1 1 1 . 1479 2200 5000 . . . .
6 . 0 0 0 0 0 . 0 . . 2168 2955 3702 . 5090 5769
7 . 0 0 0 1 1 1 1 . 1599 2199 2927 3634 4369 5118 5824
8 . . . 0 0 . 1 0 . . . 2912 3710 . 5117 5901
run;

so I'm using a huge dataset, it's all in wide format. These are people who never had an official visit or diagnosis at baseline (diab1/date1) I'm was hoping to find a quick way to find out the first time (datex ) a person has a diabx of 0/1 so I can call that their baseline diagnosis. Then, if a person has a baseline diagnosis of 0 (no diabetes), and they develop diabetes afterwards, I want to find out at what time (datex ) that happened, so I can calculate their follow-up time (time to diagnosis) hope that helps. The whichn function may work?

RW9 · Posted 04-03-2018 12:10 PM

Yes, you can use arrays as I show in my previous post - am finishing for the day so can't elaborate. Why is the data not utilising a standard data structure, SDTM for instance. It would make your life so much easier to learn to work with normalised data, e.g.:

... DIAGDT DIAGYN ...

Then its simply a matter of sorting and filtering first date where DIAGYN=Y to get baseline. You can then merge this date back to the original data, and create a change value from baseline with a simple minus, a days since by another minus etc. It really is so much easier to work with.

ballardw · Posted 04-03-2018 12:20 PM

data have;
   infile datalines truncover;
   input obs  diab1-diab8 date1-date8;
   array db diab1-diab8;
   array dd date1-date8;
   first0 = whichn(0,of db(*));
   first1 = whichn(1,of db(*));
   if first0 then datefirst0 = dd[first0];
   if first1 then datefirst1 = dd[first1];
datalines;
1 . . 0 1 . . . . . . 2200 2923
2 . 0 . . 0 0 1 0 . 1496 3673 4429 5112 5962
3 . . 0 0 . . 0 1 . . 2195 2921 . . 5110 5803
4 . . 0 . . . . 0 . . 2200 . . . . 5784
5 . 0 0 1 1 1 1 1 . 1479 2200 5000 . . . .
6 . 0 0 0 0 0 . 0 . . 2168 2955 3702 . 5090 5769
7 . 0 0 0 1 1 1 1 . 1599 2199 2927 3634 4369 5118 5824
8 . . . 0 0 . 1 0 . . . 2912 3710 . 5117 5901
run;

I don't think that your date and diab variable align properly but this is an approach to finding which has the first 0 or 1 for the diab and the associated "date".

jsmall · Posted 04-03-2018 01:23 PM

thanks this works great!!

Scanning variable columns in search of specific values for f/u time

Re: Scanning variable columns in search of specific values for f/u time

Re: Scanning variable columns in search of specific values for f/u time

Re: Scanning variable columns in search of specific values for f/u time

Re: Scanning variable columns in search of specific values for f/u time

Re: Scanning variable columns in search of specific values for f/u time

Re: Scanning variable columns in search of specific values for f/u time

Re: Scanning variable columns in search of specific values for f/u time

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away