BookmarkSubscribeRSS Feed
lstockman
Fluorite | Level 6

Hello - I have a dataset of test dates for each record in sequential order, earliest to latest. Some records have data in up to 300 columns (i.e. test_date1...test_date300), some have only one. I need to add a new variable 'new_var' that indicates if there are two test_dates within a 6 month timespan.

 

Does anyone have suggestions of an array that would calculate the span of time between the first test_date and the last test_date and then count if there are two dates, if there are new_var='Y'.

 

Some sample data if useful.

 

data labs;

input ID $ test_date1 test_date2 test_date3 test_date4 test_date5;

datalines;

101 08192010 . . . . .

102 01192010 01192010 11272014 07262016 09272016

103 02192011 02192013 12272014 . .

;

run;

 

 

Thank you!

5 REPLIES 5
VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

in you example you have ID 101 with only 1 date, but @lstockman wrote I could have 300 dates. 

Will there be missing dates in the possible arrayed date-columns for an ID where date1 is not null but maybe date2-299 are null and then date300 has a date value?

is the first found date1-300 in the array the start-date and the last date found the records to be evaluated?

 

Can you please provide a few more datalines representing what you may have coming in.

Please post a sample of your wants related to the datalines you post.

Thank you.

 

lstockman
Fluorite | Level 6
Hi! There are no missing or null values within completed values in test_date. So if there are three tests, then test_date1, test_date2, and test_date3 will all have values.

I don’t have a good starting point for the 6-month period, this is my problem. If test_date1 is in 2012 but the test_date2 and test_date3 both occur in 2014 and 6 months or less apart, I want to flag the record as meeting the criteria.
VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

this link may assist you with your wants

http://support.sas.com/documentation/cdl/en/etsug/66100/HTML/default/viewer.htm#etsug_tsdata_sect057...

 

you might want to place a do loop in the code to advance the lag date if you want to move the starting date value from date1 to date2... date299.

 

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

replace the 5 with 300 if you have 300 columns and the 4 with your number of columns -1.  When new_var{I} = 1 then the dates being compared are 6 months apart.

 

data labs;
input ID $ test_date1: mmddyy8. test_date2: mmddyy8. test_date3: mmddyy8. test_date4: mmddyy8. test_date5: mmddyy8.;
datalines; 
101 08192010 . . . . .
102 01192010 01192010 11272014 07262016 09272016
103 02192011 02192013 12272014 . .
104 01012010 04012010 07012010 01052013 .
105 01012010 07102010 07012010 01052011 .
;
run;
data want;
	array test_date(5);
	array tmonth(5);
	array new_var(5);
	set labs;
	do j = 1 to 4;
		do i = j+1 to 5;
			tmonth{i} = (test_date{i}-test_date{j})/182.5;
			if tmonth{i} >= 1 then 
				do;
				new_var{i} = 1;
				end;
			if i = 5  and test_date{j} ne . then output;
		end;
	end;
run;

 

 

 

lstockman
Fluorite | Level 6

Thank you. Unfortunately, the tmonth calculation is not working quite right. I would like the tmonth calculation to be the difference in months between each test_date and the previous test_date. Then if tmonth <=6 months then new_var=1. Does anyone know a modification? Also the code provided is outputting multiple rows for each ID.

 

 

 

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1730 views
  • 0 likes
  • 2 in conversation