<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Counting large groups of variables within time frame in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/606440#M176105</link>
    <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;I would love to hear your thoughts.&amp;nbsp;I would like the output data set to have a single row per each patient with a new variable (e.g., newvar) indicating whether the condition is met (diagnosis date of A or B or C - diagnosis date of D or E or F = le/ge 365 days) for every&amp;nbsp; patient in the dataset. Below is an example based on the example data set above.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;pid&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;newvar&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;2&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;3&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;4&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;5&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;6&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;7&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;</description>
    <pubDate>Fri, 22 Nov 2019 13:11:04 GMT</pubDate>
    <dc:creator>wj2</dc:creator>
    <dc:date>2019-11-22T13:11:04Z</dc:date>
    <item>
      <title>Counting large groups of variables within time frame</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/606373#M176072</link>
      <description>&lt;P&gt;Hello Community,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am working with a large electronic health record data set (~1 million patients) and I need to create a variable that indicates whether patients have ANY diagnosis in a group of diagnoses (e.g., A, B, or C) within 365 days (before or after) the date of ANY diagnosis within a different group of diagnoses (e.g., D, E, or F).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In other words: (diagnosis date of diagnosis A or B or C) – (diagnosis date of D or E or F) = le 365 days or ge 365 days. The condition can be met for any pair of diagnoses between the two groups (e.g., D-A, D-B, D-C, E-A, etc.).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Another critical issue is that one of the groups of diagnoses that I am interested in contains approximately 2,500 different diagnoses and corresponding diagnosis dates. Therefore, I need the program to be able to somehow incorporate this large amount of data in an efficient manner in order to limit CPU resources (maybe via creating a separate data set for these diagnoses to read in?).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below is a simplified example data set with some different cases for when the criterion should and should not be met. patient id = pid; diagnosis code= diag_cd; diagnosis date=diag_dt.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data work.test;
length pid $1.;
length diag $1.;
length diag_dt 8.; format diag_dt yymmdd10.;
infile datalines ;
input pid $ diag $ diag_dt date11.;
datalines;
1 A 10JAN2013
1 D 10JAN2013 /*condition met because diagnosis D occurs on same date as A*/
1 G 10JAN2013
2 F 02JUN2017 
2 J 02JUN2017
2 B 01JUL2017 /*condition met because diagnosis dates of F and B are within 365 days*/
2 J 01JUL2017
3 E 23OCT2012
3 C 17DEC2014
3 X 11JAN2015
3 D 25FEB2015 /*condition met because D occurs within 365 days of C*/
4 A 12MAR2015
4 Z 12MAR2015
4 A 20FEB2016 
4 E 07JAN2016 /*condition met because E occurs within 365 days of A on 12MAR2015*/
5 C 16JAN2017
5 A 28JAN2017
5 F 01FEB2018 /*condition NOT met because F occurs after 365 days from C and A*/
6 B 13NOV2016
6 B 10DEC2016
6 D 10DEC2016 /*condition met because D occurs within 365 days of B on 13NOV2016 or 10DEC2016*/
6 B 12DEC2017
7 F 01JAN2013
7 D 08JUN2013
7 C 07OCT2014 /*condition not met because C occurs after 365 days from F and D*/
;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Ultimately, I would like the output data set to have a single row per each patient with a new variable (e.g., newvar) indicating whether the condition is met (diagnosis date of A or B or C - diagnosis date of D or E or F = le/ge 365 days). Below is an example based on the example data set above.&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;pid&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;newvar&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;2&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;3&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;4&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;5&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;6&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;7&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Any help with this problem would be much appreciated. Please let me know if I can provide any additional clarification or information. Thank you!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 22 Nov 2019 04:10:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/606373#M176072</guid>
      <dc:creator>wj2</dc:creator>
      <dc:date>2019-11-22T04:10:19Z</dc:date>
    </item>
    <item>
      <title>Re: Counting large groups of variables within time frame</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/606376#M176074</link>
      <description>If you use SQL to first filter your data set to just those set of diagnosis how many records do you have?&lt;BR /&gt;&lt;BR /&gt;proc sql;&lt;BR /&gt;create table simplified as&lt;BR /&gt;select * from table1 where diag in (select diag from table2);&lt;BR /&gt;quit;</description>
      <pubDate>Fri, 22 Nov 2019 04:24:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/606376#M176074</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-11-22T04:24:38Z</dc:date>
    </item>
    <item>
      <title>Re: Counting large groups of variables within time frame</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/606383#M176077</link>
      <description>&lt;P&gt;I can imagine a couple ways to implement the tests you want to do, but what to you want your output&amp;nbsp; data set to look like?&lt;/P&gt;</description>
      <pubDate>Fri, 22 Nov 2019 06:05:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/606383#M176077</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2019-11-22T06:05:57Z</dc:date>
    </item>
    <item>
      <title>Re: Counting large groups of variables within time frame</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/606440#M176105</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;I would love to hear your thoughts.&amp;nbsp;I would like the output data set to have a single row per each patient with a new variable (e.g., newvar) indicating whether the condition is met (diagnosis date of A or B or C - diagnosis date of D or E or F = le/ge 365 days) for every&amp;nbsp; patient in the dataset. Below is an example based on the example data set above.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;pid&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;newvar&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;2&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;3&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;4&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;5&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;6&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;7&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;0&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;</description>
      <pubDate>Fri, 22 Nov 2019 13:11:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/606440#M176105</guid>
      <dc:creator>wj2</dc:creator>
      <dc:date>2019-11-22T13:11:04Z</dc:date>
    </item>
    <item>
      <title>Re: Counting large groups of variables within time frame</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/607343#M176524</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;after filtering, it looks like there are 35,324 records for the first set of diagnoses and&amp;nbsp;9,576,319 records for the second set of diagnoses. Any suggestions would be much appreciated.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 26 Nov 2019 13:56:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/607343#M176524</guid>
      <dc:creator>wj2</dc:creator>
      <dc:date>2019-11-26T13:56:07Z</dc:date>
    </item>
    <item>
      <title>Re: Counting large groups of variables within time frame</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/607406#M176577</link>
      <description>You need to post one of each of your input data sets. But honestly this question gets asked a lot on here there's many solutions. Usually the most efficient is a hash solution but I don't code in hash &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;The second option, which is pretty simple is a SQL join.  &lt;BR /&gt;&lt;BR /&gt;It would look something like this assuming you've filtered each data set for what you need. Modify this for what your data and if you have issues let us know.&lt;BR /&gt;&lt;BR /&gt;proc sql;&lt;BR /&gt;create table want as&lt;BR /&gt;select a.*, b.diag as diag_second, b.date as date_second_event, b.diag_date - a.diag_date as date_diff&lt;BR /&gt;from have1 as a&lt;BR /&gt;left join have2 as b&lt;BR /&gt;on a.patientid=b.patientid&lt;BR /&gt;&lt;BR /&gt;where b.date-a.date &amp;lt; 365&lt;BR /&gt;order by a.pid, a.diag_date, b.diag_date;&lt;BR /&gt;quit;&lt;BR /&gt;&lt;BR /&gt;You may end up with multiple matches here - ie if you have 2 diagnosis within 365 days it will list all three.</description>
      <pubDate>Tue, 26 Nov 2019 16:13:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/607406#M176577</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-11-26T16:13:36Z</dc:date>
    </item>
    <item>
      <title>Re: Counting large groups of variables within time frame</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/607476#M176616</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/115194"&gt;@wj2&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This can be done with a simple data step, where each diagnosis_date in a group (ABC) or DEF) is compared to the last previous diagnosis_date in the other group. No need for preprocessing, lag functions or hash.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The only reguirement is that the input data set is sorted on pid and diag_dt, so I have included that in the code as a precaution in case your real data set is unsorted.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I only one group is present for a given pid, newvar will be set to 0. I hope this vill solve your problem.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=test;&lt;BR /&gt;        by pid diag_dt;&lt;BR /&gt;run;&lt;BR /&gt;
data want (keep=pid newvar); 
	set test;
	by pid;
	retain lastABC lastDEF newvar;&lt;BR /&gt;
	if first.pid then do;
		lastABC = 0; lastDEF = 0; newvar = 0;
	end;&lt;BR /&gt;
	if diag in ('A','B','C') then do;
		if lastDEF &amp;gt; 0 then newvar = sum(newvar,(diag_dt-lastDEF) &amp;lt;= 365);
		lastABC = diag_dt;
	end;&lt;BR /&gt;
	else if diag in ('D','E','F') then do;
		if lastABC &amp;gt; 0 then newvar = sum(newvar,(diag_dt-lastABC) &amp;lt;= 365);
		lastDEF = diag_dt;
	end;&lt;BR /&gt;
	if last.pid then do;
		newvar = (newvar &amp;gt; 0);
		output;
	end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 26 Nov 2019 19:37:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/607476#M176616</guid>
      <dc:creator>ErikLund_Jensen</dc:creator>
      <dc:date>2019-11-26T19:37:13Z</dc:date>
    </item>
    <item>
      <title>Re: Counting large groups of variables within time frame</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/607700#M176713</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;thank you! In the past I have had disk space problems with hash solutions so I appreciate the SQL approach.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would actually like to have the newvar indicate whether the condition is met among the entire sample. That is, newvar=0 for those where the condition is not met. Would you mind suggesting how the code could be modified for that? Also, how would I indicate the two sets of diagnoses (e.g., set 1 containing diagnoses A, B, and C and set 2 containing diagnoses D, E, and F) that I am interested in finding the difference between diagnosis dates? Would an INTCK function be necessary to handle negative values in the difference between dates? Thanks again!&lt;/P&gt;</description>
      <pubDate>Wed, 27 Nov 2019 14:40:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/607700#M176713</guid>
      <dc:creator>wj2</dc:creator>
      <dc:date>2019-11-27T14:40:19Z</dc:date>
    </item>
    <item>
      <title>Re: Counting large groups of variables within time frame</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/607703#M176715</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/115194"&gt;@wj2&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You might give my code a try. It is intended to be very efficient (much more than any SQL approach with between-conditions) and return your wanted-data set in the final form with 1/0 covering the entire sample. It should work on your test data without modifications.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Nov 2019 14:46:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/607703#M176715</guid>
      <dc:creator>ErikLund_Jensen</dc:creator>
      <dc:date>2019-11-27T14:46:44Z</dc:date>
    </item>
    <item>
      <title>Re: Counting large groups of variables within time frame</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/607708#M176719</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12887"&gt;@ErikLund_Jensen&lt;/a&gt;&amp;nbsp;Yes, thank you! Your approach appears to be exactly what I need but I will test it to be sure today. As a new SAS learner, I am just interested in learning multiple approaches so I was wondering about a SQL approach as well!&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 27 Nov 2019 14:51:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Counting-large-groups-of-variables-within-time-frame/m-p/607708#M176719</guid>
      <dc:creator>wj2</dc:creator>
      <dc:date>2019-11-27T14:51:20Z</dc:date>
    </item>
  </channel>
</rss>

