BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.

Hello,

 

I have previously posted regarding this topic here. The solution worked, but the code required about 8 hours to run and complete. @PaigeMiller and @FreelanceReinh recommended I post a separate topic inquiring about hash objects, which ostensibly should take much longer to run. Below is a description of what I need:

 

I have two datasets: One sleep dataset and one survey dataset. Each dataset is long form, with ID numbers repeating by row. Each ID has several months of data in each dataset.

 

For the sleep dataset, each row represents one night of sleep data, and each variable is represented in a separate column. Below is code for sample data from the sleep dataset:

 

Data sleep;
Input id wake_date date12. Sleep_dur_hrs Sleep_start_MC_hrs;
datalines;
1	14Nov2011	7.5167	-1.25
1	15Nov2011	5.4333	-1.46667
1	16Nov2011	8.4	-5.45833
1	17Nov2011	7.5167	-1.53333
1	18Nov2011	7.7	-1.46667
1	19Nov2011	7.6	-1.51667
1	20Nov2011	7.4	-1.81667
1	21Nov2011	8.2333	-4.89167
1	22Nov2011	6.75	-1.56667
1	23Nov2011	7.2	-1.1
1	24Nov2011	4.15	2.61667
1	25Nov2011	8.1	-1.38333
1	26Nov2011	7.7667	-1.49167
1	27Nov2011	7.4833	-1.06667
1	28Nov2011	7.7333	-0.88333
1	29Nov2011	7.0333	0.49167
1	30Nov2011	7.45	-1.71667
1	01Dec2011	7.6167	-1.45
1	02Dec2011	6.6167	-1.73333
1	03Dec2011	7.6667	-1.34167
1	04Dec2011	6.9167	-3.825
1	05Dec2011	7.2667	-1.4
1	06Dec2011	6.4	-1.26667
1	07Dec2011	8.6	-1.725
1	08Dec2011	7.1833	-1.275
1	09Dec2011	7.1833	-0.51667
1	10Dec2011	8.0333	-1.61667
1	11Dec2011	6.7333	-0.78333
1	12Dec2011	8.4333	-2.58333
1	13Dec2011	7.65	-2
1	14Dec2011	5.9167	-2.71667
1	15Dec2011	9.1167	-2.775
1	16Dec2011	7.5	-1.54167
1	17Dec2011	7.7667	-2.19167
1	18Dec2011	6.8167	-1.93333
1	19Dec2011	7.7667	-1.56667
1	20Dec2011	7.5	-2.2
1	21Dec2011	8.7667	-2.49167
1	22Dec2011	7.8667	-2.66667
1	23Dec2011	6.2833	-0.01667
1	24Dec2011	8.4	-1.26667
1	25Dec2011	4.9833	-1.18333
1	26Dec2011	7.6333	-1.775
1	27Dec2011	4.0333	1.85
1	28Dec2011	7.5667	-2.21667
1	29Dec2011	2.7	3.26667
1	30Dec2011	8.35	-5.26667
1	31Dec2011	8.1	-1.16667
1	01Jan2012	5.8333	-0.98333
1	02Jan2012	6.9333	-2.40833
1	03Jan2012	6.3833	-2.09444
1	04Jan2012	8.4667	-2.61667
1	05Jan2012	7.45	-1.21667
1	06Jan2012	7.5167	-0.76667
1	07Jan2012	7.0667	-1.15
1	08Jan2012	8.45	-2.43333
1	09Jan2012	7.4667	-0.91667
1	10Jan2012	7.3	-1.03333
1	11Jan2012	8.6667	-2.76667
1	12Jan2012	8.1	-0.33333
1	13Jan2012	5.3333	1.3
1	14Jan2012	7.9333	-1.1
1	15Jan2012	6.35	-3.01389
1	16Jan2012	7.4333	-1.03333
1	17Jan2012	7.3833	-1.55
1	18Jan2012	8.5	-1.35
1	19Jan2012	4.7333	-0.15
1	20Jan2012	7.3667	-1.28333
1	21Jan2012	8.5	-1.36667
1	22Jan2012	7.9	-0.76667
1	23Jan2012	7.8	-0.48333
1	24Jan2012	8.1667	-1.1
1	25Jan2012	7.95	-1.45
1	26Jan2012	7.7333	-1.25
1	27Jan2012	7.3833	-1.05833
1	28Jan2012	7.95	-0.88333
1	29Jan2012	7.3833	-0.98333
1	30Jan2012	5.4833	2.0125
1	31Jan2012	6.6167	-0.13333
1	01Feb2012	7.0667	1.40417
1	02Feb2012	6.7	-0.55833
1	03Feb2012	9.1333	-1.575
1	04Feb2012	7.5167	-1.69167
1	05Feb2012	7.7833	-1.98333
1	06Feb2012	9.65	-1.875
1	07Feb2012	4.0833	0.9
1	08Feb2012	7.2333	-1.58333
1	09Feb2012	7.9167	-0.85
1	10Feb2012	8.1333	-1.65
1	11Feb2012	8.0333	-2.36667
1	12Feb2012	6.8667	-0.66667
1	13Feb2012	7.75	-2.04167
1	14Feb2012	5.6833	-2.10833
1	15Feb2012	5.65	-0.23333
1	16Feb2012	7.9667	-0.53333
1	17Feb2012	7.7167	-1.38333
1	18Feb2012	8.45	-1.55
1	19Feb2012	6.2333	1.78333
1	20Feb2012	8.2167	-0.93333
1	21Feb2012	8.3167	-1.75
1	22Feb2012	8.6	-2.3
1	23Feb2012	8.1667	-1.51667
1	24Feb2012	7.7833	-1.1
1	25Feb2012	7.6167	-0.85
1	26Feb2012	6.6	-1.45
1	27Feb2012	6.1333	-1.13333
1	28Feb2012	8.55	-3.7
1	29Feb2012	8.0667	-2.39167
1	01Mar2012	8.15	-1.23333
1	02Mar2012	8.75	-1.41667
1	03Mar2012	8.1833	-0.79167
1	04Mar2012	8.3333	-1.75833
1	05Mar2012	7.7333	-1.16667
1	06Mar2012	6.3333	0.05
1	07Mar2012	7.9167	-1.51667
1	08Mar2012	7.2	-1.25833
1	09Mar2012	6	2.25833
1	10Mar2012	8.1667	-1.175
1	11Mar2012	6.5833	-0.68333
1	12Mar2012	6.5667	1.575
1	13Mar2012	7.65	-0.25
1	14Mar2012	7.1333	-0.24167
1	15Mar2012	7.3167	-2.475
1	16Mar2012	8.55	-2
1	17Mar2012	6.2333	-0.28333
1	18Mar2012	8.1667	-1.26667
1	19Mar2012	8.7333	-1.68333
1	20Mar2012	7.8167	-1.15833
1	21Mar2012	7.7167	-0.73333
1	22Mar2012	7.7167	-0.43333
1	23Mar2012	7.85	-0.48333
1	24Mar2012	3.9	0.71667
1	25Mar2012	7.8167	-2.1
1	26Mar2012	7.5333	1.54167
1	27Mar2012	7.9167	-1.05
1	28Mar2012	8.8667	-1.35
1	29Mar2012	9.3667	-0.58333
1	30Mar2012	5.1	1.875
1	31Mar2012	7.35	-0.33333
1	01Apr2012	6.5333	-1.26667
1	02Apr2012	6.7	-1.13333
1	03Apr2012	9	-0.53333
1	04Apr2012	7.2	-1.81667
1	05Apr2012	8.1833	3.73333
1	06Apr2012	8.65	-1.23333
1	07Apr2012	7.1667	0.44167
1	08Apr2012	7.5833	-1.21667
2	14Nov2011	7.45	-2.23333
2	15Nov2011	7.3	-2.11667
2	16Nov2011	6.0833	-1.7
2	17Nov2011	6.3333	-2.35
2	18Nov2011	5.3167	-1.14167
2	19Nov2011	6.9	-1.025
2	20Nov2011	6.25	-1.50833
2	21Nov2011	5.2667	1.675
2	22Nov2011	5.9667	-1.775
2	23Nov2011	5.9	-2.98333
2	24Nov2011	6.1167	0.33333
2	25Nov2011	5.3667	1.55833
2	26Nov2011	8.4667	-3.2
2	27Nov2011	5.5667	-1.5
2	28Nov2011	7.05	1.61667
2	29Nov2011	6.1667	-0.65833
2	30Nov2011	5.95	2.0375
2	01Dec2011	6.15	-0.98333
2	02Dec2011	4.6833	-0.55833
2	03Dec2011	3.7167	-0.45
2	04Dec2011	6.1333	-0.7
2	05Dec2011	4.9833	-0.45
2	06Dec2011	7.8833	-1.38333
2	07Dec2011	6.2	0.95
2	08Dec2011	6.4167	0.18333
2	09Dec2011	6.3167	1.25
2	10Dec2011	4.6167	0.86667
2	11Dec2011	3.65	2.56667
2	12Dec2011	7.15	-0.21667
2	13Dec2011	6.8	0.2
2	14Dec2011	2.9667	4.40833
2	15Dec2011	5.5167	2.3
2	16Dec2011	8.25	-0.89167
2	17Dec2011	3.55	1.8
2	18Dec2011	7.0167	0.25833
2	19Dec2011	5.8	2.84583
2	20Dec2011	6.1333	-2.51667
2	21Dec2011	5.2333	-1.425
2	22Dec2011	7.4833	-0.825
2	23Dec2011	6.3667	0.24167
2	24Dec2011	4.6667	-0.35
2	25Dec2011	6.4333	-0.55
2	26Dec2011	6.6667	-0.93333
2	27Dec2011	6.3	-0.88333
2	28Dec2011	5.15	-0.025
2	29Dec2011	4.6833	2.67083
2	30Dec2011	8.1	-2.4
2	31Dec2011	4.4333	-3.80417
2	01Jan2012	7.05	-1.28333
2	02Jan2012	6.7	-0.29167
2	03Jan2012	5.6833	3.17083
2	04Jan2012	6.55	2.34583
2	05Jan2012	6.7667	-0.33056
2	06Jan2012	6.0667	0.64167
2	07Jan2012	5.9667	-0.85833
2	08Jan2012	7.3167	0.175
2	09Jan2012	5.2667	-0.61667
2	10Jan2012	5.7667	1.37083
2	11Jan2012	5.8833	3.27917
2	12Jan2012	4.7167	1.23333
2	13Jan2012	7.4333	1.16667
2	14Jan2012	3.95	-2.275
2	15Jan2012	7.35	-1.25
2	16Jan2012	6.3333	-0.10833
2	17Jan2012	3.9333	0.525
2	18Jan2012	2.9167	4.25
2	19Jan2012	6.15	1.21667
2	20Jan2012	7.2	0.58333
2	21Jan2012	4.5833	1.01667
2	22Jan2012	7.1	0.24167
2	23Jan2012	5.8667	2.40417
2	24Jan2012	6.1667	2.2
2	25Jan2012	6.5	0.95
2	26Jan2012	6.1833	1.19167
2	27Jan2012	7.55	-1.075
2	28Jan2012	4.8833	-0.19167
2	29Jan2012	8.5833	-0.76667
2	30Jan2012	4.95	2.44167
2	31Jan2012	4.05	0.75
2	01Feb2012	8.55	-2.2
2	02Feb2012	7.5	-0.14167
2	03Feb2012	5.55	2.54167
2	04Feb2012	4.8333	0
2	05Feb2012	5.6833	0.38333
2	06Feb2012	6.45	-4.4875
2	07Feb2012	4.8	0.93333
2	08Feb2012	5.8167	-0.86667
2	09Feb2012	6.15	-0.21667
2	10Feb2012	5.3	-0.85
2	11Feb2012	4.8667	-5.90417
2	12Feb2012	7.5167	-1.525
2	13Feb2012	7.2333	0.08333
2	14Feb2012	4.85	3.675
2	15Feb2012	5.9	0.45833
2	16Feb2012	7.45	0.275
2	17Feb2012	4.0333	0.80833
2	18Feb2012	4.5833	0.575
2	19Feb2012	4.45	2.7
2	20Feb2012	7.3333	0.55
2	21Feb2012	5.35	1.25
2	22Feb2012	4.35	-1.025
2	23Feb2012	4.4	-2.66667
2	24Feb2012	8.4667	-2.64167
2	25Feb2012	6.3667	-3.88333
2	26Feb2012	6.5667	1.3625
2	27Feb2012	8.2	-0.59167
2	28Feb2012	5.6333	1.31667
2	29Feb2012	7.2	-0.2
2	01Mar2012	5.5667	-0.26667
2	02Mar2012	4.5667	-0.83333
2	03Mar2012	6.4333	-4.19167
2	04Mar2012	5.6667	0.24167
2	05Mar2012	4.4833	2.71667
2	06Mar2012	5.1833	4.12083
2	07Mar2012	5.8	1.225
2	08Mar2012	5.65	0.71667
2	09Mar2012	8.1167	-0.83333
2	10Mar2012	4.2333	0.56667
2	12Mar2012	6.8667	1.2
2	13Mar2012	7.9833	-0.91667
2	14Mar2012	4.05	3.57917
2	15Mar2012	1.6333	2.53333
2	16Mar2012	6.95	0.89167
2	17Mar2012	5.35	0.11667
2	18Mar2012	6.2333	1.79167
2	19Mar2012	3.7667	2.71667
2	20Mar2012	8.5167	1.685
2	21Mar2012	5.4333	2.45
2	22Mar2012	6.5167	0.425
2	23Mar2012	8.45	-0.73333
2	24Mar2012	7.85	-1.78333
2	25Mar2012	6.9	-0.11667
2	26Mar2012	7.3667	-0.61667
2	27Mar2012	8.6667	-1.64167
2	28Mar2012	7.7333	0.26667
2	29Mar2012	4.2333	-0.16667
2	30Mar2012	8.2	-1.38333
2	31Mar2012	5.85	0.80833
2	01Apr2012	6.1	1.85833
2	02Apr2012	6.6333	0.75
2	03Apr2012	4.5833	1.51667
2	04Apr2012	5.4333	3
2	05Apr2012	7.5167	1.12083
2	06Apr2012	3.7667	-0.6
2	07Apr2012	6.2833	1.49167
2	08Apr2012	7.35	0.33333
3	15Nov2011	7.5667	-2.36667
3	16Nov2011	7.6167	-2.89167
3	17Nov2011	7.5	-2.00833
3	18Nov2011	8.85	-3.71667
3	19Nov2011	8.7167	-3.425
3	20Nov2011	7.5667	-2.35833
3	03Dec2011	8.65	-1.16944
3	04Dec2011	6.0167	-0.00833
3	05Dec2011	8.4833	-2.74167
3	06Dec2011	9.1	-3.39167
3	08Dec2011	7.8667	-2.25
3	09Dec2011	8.4333	-2.63333
3	10Dec2011	8.5167	-3.03333
3	07Jan2012	2.5833	-3.71667
3	08Jan2012	4.7667	0.69167
3	10Jan2012	8.7333	-3.59167
3	11Jan2012	7.7	-2.69167
3	12Jan2012	6.3333	-1.875
3	13Jan2012	8.7	-2.025
3	15Jan2012	3.1667	-1.6375
3	16Jan2012	1.3667	-6.23333
3	19Jan2012	10.4167	-3.21667
3	20Jan2012	2.1333	-5.25833
3	21Jan2012	1.4833	-7.125
3	24Jan2012	1.85	-5.08333
3	25Jan2012	1.3667	7.125
3	27Jan2012	1.85	-3.125
3	28Jan2012	2.5	-7.125
3	29Jan2012	8.0833	-2.26667
3	31Jan2012	1.4167	-5.475
3	01Feb2012	1.5167	-3.1
3	02Feb2012	1.3667	-7.125
3	03Feb2012	1.3667	-4.65
3	04Feb2012	2.6167	-7.125
3	05Feb2012	1.5667	-4.98333
3	09Feb2012	9.95	-5.0375
3	10Feb2012	1.95	-7.125
3	11Feb2012	9.8	-3.86667
3	12Feb2012	7.4333	-2.025
3	13Feb2012	8.1	-1.68333
3	14Feb2012	11.05	-6.5125
3	17Feb2012	9.9833	-1.05
3	19Feb2012	10	-4.54167
3	21Feb2012	9.9667	-5.39167
3	22Feb2012	7.3667	-1.58333
3	23Feb2012	8.7667	-2.80833
3	24Feb2012	7.9833	-2.71667
3	25Feb2012	10.0833	-3.73333
3	26Feb2012	7.4	-1.71667
3	27Feb2012	11.0667	-5.45417
3	28Feb2012	7.1333	-2.125
3	29Feb2012	6.3667	-2.075
3	02Mar2012	9.15	-3.01667
3	03Mar2012	7	-1.425
3	04Mar2012	7.8333	-3.19583
3	05Mar2012	10.3333	-4.26667
3	07Mar2012	7.1667	-1.63333
3	08Mar2012	8.0167	-3.96667
3	09Mar2012	5.5333	-3.39167
3	10Mar2012	1.3667	-7.125
3	12Mar2012	9.8333	-5
3	13Mar2012	6.8333	-1.375
3	14Mar2012	8.8333	-3.4
3	15Mar2012	7.7333	-2.28333
3	16Mar2012	6.9	-2.26667
3	17Mar2012	9.45	-4.54167
3	25Mar2012	8.7	-2.20833
3	26Mar2012	7.5833	-1.26667
3	27Mar2012	6.7167	-1.59167
3	28Mar2012	8.4833	-4.3
3	29Mar2012	7.05	-2.375
3	30Mar2012	8.4333	-4.7875
3	31Mar2012	1.3667	6.64167
3	01Apr2012	7.05	-1.99167
3	05Apr2012	9.9	-5.24167
3	06Apr2012	8.25	-3
3	08Apr2012	8.5167	-3.13333
;
Run;
data sleep; set sleep; format wake_date date9.;run;

 

For the survey data, each row represents one survey entry and also a separate variable, with about one survey entry per month per person. Some variables were surveyed at different times of the month even for the same person (for example, Gen Health question was surveyed on Dec 2nd, but Physical Activity was surveyed on Dec 5th, for ID #1). Below is code for sample data from the survey dataset:

 

Data survey;
Input ID date date12. Variable: $14. entry_count answer $;
datalines;
1	2-Dec-11	Gen_Health	1	Fair
1	1-Jan-12	Gen_Health	2	Fair
1	1-Feb-12	Gen_Health	3	Fair
1	5-Dec-11	Physical_Act	1	10
1	4-Jan-12	Physical_Act	2	14
1	4-Feb-12	Physical_Act	3	30
1	28-Nov-11	Diet	1	4
1	28-Dec-11	Diet	2	7
1	28-Jan-12	Diet	3	30
2	30-Nov-11	Gen_Health	1	Good
2	30-Dec-11	Gen_Health	2	Fair
2	29-Jan-12	Gen_Health	3	Poor
2	28-Nov-11	Physical_Act	1	10
2	28-Dec-11	Physical_Act	2	15
2	27-Jan-12	Physical_Act	3	5
2	1-Dec-11	Diet	1	4
2	31-Dec-11	Diet	2	5
2	30-Jan-12	Diet	3	8
3	1-Jan-12	Gen_Health	1	Poor
3	30-Jan-12	Gen_Health	2	Good
3	3-Mar-12	Gen_Health	3	Fair
3	6-Jan-12	Physical_Act	1	2
3	4-Feb-12	Physical_Act	2	4
3	8-Mar-12	Physical_Act	3	4
3	25-Dec-11	Diet	1	3
3	23-Jan-12	Diet	2	7
3	25-Feb-12	Diet	3	9
;
Run;
data survey; set survey; format date date9.;run;

 

I've given an example of data for 3 IDs here, with 3 different variables repeated across 3 survey administrations ("entry_count"), with the date of survey question administration ("date"), and the answer to that variable ("answer").

 

What I'd like to obtain is sleep data (mean/SD of sleep duration and sleep start) for 30 days before and 30 days after the date of each unique survey administration per ID. I'd like to merge this sleep data with the survey data. Below are code for a sample of what I'd like the data to look like. Note that I've only filled out the first two rows for the first section (before interval) and the first row for the second section (after interval). Also note that 30dBF means 30 days before (actually representing 29 days before); and 30dAF means 30 days after. Before the actual sleep variables themselves, I should have variables representing the start date of the "before" interval (29 days before the survey question administration), the end date of the "before" interval (the date of the survey administration); the start date of the "after" interval (1 day after the survey question administration), and the end date of the "after" interval (30 days after the date of the survey administration).

 

Data survey;
Input ID date date12. Variable: $14. entry_count answer $ Sleep_30dBF_date_start date12. Sleep_30dBF_date_end date12. Sleep_dur_hrs_avg_30dBF Sleep_dur_hrs_std_30dBF Sleep_start_MC_hrs_avg_30dBF Sleep_start_MC_hrs_std_30dBF Sleep_30dAF_date_start date12. Sleep_30dAF_date_end date12. Sleep_dur_hrs_avg_30dAF Sleep_dur_hrs_std_30dAF Sleep_start_MC_hrs_avg_30dAF Sleep_start_MC_hrs_std_30dAF;
datalines;
1	02Dec2011	Gen_Health	1	Fair	03Nov2011	02Dec2011	7.25	1.00	-1.51	1.65	03Dec2011	01Jan2012	7.19	1.38	-1.60	1.51
1	01Jan2012	Gen_Health	2	Fair	03Dec2011	01Jan2012	7.17	1.3956345	-1.57	1.513422712	02Jan2012	31Jan2012	.	.	.	.
1	01Feb2012	Gen_Health	3	Fair	03Jan2012	01Feb2012	.	.	.	.	02Feb2012	02Mar2012	.	.	.	.
1	05Dec2011	Physical_Act	1	10	06Nov2011	05Dec2011	.	.	.	.	06Dec2011	04Jan2012	.	.	.	.
1	04Jan2012	Physical_Act	2	14	06Dec2011	04Jan2012	.	.	.	.	05Jan2012	03Feb2012	.	.	.	.
1	04Feb2012	Physical_Act	3	30	06Jan2012	04Feb2012	.	.	.	.	05Feb2012	05Mar2012	.	.	.	.
1	28Nov2011	Diet	1	4	30Oct2011	28Nov2011	.	.	.	.	29Nov2011	28Dec2011	.	.	.	.
1	28Dec2011	Diet	2	7	29Nov2011	28Dec2011	.	.	.	.	29Dec2011	27Jan2012	.	.	.	.
1	28Jan2012	Diet	3	30	30Dec2011	28Jan2012	.	.	.	.	29Jan2012	27Feb2012	.	.	.	.
2	30Nov2011	Gen_Health	1	Good	01Nov2011	30Nov2011	.	.	.	.	01Dec2011	30Dec2011	.	.	.	.
2	30Dec2011	Gen_Health	2	Fair	01Dec2011	30Dec2011	.	.	.	.	31Dec2011	29Jan2012	.	.	.	.
2	29Jan2012	Gen_Health	3	Poor	31Dec2011	29Jan2012	.	.	.	.	30Jan2012	28Feb2012	.	.	.	.
2	28Nov2011	Physical_Act	1	10	30Oct2011	28Nov2011	.	.	.	.	29Nov2011	28Dec2011	.	.	.	.
2	28Dec2011	Physical_Act	2	15	29Nov2011	28Dec2011	.	.	.	.	29Dec2011	27Jan2012	.	.	.	.
2	27Jan2012	Physical_Act	3	5	29Dec2011	27Jan2012	.	.	.	.	28Jan2012	26Feb2012	.	.	.	.
2	01Dec2011	Diet	1	4	02Nov2011	01Dec2011	.	.	.	.	02Dec2011	31Dec2011	.	.	.	.
2	31Dec2011	Diet	2	5	02Dec2011	31Dec2011	.	.	.	.	01Jan2012	30Jan2012	.	.	.	.
2	30Jan2012	Diet	3	8	01Jan2012	30Jan2012	.	.	.	.	31Jan2012	29Feb2012	.	.	.	.
3	01Jan2012	Gen_Health	1	Poor	03Dec2011	01Jan2012	.	.	.	.	02Jan2012	31Jan2012	.	.	.	.
3	30Jan2012	Gen_Health	2	Good	01Jan2012	30Jan2012	.	.	.	.	31Jan2012	29Feb2012	.	.	.	.
3	03Mar2012	Gen_Health	3	Fair	03Feb2012	03Mar2012	.	.	.	.	04Mar2012	02Apr2012	.	.	.	.
3	06Jan2012	Physical_Act	1	2	08Dec2011	06Jan2012	.	.	.	.	07Jan2012	05Feb2012	.	.	.	.
3	04Feb2012	Physical_Act	2	4	06Jan2012	04Feb2012	.	.	.	.	05Feb2012	05Mar2012	.	.	.	.
3	08Mar2012	Physical_Act	3	4	08Feb2012	08Mar2012	.	.	.	.	09Mar2012	07Apr2012	.	.	.	.
3	25Dec2011	Diet	1	3	26Nov2011	25Dec2011	.	.	.	.	26Dec2011	24Jan2012	.	.	.	.
3	23Jan2012	Diet	2	7	25Dec2011	23Jan2012	.	.	.	.	24Jan2012	22Feb2012	.	.	.	.
3	25Feb2012	Diet	3	9	27Jan2012	25Feb2012	.	.	.	.	26Feb2012	26Mar2012	.	.	.	.
;
Run;
data survey; set survey; format date date9. Sleep_30dBF_date_start date9. Sleep_30dBF_date_end date9.  Sleep_30dAF_date_start date9. 
Sleep_30dAF_date_end date9.;run;

 

Can anyone help? Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

You're welcome. Yes, it's easy to include a counter in the WANT dataset (similar to variable _FREQ_ in PROC SUMMARY output datasets) or actually two counters: say freqBF (for the 30-day period "before") and freqAF (for the 30-day period  "after"). Currently we have a temporary counter named i which is used for both periods and then dropped. So we just rename that and don't drop it.

data want(drop=wake_date Sleep_dur_hrs Sleep_start_MC_hrs);
if _n_=1 then do;
  if 0 then set sleep;
  dcl hash h(dataset:'sleep');
  h.definekey('id','wake_date');
  h.definedata('Sleep_dur_hrs','Sleep_start_MC_hrs');
  h.definedone();
end;
array dur[30] _temporary_;
array stt[30] _temporary_;
set survey;

call missing(of dur[*], of stt[*]);
freqBF=0;
SleepBF_date_start=date-29;
SleepBF_date_end=date;
do wake_date=SleepBF_date_start to SleepBF_date_end;
  if h.find()=0 then do;
    freqBF+1;
    dur[freqBF]=Sleep_dur_hrs;
    stt[freqBF]=Sleep_start_MC_hrs;
  end;
end;
if freqBF then do;
  Sleep_dur_hrs_avgBF=mean(of dur[*]);
  Sleep_dur_hrs_stdBF=std(of dur[*]);
  Sleep_start_MC_hrs_avgBF=mean(of stt[*]);
  Sleep_start_MC_hrs_stdBF=std(of stt[*]);
end;

call missing(of dur[*], of stt[*]);
freqAF=0;
SleepAF_date_start=date+1;
SleepAF_date_end=date+30;
do wake_date=SleepAF_date_start to SleepAF_date_end;
  if h.find()=0 then do;
    freqAF+1;
    dur[freqAF]=Sleep_dur_hrs;
    stt[freqAF]=Sleep_start_MC_hrs;
  end;
end;
if freqAF then do;
  Sleep_dur_hrs_avgAF=mean(of dur[*]);
  Sleep_dur_hrs_stdAF=std(of dur[*]);
  Sleep_start_MC_hrs_avgAF=mean(of stt[*]);
  Sleep_start_MC_hrs_stdAF=std(of stt[*]);
end;

format SleepBF_date_start SleepBF_date_end SleepAF_date_start SleepAF_date_end date9.;
run;

Now freqBF and freqAF count the records from dataset SLEEP that were found in their respective 30-day periods. These counts would include missing values, if any. So, if there are observations in the SLEEP data with missing values of Sleep_dur_hrs or Sleep_start_MC_hrs and you want to count only non-missing values (similar to the N statistic of PROC SUMMARY), then we'll need to split the counters further (so as to have four in total, one per analysis variable and period) and modify the code correspondingly -- which would be no problem.

 

While revising the code to include freqBF and freqAF, I found a mistake and corrected it: In the earlier code version I had rounded the Sleep_start_MC_hrs values of the 30-days "after" period to two decimals (using the ROUND function). But that was only a leftover of an attempt to match your sample "want" values for ID=1. Now the "exact" values are used, as they should. Sorry for that.

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

@PaigeMiller and @FreelanceReinh recommended I post a separate topic inquiring about hash objects, which ostensibly should take much longer to run

 

Should say: much less time to run


For anyone who might be interested, it was not stated here but stated in the other thread, that the real problem has over 300,000 observations in the SURVEY data set. So the solution needs to scale.  @confooseddesi89 Its an extremely important piece of information, you shouldn't leave that out. We already have a solution that runs in seconds on your small test data sets.

--
Paige Miller
FreelanceReinh
Jade | Level 19

Hello @confooseddesi89,

 

Try this:

data want(drop=i wake_date Sleep_dur_hrs Sleep_start_MC_hrs);
if _n_=1 then do;
  if 0 then set sleep;
  dcl hash h(dataset:'sleep');
  h.definekey('id','wake_date');
  h.definedata('Sleep_dur_hrs','Sleep_start_MC_hrs');
  h.definedone();
end;
array dur[30] _temporary_;
array stt[30] _temporary_;
set survey;

call missing(of dur[*], of stt[*]);
i=0;
SleepBF_date_start=date-29;
SleepBF_date_end=date;
do wake_date=SleepBF_date_start to SleepBF_date_end;
  if h.find()=0 then do;
    i+1;
    dur[i]=Sleep_dur_hrs;
    stt[i]=Sleep_start_MC_hrs;
  end;
end;
if i then do;
  Sleep_dur_hrs_avgBF=mean(of dur[*]);
  Sleep_dur_hrs_stdBF=std(of dur[*]);
  Sleep_start_MC_hrs_avgBF=mean(of stt[*]);
  Sleep_start_MC_hrs_stdBF=std(of stt[*]);
end;

call missing(of dur[*], of stt[*]);
i=0;
SleepAF_date_start=date+1;
SleepAF_date_end=date+30;
do wake_date=SleepAF_date_start to SleepAF_date_end;
  if h.find()=0 then do;
    i+1;
    dur[i]=Sleep_dur_hrs;
    stt[i]=round(Sleep_start_MC_hrs,.01);
  end;
end;
if i then do;
  Sleep_dur_hrs_avgAF=mean(of dur[*]);
  Sleep_dur_hrs_stdAF=std(of dur[*]);
  Sleep_start_MC_hrs_avgAF=mean(of stt[*]);
  Sleep_start_MC_hrs_stdAF=std(of stt[*]);
end;

format SleepBF_date_start SleepBF_date_end SleepAF_date_start SleepAF_date_end date9.;
run;

The above code ran about 10 seconds on datasets created by replicating your sample datasets 12,121 times and adapting the IDs, i.e., a SLEEP dataset with 4,484,770 observations and a SURVEY dataset with 327,267 observations, both involving the same set of 36,363 distinct IDs.

 

I get slightly different results for the "AF" variables, but they match your four sample results if I (incorrectly?) shift the 30-day period by minus one day.

 

Edit: If your real SLEEP dataset contains "too many" missing values, you may get the note "Missing values were generated ..." in the log, for example, if only one non-missing value is involved in the calculation of a standard deviation. To avoid this note, add suitable IF conditions before the affected assignment statements, for example:

if n(of dur[*])>1 then Sleep_dur_hrs_stdBF=std(of dur[*]);

The existing condition "if i then ..." covers the case that no sleep data in the relevant 30-day interval are found for an observation in the SURVEY data (e.g., because the ID does not occur in the sleep data).

confooseddesi89
Quartz | Level 8

Hi,

That worked so well! Thanks!

 

I have one issue - I need to know the _FREQ_ for both the interval that's 30 days before (BF) and the interval that's 30 days after (AF). Is there a way to extract this information and include it in the created dataset? Thanks.

FreelanceReinh
Jade | Level 19

You're welcome. Yes, it's easy to include a counter in the WANT dataset (similar to variable _FREQ_ in PROC SUMMARY output datasets) or actually two counters: say freqBF (for the 30-day period "before") and freqAF (for the 30-day period  "after"). Currently we have a temporary counter named i which is used for both periods and then dropped. So we just rename that and don't drop it.

data want(drop=wake_date Sleep_dur_hrs Sleep_start_MC_hrs);
if _n_=1 then do;
  if 0 then set sleep;
  dcl hash h(dataset:'sleep');
  h.definekey('id','wake_date');
  h.definedata('Sleep_dur_hrs','Sleep_start_MC_hrs');
  h.definedone();
end;
array dur[30] _temporary_;
array stt[30] _temporary_;
set survey;

call missing(of dur[*], of stt[*]);
freqBF=0;
SleepBF_date_start=date-29;
SleepBF_date_end=date;
do wake_date=SleepBF_date_start to SleepBF_date_end;
  if h.find()=0 then do;
    freqBF+1;
    dur[freqBF]=Sleep_dur_hrs;
    stt[freqBF]=Sleep_start_MC_hrs;
  end;
end;
if freqBF then do;
  Sleep_dur_hrs_avgBF=mean(of dur[*]);
  Sleep_dur_hrs_stdBF=std(of dur[*]);
  Sleep_start_MC_hrs_avgBF=mean(of stt[*]);
  Sleep_start_MC_hrs_stdBF=std(of stt[*]);
end;

call missing(of dur[*], of stt[*]);
freqAF=0;
SleepAF_date_start=date+1;
SleepAF_date_end=date+30;
do wake_date=SleepAF_date_start to SleepAF_date_end;
  if h.find()=0 then do;
    freqAF+1;
    dur[freqAF]=Sleep_dur_hrs;
    stt[freqAF]=Sleep_start_MC_hrs;
  end;
end;
if freqAF then do;
  Sleep_dur_hrs_avgAF=mean(of dur[*]);
  Sleep_dur_hrs_stdAF=std(of dur[*]);
  Sleep_start_MC_hrs_avgAF=mean(of stt[*]);
  Sleep_start_MC_hrs_stdAF=std(of stt[*]);
end;

format SleepBF_date_start SleepBF_date_end SleepAF_date_start SleepAF_date_end date9.;
run;

Now freqBF and freqAF count the records from dataset SLEEP that were found in their respective 30-day periods. These counts would include missing values, if any. So, if there are observations in the SLEEP data with missing values of Sleep_dur_hrs or Sleep_start_MC_hrs and you want to count only non-missing values (similar to the N statistic of PROC SUMMARY), then we'll need to split the counters further (so as to have four in total, one per analysis variable and period) and modify the code correspondingly -- which would be no problem.

 

While revising the code to include freqBF and freqAF, I found a mistake and corrected it: In the earlier code version I had rounded the Sleep_start_MC_hrs values of the 30-days "after" period to two decimals (using the ROUND function). But that was only a leftover of an attempt to match your sample "want" values for ID=1. Now the "exact" values are used, as they should. Sorry for that.

SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 903 views
  • 1 like
  • 3 in conversation