Hello,
I have two datasets: One sleep dataset and one survey dataset. Each dataset is long form, with ID numbers repeating by row. Each ID has several months of data in each dataset.
For the sleep dataset, each row represents one night of sleep data, and each variable is represented in a separate column. See below an example of data for ID #1, with variables Sleep_dur_hrs and Sleep_start_MC_hrs (MC means midnight-centered) by wake_date:
user_id |
wake_date |
Sleep_dur_hrs |
Sleep_start_MC_hrs |
1 |
14Nov2011 |
7.52 |
-1.25 |
1 |
15Nov2011 |
5.43 |
-1.47 |
1 |
16Nov2011 |
8.40 |
-5.46 |
1 |
17Nov2011 |
7.52 |
-1.53 |
1 |
18Nov2011 |
7.70 |
-1.47 |
1 |
19Nov2011 |
7.60 |
-1.52 |
1 |
20Nov2011 |
7.40 |
-1.82 |
1 |
21Nov2011 |
8.23 |
-4.89 |
1 |
22Nov2011 |
6.75 |
-1.57 |
1 |
23Nov2011 |
7.20 |
-1.10 |
1 |
24Nov2011 |
4.15 |
2.62 |
1 |
25Nov2011 |
8.10 |
-1.38 |
1 |
26Nov2011 |
7.77 |
-1.49 |
1 |
27Nov2011 |
7.48 |
-1.07 |
1 |
28Nov2011 |
7.73 |
-.88 |
1 |
29Nov2011 |
7.03 |
.49 |
1 |
30Nov2011 |
7.45 |
-1.72 |
1 |
01Dec2011 |
7.62 |
-1.45 |
1 |
02Dec2011 |
6.62 |
-1.73 |
1 |
03Dec2011 |
7.67 |
-1.34 |
1 |
04Dec2011 |
6.92 |
-3.83 |
1 |
05Dec2011 |
7.27 |
-1.40 |
1 |
06Dec2011 |
6.40 |
-1.27 |
1 |
07Dec2011 |
8.60 |
-1.73 |
1 |
08Dec2011 |
7.18 |
-1.28 |
1 |
09Dec2011 |
7.18 |
-.52 |
1 |
10Dec2011 |
8.03 |
-1.62 |
1 |
11Dec2011 |
6.73 |
-.78 |
1 |
12Dec2011 |
8.43 |
-2.58 |
1 |
13Dec2011 |
7.65 |
-2.00 |
1 |
14Dec2011 |
5.92 |
-2.72 |
1 |
15Dec2011 |
9.12 |
-2.78 |
1 |
16Dec2011 |
7.50 |
-1.54 |
1 |
17Dec2011 |
7.77 |
-2.19 |
1 |
18Dec2011 |
6.82 |
-1.93 |
1 |
19Dec2011 |
7.77 |
-1.57 |
1 |
20Dec2011 |
7.50 |
-2.20 |
1 |
21Dec2011 |
8.77 |
-2.49 |
1 |
22Dec2011 |
7.87 |
-2.67 |
1 |
23Dec2011 |
6.28 |
-.02 |
1 |
24Dec2011 |
8.40 |
-1.27 |
1 |
25Dec2011 |
4.98 |
-1.18 |
1 |
26Dec2011 |
7.63 |
-1.78 |
1 |
27Dec2011 |
4.03 |
1.85 |
1 |
28Dec2011 |
7.57 |
-2.22 |
1 |
29Dec2011 |
2.70 |
3.27 |
1 |
30Dec2011 |
8.35 |
-5.27 |
1 |
31Dec2011 |
8.10 |
-1.17 |
1 |
01Jan2012 |
5.83 |
-.98 |
1 |
02Jan2012 |
6.93 |
-2.41 |
1 |
03Jan2012 |
6.38 |
-2.09 |
For the survey data, each row represents one survey entry and also a separate variable, with about one survey entry per month per person. Some variables were surveyed at different times of the month even for the same person (for example, Gen Health question was surveyed on Dec 2nd, but Physical Activity was surveyed on Dec 5th, for ID #1). See below an example of the data:
ID |
date |
Variable |
entry_count |
answer |
1 |
02Dec2011 |
Gen_Health |
1 |
Fair |
1 |
01Jan2012 |
Gen_Health |
2 |
Fair |
1 |
01Feb2012 |
Gen_Health |
3 |
Fair |
1 |
05Dec2011 |
Physical_Act |
1 |
10 |
1 |
04Jan2012 |
Physical_Act |
2 |
14 |
1 |
04Feb2012 |
Physical_Act |
3 |
30 |
1 |
28Nov2011 |
Diet |
1 |
4 |
1 |
28Dec2011 |
Diet |
2 |
7 |
1 |
28Jan2012 |
Diet |
3 |
30 |
2 |
30Nov2011 |
Gen_Health |
1 |
Good |
2 |
30Dec2011 |
Gen_Health |
2 |
Fair |
2 |
29Jan2012 |
Gen_Health |
3 |
Poor |
2 |
28Nov2011 |
Physical_Act |
1 |
10 |
2 |
28Dec2011 |
Physical_Act |
2 |
15 |
2 |
27Jan2012 |
Physical_Act |
3 |
5 |
2 |
01Dec2011 |
Diet |
1 |
4 |
2 |
31Dec2011 |
Diet |
2 |
5 |
2 |
30Jan2012 |
Diet |
3 |
8 |
3 |
01Jan2012 |
Gen_Health |
1 |
Poor |
3 |
30Jan2012 |
Gen_Health |
2 |
Good |
3 |
03Mar2012 |
Gen_Health |
3 |
Fair |
3 |
06Jan2012 |
Physical_Act |
1 |
2 |
3 |
04Feb2012 |
Physical_Act |
2 |
4 |
3 |
08Mar2012 |
Physical_Act |
3 |
4 |
3 |
25Dec2011 |
Diet |
1 |
3 |
3 |
23Jan2012 |
Diet |
2 |
7 |
3 |
25Feb2012 |
Diet |
3 |
9 |
I've given an example of data for 3 IDs here, with 3 different variables repeated across 3 survey administrations ("entry_count"), with the date of survey question administration ("date"), and the answer to that variable ("answer").
What I'd like to obtain is sleep data (mean/SD of sleep duration and sleep start) for 30 days before and 30 days after the date of each unique survey administration per ID. I'd like to merge this sleep data with the survey data. Below are sample data. Note that I've only filled out the first two rows for the first section and the first row for the second section. Also note that 30dBF means 30 days before; and 30dAF means 30 days after. Before the actual sleep variables themselves, I should have variables representing the start date of the "before" interval (29 days before the survey question administration), the end date of the "before" interval (the date of the survey administration); the start date of the "after" interval (1 day after the survey question administration), and the end date of the "after" interval (30 days after the date of the survey administration);
ID | date | Variable | entry_count | answer | SleepBF_date_start | SleepBF_date_end | Sleep_dur_hrs_avgBF | Sleep_dur_hrs_stdBF | Sleep_start_MC_hrs_avgBF | Sleep_start_MC_hrs_stdBF | SleepAF_date_start | SleepAF_date_end | Sleep_dur_hrs_avgAF | Sleep_dur_hrs_stdAF | Sleep_start_MC_hrs_avgAF | Sleep_start_MC_hrs_stdAF |
1 | 02Dec2011 | Gen_Health | 1 | Fair | 03Nov2011 | 02Dec2011 | 7.25 | 1.00 | -1.51 | 1.65 | 03Dec2011 | 01Jan2012 | 7.19 | 1.38 | -1.60 | 1.51 |
1 | 01Jan2012 | Gen_Health | 2 | Fair | 03Dec2011 | 01Jan2012 | 7.17 | 1.3956345 | -1.57 | 1.513422712 | 02Jan2012 | 31Jan2012 | ||||
1 | 01Feb2012 | Gen_Health | 3 | Fair | 03Jan2012 | 01Feb2012 | 02Feb2012 | 02Mar2012 | ||||||||
1 | 05Dec2011 | Physical_Act | 1 | 10 | 06Nov2011 | 05Dec2011 | 06Dec2011 | 04Jan2012 | ||||||||
1 | 04Jan2012 | Physical_Act | 2 | 14 | 06Dec2011 | 04Jan2012 | 05Jan2012 | 03Feb2012 | ||||||||
1 | 04Feb2012 | Physical_Act | 3 | 30 | 06Jan2012 | 04Feb2012 | 05Feb2012 | 05Mar2012 | ||||||||
1 | 28Nov2011 | Diet | 1 | 4 | 30Oct2011 | 28Nov2011 | 29Nov2011 | 28Dec2011 | ||||||||
1 | 28Dec2011 | Diet | 2 | 7 | 29Nov2011 | 28Dec2011 | 29Dec2011 | 27Jan2012 | ||||||||
1 | 28Jan2012 | Diet | 3 | 30 | 30Dec2011 | 28Jan2012 | 29Jan2012 | 27Feb2012 | ||||||||
2 | 30Nov2011 | Gen_Health | 1 | Good | 01Nov2011 | 30Nov2011 | 01Dec2011 | 30Dec2011 | ||||||||
2 | 30Dec2011 | Gen_Health | 2 | Fair | 01Dec2011 | 30Dec2011 | 31Dec2011 | 29Jan2012 | ||||||||
2 | 29Jan2012 | Gen_Health | 3 | Poor | 31Dec2011 | 29Jan2012 | 30Jan2012 | 28Feb2012 | ||||||||
2 | 28Nov2011 | Physical_Act | 1 | 10 | 30Oct2011 | 28Nov2011 | 29Nov2011 | 28Dec2011 | ||||||||
2 | 28Dec2011 | Physical_Act | 2 | 15 | 29Nov2011 | 28Dec2011 | 29Dec2011 | 27Jan2012 | ||||||||
2 | 27Jan2012 | Physical_Act | 3 | 5 | 29Dec2011 | 27Jan2012 | 28Jan2012 | 26Feb2012 | ||||||||
2 | 01Dec2011 | Diet | 1 | 4 | 02Nov2011 | 01Dec2011 | 02Dec2011 | 31Dec2011 | ||||||||
2 | 31Dec2011 | Diet | 2 | 5 | 02Dec2011 | 31Dec2011 | 01Jan2012 | 30Jan2012 | ||||||||
2 | 30Jan2012 | Diet | 3 | 8 | 01Jan2012 | 30Jan2012 | 31Jan2012 | 29Feb2012 | ||||||||
3 | 01Jan2012 | Gen_Health | 1 | Poor | 03Dec2011 | 01Jan2012 | 02Jan2012 | 31Jan2012 | ||||||||
3 | 30Jan2012 | Gen_Health | 2 | Good | 01Jan2012 | 30Jan2012 | 31Jan2012 | 29Feb2012 | ||||||||
3 | 03Mar2012 | Gen_Health | 3 | Fair | 03Feb2012 | 03Mar2012 | 04Mar2012 | 02Apr2012 | ||||||||
3 | 06Jan2012 | Physical_Act | 1 | 2 | 08Dec2011 | 06Jan2012 | 07Jan2012 | 05Feb2012 | ||||||||
3 | 04Feb2012 | Physical_Act | 2 | 4 | 06Jan2012 | 04Feb2012 | 05Feb2012 | 05Mar2012 | ||||||||
3 | 08Mar2012 | Physical_Act | 3 | 4 | 08Feb2012 | 08Mar2012 | 09Mar2012 | 07Apr2012 | ||||||||
3 | 25Dec2011 | Diet | 1 | 3 | 26Nov2011 | 25Dec2011 | 26Dec2011 | 24Jan2012 | ||||||||
3 | 23Jan2012 | Diet | 2 | 7 | 25Dec2011 | 23Jan2012 | 24Jan2012 | 22Feb2012 | ||||||||
3 | 25Feb2012 | Diet | 3 | 9 | 27Jan2012 | 25Feb2012 | 26Feb2012 | 26Mar2012 |
This is way beyond any of my available SAS coding knowledge, and I'm really in need of assistance for some analyses that are due soon. Can anyone help? Thanks.
So, this gets a little crazy, but I believe the solution I give using CALL EXECUTE produces the information you want, and I leave it up to you to re-arrange it into whatever table is of use to you. I also leave it up to you to check the calculations for one or two data points.
proc delete data=all_stats;
run;
data _null_;
set survey;
thirty_days_before=date-30;
one_day_after=date+1;
thirty_days_after=date+30;
call execute(
"data new;
set sleep(where=(id=" || id || "));
survey_date=" || date || ";
if wake_date >= " || thirty_days_before || " and wake_date <= " || date || " then before_after=1;
else if wake_date >= " || one_day_after || " and wake_date <= " || thirty_days_after || " then before_after=2;
format survey_date date9.;
run;
proc summary nway data=new;
class before_after;
id id survey_date;
var sleep_dur_hrs sleep_start_mc_hrs;
output out=work._stats_ mean= std=/autoname;
run;
proc append base=all_stats new=work._stats_;
run;
");
run;
Are there simpler ways to do this? Quite possibly, but this is what I came up with.
Hello @confooseddesi89
Can you please turn these data screen captures into working SAS data step code (instructions)? We can't work with the data in screen captures.
Here is the data step code for the sleep data (sample data):
Data sleep;
Input id wake_date date12. Sleep_dur_hrs Sleep_start_MC_hrs;
datalines;
1 14Nov2011 7.5167 -1.25
1 15Nov2011 5.4333 -1.46667
1 16Nov2011 8.4 -5.45833
1 17Nov2011 7.5167 -1.53333
1 18Nov2011 7.7 -1.46667
1 19Nov2011 7.6 -1.51667
1 20Nov2011 7.4 -1.81667
1 21Nov2011 8.2333 -4.89167
1 22Nov2011 6.75 -1.56667
1 23Nov2011 7.2 -1.1
1 24Nov2011 4.15 2.61667
1 25Nov2011 8.1 -1.38333
1 26Nov2011 7.7667 -1.49167
1 27Nov2011 7.4833 -1.06667
1 28Nov2011 7.7333 -0.88333
1 29Nov2011 7.0333 0.49167
1 30Nov2011 7.45 -1.71667
1 01Dec2011 7.6167 -1.45
1 02Dec2011 6.6167 -1.73333
1 03Dec2011 7.6667 -1.34167
1 04Dec2011 6.9167 -3.825
1 05Dec2011 7.2667 -1.4
1 06Dec2011 6.4 -1.26667
1 07Dec2011 8.6 -1.725
1 08Dec2011 7.1833 -1.275
1 09Dec2011 7.1833 -0.51667
1 10Dec2011 8.0333 -1.61667
1 11Dec2011 6.7333 -0.78333
1 12Dec2011 8.4333 -2.58333
1 13Dec2011 7.65 -2
1 14Dec2011 5.9167 -2.71667
1 15Dec2011 9.1167 -2.775
1 16Dec2011 7.5 -1.54167
1 17Dec2011 7.7667 -2.19167
1 18Dec2011 6.8167 -1.93333
1 19Dec2011 7.7667 -1.56667
1 20Dec2011 7.5 -2.2
1 21Dec2011 8.7667 -2.49167
1 22Dec2011 7.8667 -2.66667
1 23Dec2011 6.2833 -0.01667
1 24Dec2011 8.4 -1.26667
1 25Dec2011 4.9833 -1.18333
1 26Dec2011 7.6333 -1.775
1 27Dec2011 4.0333 1.85
1 28Dec2011 7.5667 -2.21667
1 29Dec2011 2.7 3.26667
1 30Dec2011 8.35 -5.26667
1 31Dec2011 8.1 -1.16667
1 01Jan2012 5.8333 -0.98333
1 02Jan2012 6.9333 -2.40833
1 03Jan2012 6.3833 -2.09444
1 04Jan2012 8.4667 -2.61667
1 05Jan2012 7.45 -1.21667
1 06Jan2012 7.5167 -0.76667
1 07Jan2012 7.0667 -1.15
1 08Jan2012 8.45 -2.43333
1 09Jan2012 7.4667 -0.91667
1 10Jan2012 7.3 -1.03333
1 11Jan2012 8.6667 -2.76667
1 12Jan2012 8.1 -0.33333
1 13Jan2012 5.3333 1.3
1 14Jan2012 7.9333 -1.1
1 15Jan2012 6.35 -3.01389
1 16Jan2012 7.4333 -1.03333
1 17Jan2012 7.3833 -1.55
1 18Jan2012 8.5 -1.35
1 19Jan2012 4.7333 -0.15
1 20Jan2012 7.3667 -1.28333
1 21Jan2012 8.5 -1.36667
1 22Jan2012 7.9 -0.76667
1 23Jan2012 7.8 -0.48333
1 24Jan2012 8.1667 -1.1
1 25Jan2012 7.95 -1.45
1 26Jan2012 7.7333 -1.25
1 27Jan2012 7.3833 -1.05833
1 28Jan2012 7.95 -0.88333
1 29Jan2012 7.3833 -0.98333
1 30Jan2012 5.4833 2.0125
1 31Jan2012 6.6167 -0.13333
1 01Feb2012 7.0667 1.40417
1 02Feb2012 6.7 -0.55833
1 03Feb2012 9.1333 -1.575
1 04Feb2012 7.5167 -1.69167
1 05Feb2012 7.7833 -1.98333
1 06Feb2012 9.65 -1.875
1 07Feb2012 4.0833 0.9
1 08Feb2012 7.2333 -1.58333
1 09Feb2012 7.9167 -0.85
1 10Feb2012 8.1333 -1.65
1 11Feb2012 8.0333 -2.36667
1 12Feb2012 6.8667 -0.66667
1 13Feb2012 7.75 -2.04167
1 14Feb2012 5.6833 -2.10833
1 15Feb2012 5.65 -0.23333
1 16Feb2012 7.9667 -0.53333
1 17Feb2012 7.7167 -1.38333
1 18Feb2012 8.45 -1.55
1 19Feb2012 6.2333 1.78333
1 20Feb2012 8.2167 -0.93333
1 21Feb2012 8.3167 -1.75
1 22Feb2012 8.6 -2.3
1 23Feb2012 8.1667 -1.51667
1 24Feb2012 7.7833 -1.1
1 25Feb2012 7.6167 -0.85
1 26Feb2012 6.6 -1.45
1 27Feb2012 6.1333 -1.13333
1 28Feb2012 8.55 -3.7
1 29Feb2012 8.0667 -2.39167
1 01Mar2012 8.15 -1.23333
1 02Mar2012 8.75 -1.41667
1 03Mar2012 8.1833 -0.79167
1 04Mar2012 8.3333 -1.75833
1 05Mar2012 7.7333 -1.16667
1 06Mar2012 6.3333 0.05
1 07Mar2012 7.9167 -1.51667
1 08Mar2012 7.2 -1.25833
1 09Mar2012 6 2.25833
1 10Mar2012 8.1667 -1.175
1 11Mar2012 6.5833 -0.68333
1 12Mar2012 6.5667 1.575
1 13Mar2012 7.65 -0.25
1 14Mar2012 7.1333 -0.24167
1 15Mar2012 7.3167 -2.475
1 16Mar2012 8.55 -2
1 17Mar2012 6.2333 -0.28333
1 18Mar2012 8.1667 -1.26667
1 19Mar2012 8.7333 -1.68333
1 20Mar2012 7.8167 -1.15833
1 21Mar2012 7.7167 -0.73333
1 22Mar2012 7.7167 -0.43333
1 23Mar2012 7.85 -0.48333
1 24Mar2012 3.9 0.71667
1 25Mar2012 7.8167 -2.1
1 26Mar2012 7.5333 1.54167
1 27Mar2012 7.9167 -1.05
1 28Mar2012 8.8667 -1.35
1 29Mar2012 9.3667 -0.58333
1 30Mar2012 5.1 1.875
1 31Mar2012 7.35 -0.33333
1 01Apr2012 6.5333 -1.26667
1 02Apr2012 6.7 -1.13333
1 03Apr2012 9 -0.53333
1 04Apr2012 7.2 -1.81667
1 05Apr2012 8.1833 3.73333
1 06Apr2012 8.65 -1.23333
1 07Apr2012 7.1667 0.44167
1 08Apr2012 7.5833 -1.21667
2 14Nov2011 7.45 -2.23333
2 15Nov2011 7.3 -2.11667
2 16Nov2011 6.0833 -1.7
2 17Nov2011 6.3333 -2.35
2 18Nov2011 5.3167 -1.14167
2 19Nov2011 6.9 -1.025
2 20Nov2011 6.25 -1.50833
2 21Nov2011 5.2667 1.675
2 22Nov2011 5.9667 -1.775
2 23Nov2011 5.9 -2.98333
2 24Nov2011 6.1167 0.33333
2 25Nov2011 5.3667 1.55833
2 26Nov2011 8.4667 -3.2
2 27Nov2011 5.5667 -1.5
2 28Nov2011 7.05 1.61667
2 29Nov2011 6.1667 -0.65833
2 30Nov2011 5.95 2.0375
2 01Dec2011 6.15 -0.98333
2 02Dec2011 4.6833 -0.55833
2 03Dec2011 3.7167 -0.45
2 04Dec2011 6.1333 -0.7
2 05Dec2011 4.9833 -0.45
2 06Dec2011 7.8833 -1.38333
2 07Dec2011 6.2 0.95
2 08Dec2011 6.4167 0.18333
2 09Dec2011 6.3167 1.25
2 10Dec2011 4.6167 0.86667
2 11Dec2011 3.65 2.56667
2 12Dec2011 7.15 -0.21667
2 13Dec2011 6.8 0.2
2 14Dec2011 2.9667 4.40833
2 15Dec2011 5.5167 2.3
2 16Dec2011 8.25 -0.89167
2 17Dec2011 3.55 1.8
2 18Dec2011 7.0167 0.25833
2 19Dec2011 5.8 2.84583
2 20Dec2011 6.1333 -2.51667
2 21Dec2011 5.2333 -1.425
2 22Dec2011 7.4833 -0.825
2 23Dec2011 6.3667 0.24167
2 24Dec2011 4.6667 -0.35
2 25Dec2011 6.4333 -0.55
2 26Dec2011 6.6667 -0.93333
2 27Dec2011 6.3 -0.88333
2 28Dec2011 5.15 -0.025
2 29Dec2011 4.6833 2.67083
2 30Dec2011 8.1 -2.4
2 31Dec2011 4.4333 -3.80417
2 01Jan2012 7.05 -1.28333
2 02Jan2012 6.7 -0.29167
2 03Jan2012 5.6833 3.17083
2 04Jan2012 6.55 2.34583
2 05Jan2012 6.7667 -0.33056
2 06Jan2012 6.0667 0.64167
2 07Jan2012 5.9667 -0.85833
2 08Jan2012 7.3167 0.175
2 09Jan2012 5.2667 -0.61667
2 10Jan2012 5.7667 1.37083
2 11Jan2012 5.8833 3.27917
2 12Jan2012 4.7167 1.23333
2 13Jan2012 7.4333 1.16667
2 14Jan2012 3.95 -2.275
2 15Jan2012 7.35 -1.25
2 16Jan2012 6.3333 -0.10833
2 17Jan2012 3.9333 0.525
2 18Jan2012 2.9167 4.25
2 19Jan2012 6.15 1.21667
2 20Jan2012 7.2 0.58333
2 21Jan2012 4.5833 1.01667
2 22Jan2012 7.1 0.24167
2 23Jan2012 5.8667 2.40417
2 24Jan2012 6.1667 2.2
2 25Jan2012 6.5 0.95
2 26Jan2012 6.1833 1.19167
2 27Jan2012 7.55 -1.075
2 28Jan2012 4.8833 -0.19167
2 29Jan2012 8.5833 -0.76667
2 30Jan2012 4.95 2.44167
2 31Jan2012 4.05 0.75
2 01Feb2012 8.55 -2.2
2 02Feb2012 7.5 -0.14167
2 03Feb2012 5.55 2.54167
2 04Feb2012 4.8333 0
2 05Feb2012 5.6833 0.38333
2 06Feb2012 6.45 -4.4875
2 07Feb2012 4.8 0.93333
2 08Feb2012 5.8167 -0.86667
2 09Feb2012 6.15 -0.21667
2 10Feb2012 5.3 -0.85
2 11Feb2012 4.8667 -5.90417
2 12Feb2012 7.5167 -1.525
2 13Feb2012 7.2333 0.08333
2 14Feb2012 4.85 3.675
2 15Feb2012 5.9 0.45833
2 16Feb2012 7.45 0.275
2 17Feb2012 4.0333 0.80833
2 18Feb2012 4.5833 0.575
2 19Feb2012 4.45 2.7
2 20Feb2012 7.3333 0.55
2 21Feb2012 5.35 1.25
2 22Feb2012 4.35 -1.025
2 23Feb2012 4.4 -2.66667
2 24Feb2012 8.4667 -2.64167
2 25Feb2012 6.3667 -3.88333
2 26Feb2012 6.5667 1.3625
2 27Feb2012 8.2 -0.59167
2 28Feb2012 5.6333 1.31667
2 29Feb2012 7.2 -0.2
2 01Mar2012 5.5667 -0.26667
2 02Mar2012 4.5667 -0.83333
2 03Mar2012 6.4333 -4.19167
2 04Mar2012 5.6667 0.24167
2 05Mar2012 4.4833 2.71667
2 06Mar2012 5.1833 4.12083
2 07Mar2012 5.8 1.225
2 08Mar2012 5.65 0.71667
2 09Mar2012 8.1167 -0.83333
2 10Mar2012 4.2333 0.56667
2 12Mar2012 6.8667 1.2
2 13Mar2012 7.9833 -0.91667
2 14Mar2012 4.05 3.57917
2 15Mar2012 1.6333 2.53333
2 16Mar2012 6.95 0.89167
2 17Mar2012 5.35 0.11667
2 18Mar2012 6.2333 1.79167
2 19Mar2012 3.7667 2.71667
2 20Mar2012 8.5167 1.685
2 21Mar2012 5.4333 2.45
2 22Mar2012 6.5167 0.425
2 23Mar2012 8.45 -0.73333
2 24Mar2012 7.85 -1.78333
2 25Mar2012 6.9 -0.11667
2 26Mar2012 7.3667 -0.61667
2 27Mar2012 8.6667 -1.64167
2 28Mar2012 7.7333 0.26667
2 29Mar2012 4.2333 -0.16667
2 30Mar2012 8.2 -1.38333
2 31Mar2012 5.85 0.80833
2 01Apr2012 6.1 1.85833
2 02Apr2012 6.6333 0.75
2 03Apr2012 4.5833 1.51667
2 04Apr2012 5.4333 3
2 05Apr2012 7.5167 1.12083
2 06Apr2012 3.7667 -0.6
2 07Apr2012 6.2833 1.49167
2 08Apr2012 7.35 0.33333
3 15Nov2011 7.5667 -2.36667
3 16Nov2011 7.6167 -2.89167
3 17Nov2011 7.5 -2.00833
3 18Nov2011 8.85 -3.71667
3 19Nov2011 8.7167 -3.425
3 20Nov2011 7.5667 -2.35833
3 03Dec2011 8.65 -1.16944
3 04Dec2011 6.0167 -0.00833
3 05Dec2011 8.4833 -2.74167
3 06Dec2011 9.1 -3.39167
3 08Dec2011 7.8667 -2.25
3 09Dec2011 8.4333 -2.63333
3 10Dec2011 8.5167 -3.03333
3 07Jan2012 2.5833 -3.71667
3 08Jan2012 4.7667 0.69167
3 10Jan2012 8.7333 -3.59167
3 11Jan2012 7.7 -2.69167
3 12Jan2012 6.3333 -1.875
3 13Jan2012 8.7 -2.025
3 15Jan2012 3.1667 -1.6375
3 16Jan2012 1.3667 -6.23333
3 19Jan2012 10.4167 -3.21667
3 20Jan2012 2.1333 -5.25833
3 21Jan2012 1.4833 -7.125
3 24Jan2012 1.85 -5.08333
3 25Jan2012 1.3667 7.125
3 27Jan2012 1.85 -3.125
3 28Jan2012 2.5 -7.125
3 29Jan2012 8.0833 -2.26667
3 31Jan2012 1.4167 -5.475
3 01Feb2012 1.5167 -3.1
3 02Feb2012 1.3667 -7.125
3 03Feb2012 1.3667 -4.65
3 04Feb2012 2.6167 -7.125
3 05Feb2012 1.5667 -4.98333
3 09Feb2012 9.95 -5.0375
3 10Feb2012 1.95 -7.125
3 11Feb2012 9.8 -3.86667
3 12Feb2012 7.4333 -2.025
3 13Feb2012 8.1 -1.68333
3 14Feb2012 11.05 -6.5125
3 17Feb2012 9.9833 -1.05
3 19Feb2012 10 -4.54167
3 21Feb2012 9.9667 -5.39167
3 22Feb2012 7.3667 -1.58333
3 23Feb2012 8.7667 -2.80833
3 24Feb2012 7.9833 -2.71667
3 25Feb2012 10.0833 -3.73333
3 26Feb2012 7.4 -1.71667
3 27Feb2012 11.0667 -5.45417
3 28Feb2012 7.1333 -2.125
3 29Feb2012 6.3667 -2.075
3 02Mar2012 9.15 -3.01667
3 03Mar2012 7 -1.425
3 04Mar2012 7.8333 -3.19583
3 05Mar2012 10.3333 -4.26667
3 07Mar2012 7.1667 -1.63333
3 08Mar2012 8.0167 -3.96667
3 09Mar2012 5.5333 -3.39167
3 10Mar2012 1.3667 -7.125
3 12Mar2012 9.8333 -5
3 13Mar2012 6.8333 -1.375
3 14Mar2012 8.8333 -3.4
3 15Mar2012 7.7333 -2.28333
3 16Mar2012 6.9 -2.26667
3 17Mar2012 9.45 -4.54167
3 25Mar2012 8.7 -2.20833
3 26Mar2012 7.5833 -1.26667
3 27Mar2012 6.7167 -1.59167
3 28Mar2012 8.4833 -4.3
3 29Mar2012 7.05 -2.375
3 30Mar2012 8.4333 -4.7875
3 31Mar2012 1.3667 6.64167
3 01Apr2012 7.05 -1.99167
3 05Apr2012 9.9 -5.24167
3 06Apr2012 8.25 -3
3 08Apr2012 8.5167 -3.13333
;
Run;
data sleep; set sleep; format wake_date date9.;run;
Here is the data step code for the survey data (sample data):
Data survey;
Input ID date date12. Variable: $14. entry_count answer $ Sleep_30dBF_date_start date12. Sleep_30dBF_date_end date12. Sleep_dur_hrs_avg_30dBF Sleep_dur_hrs_std_30dBF Sleep_start_MC_hrs_avg_30dBF Sleep_start_MC_hrs_std_30dBF Sleep_30dAF_date_start date12. Sleep_30dAF_date_end date12. Sleep_dur_hrs_avg_30dAF Sleep_dur_hrs_std_30dAF Sleep_start_MC_hrs_avg_30dAF Sleep_start_MC_hrs_std_30dAF;
datalines;
1 02Dec2011 Gen_Health 1 Fair 03Nov2011 02Dec2011 7.25 1.00 -1.51 1.65 03Dec2011 01Jan2012 7.19 1.38 -1.60 1.51
1 01Jan2012 Gen_Health 2 Fair 03Dec2011 01Jan2012 7.17 1.3956345 -1.57 1.513422712 02Jan2012 31Jan2012 . . . .
1 01Feb2012 Gen_Health 3 Fair 03Jan2012 01Feb2012 . . . . 02Feb2012 02Mar2012 . . . .
1 05Dec2011 Physical_Act 1 10 06Nov2011 05Dec2011 . . . . 06Dec2011 04Jan2012 . . . .
1 04Jan2012 Physical_Act 2 14 06Dec2011 04Jan2012 . . . . 05Jan2012 03Feb2012 . . . .
1 04Feb2012 Physical_Act 3 30 06Jan2012 04Feb2012 . . . . 05Feb2012 05Mar2012 . . . .
1 28Nov2011 Diet 1 4 30Oct2011 28Nov2011 . . . . 29Nov2011 28Dec2011 . . . .
1 28Dec2011 Diet 2 7 29Nov2011 28Dec2011 . . . . 29Dec2011 27Jan2012 . . . .
1 28Jan2012 Diet 3 30 30Dec2011 28Jan2012 . . . . 29Jan2012 27Feb2012 . . . .
2 30Nov2011 Gen_Health 1 Good 01Nov2011 30Nov2011 . . . . 01Dec2011 30Dec2011 . . . .
2 30Dec2011 Gen_Health 2 Fair 01Dec2011 30Dec2011 . . . . 31Dec2011 29Jan2012 . . . .
2 29Jan2012 Gen_Health 3 Poor 31Dec2011 29Jan2012 . . . . 30Jan2012 28Feb2012 . . . .
2 28Nov2011 Physical_Act 1 10 30Oct2011 28Nov2011 . . . . 29Nov2011 28Dec2011 . . . .
2 28Dec2011 Physical_Act 2 15 29Nov2011 28Dec2011 . . . . 29Dec2011 27Jan2012 . . . .
2 27Jan2012 Physical_Act 3 5 29Dec2011 27Jan2012 . . . . 28Jan2012 26Feb2012 . . . .
2 01Dec2011 Diet 1 4 02Nov2011 01Dec2011 . . . . 02Dec2011 31Dec2011 . . . .
2 31Dec2011 Diet 2 5 02Dec2011 31Dec2011 . . . . 01Jan2012 30Jan2012 . . . .
2 30Jan2012 Diet 3 8 01Jan2012 30Jan2012 . . . . 31Jan2012 29Feb2012 . . . .
3 01Jan2012 Gen_Health 1 Poor 03Dec2011 01Jan2012 . . . . 02Jan2012 31Jan2012 . . . .
3 30Jan2012 Gen_Health 2 Good 01Jan2012 30Jan2012 . . . . 31Jan2012 29Feb2012 . . . .
3 03Mar2012 Gen_Health 3 Fair 03Feb2012 03Mar2012 . . . . 04Mar2012 02Apr2012 . . . .
3 06Jan2012 Physical_Act 1 2 08Dec2011 06Jan2012 . . . . 07Jan2012 05Feb2012 . . . .
3 04Feb2012 Physical_Act 2 4 06Jan2012 04Feb2012 . . . . 05Feb2012 05Mar2012 . . . .
3 08Mar2012 Physical_Act 3 4 08Feb2012 08Mar2012 . . . . 09Mar2012 07Apr2012 . . . .
3 25Dec2011 Diet 1 3 26Nov2011 25Dec2011 . . . . 26Dec2011 24Jan2012 . . . .
3 23Jan2012 Diet 2 7 25Dec2011 23Jan2012 . . . . 24Jan2012 22Feb2012 . . . .
3 25Feb2012 Diet 3 9 27Jan2012 25Feb2012 . . . . 26Feb2012 26Mar2012 . . . .
;
Run;
data survey; set survey; format date date9. Sleep_30dBF_date_start date9. Sleep_30dBF_date_end date9. Sleep_30dAF_date_start date9.
Sleep_30dAF_date_end date9.;run;
Thanks.
So, this gets a little crazy, but I believe the solution I give using CALL EXECUTE produces the information you want, and I leave it up to you to re-arrange it into whatever table is of use to you. I also leave it up to you to check the calculations for one or two data points.
proc delete data=all_stats;
run;
data _null_;
set survey;
thirty_days_before=date-30;
one_day_after=date+1;
thirty_days_after=date+30;
call execute(
"data new;
set sleep(where=(id=" || id || "));
survey_date=" || date || ";
if wake_date >= " || thirty_days_before || " and wake_date <= " || date || " then before_after=1;
else if wake_date >= " || one_day_after || " and wake_date <= " || thirty_days_after || " then before_after=2;
format survey_date date9.;
run;
proc summary nway data=new;
class before_after;
id id survey_date;
var sleep_dur_hrs sleep_start_mc_hrs;
output out=work._stats_ mean= std=/autoname;
run;
proc append base=all_stats new=work._stats_;
run;
");
run;
Are there simpler ways to do this? Quite possibly, but this is what I came up with.
Hi,
Thanks for this code. There was no way I was coming up with something like this by myself.
I am running into some issues:
1. The code has run for 5 hours or so, and only 18% of the dates are complete. Is it normal to expect it to take that long?
2. Data lines are repeating themselves several times. For example, as you can see in the screenshot - every odd row up until row 29 is identical; as is every even row up until row 30. The number of repeats vary, from twice to 8 times (that I can see). (Note that I changed two of the variable names).
3. The _FREQ_ for before_after=1 is sometimes 31, but each FREQ should be 30 (or less if there is missing data). This results in incorrect values for hours_asleep or start_clocktime. This is not always the case, though - sometimes the _FREQ_ is correctly 30.
The log is way too long for me to include the entire thing, but I have a sample below:
108318 + data new; set SLEEP(where=(id= 376)); survey_date= 22713; if wake_date >= 22683 and wake_date <= 22713 then before_after=1; else if wake_date >= 22714 108319 + and wake_date <= 22743 then before_after=2; format survey_date date9.; run; NOTE: There were 80 observations read from the data set SLEEP. WHERE id=376; NOTE: The data set WORK.NEW has 80 observations and 241 variables. NOTE: DATA statement used (Total process time): real time 0.09 seconds cpu time 0.09 seconds 108319 + proc summary nway data=new; class before_after; id id survey_date; var hours_asleep_w 108320 + start_clocktime_midcent_mean_w; output out=work._stats_ mean= std=/autoname; run; WARNING: A class or frequency variable is missing on every observation. NOTE: There were 80 observations read from the data set WORK.NEW. NOTE: The data set WORK._STATS_ has 0 observations and 9 variables. NOTE: PROCEDURE SUMMARY used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 108320 + proc append base=all_stats new=work._stats_; run; NOTE: Appending WORK._STATS_ to WORK.ALL_STATS. NOTE: There were 0 observations read from the data set WORK._STATS_. NOTE: 0 observations added. NOTE: The data set WORK.ALL_STATS has 56934 observations and 9 variables. NOTE: PROCEDURE APPEND used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 108321 + data new; set SLEEP(where=(id= 376)); survey_date= 22713; if wake_date >= 22683 and wake_date <= 22713 then before_after=1; else if wake_date >= 22714 108322 + and wake_date <= 22743 then before_after=2; format survey_date date9.; run; NOTE: There were 80 observations read from the data set SLEEP. WHERE id=376; NOTE: The data set WORK.NEW has 80 observations and 241 variables. NOTE: DATA statement used (Total process time): real time 0.10 seconds cpu time 0.11 seconds 108322 + proc summary nway data=new; class before_after; id id survey_date; var hours_asleep_w 108323 + start_clocktime_midcent_mean_w; output out=work._stats_ mean= std=/autoname; run; WARNING: A class or frequency variable is missing on every observation. NOTE: There were 80 observations read from the data set WORK.NEW. NOTE: The data set WORK._STATS_ has 0 observations and 9 variables. NOTE: PROCEDURE SUMMARY used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 108323 + proc append base=all_stats new=work._stats_; run; NOTE: Appending WORK._STATS_ to WORK.ALL_STATS. NOTE: There were 0 observations read from the data set WORK._STATS_. NOTE: 0 observations added. NOTE: The data set WORK.ALL_STATS has 56934 observations and 9 variables. NOTE: PROCEDURE APPEND used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 108324 + data new; set SLEEP(where=(id= 376)); survey_date= 22713; if wake_date >= 22683 and wake_date <= 22713 then before_after=1; else if wake_date >= 22714 108325 + and wake_date <= 22743 then before_after=2; format survey_date date9.; run; NOTE: There were 80 observations read from the data set SLEEP. WHERE id=376; NOTE: The data set WORK.NEW has 80 observations and 241 variables. NOTE: DATA statement used (Total process time): real time 0.10 seconds cpu time 0.09 seconds 108325 + proc summary nway data=new; class before_after; id id survey_date; var hours_asleep_w 108326 + start_clocktime_midcent_mean_w; output out=work._stats_ mean= std=/autoname; run; WARNING: A class or frequency variable is missing on every observation. NOTE: There were 80 observations read from the data set WORK.NEW. NOTE: The data set WORK._STATS_ has 0 observations and 9 variables. NOTE: PROCEDURE SUMMARY used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 108326 + proc append base=all_stats new=work._stats_; run; NOTE: Appending WORK._STATS_ to WORK.ALL_STATS. NOTE: There were 0 observations read from the data set WORK._STATS_. NOTE: 0 observations added. NOTE: The data set WORK.ALL_STATS has 56934 observations and 9 variables. NOTE: PROCEDURE APPEND used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 108327 + data new; set SLEEP(where=(id= 376)); survey_date= 22713; if wake_date >= 22683 and wake_date <= 22713 then before_after=1; else if wake_date >= 22714 108328 + and wake_date <= 22743 then before_after=2; format survey_date date9.; run; NOTE: There were 80 observations read from the data set SLEEP. WHERE id=376; NOTE: The data set WORK.NEW has 80 observations and 241 variables. NOTE: DATA statement used (Total process time): real time 0.11 seconds cpu time 0.11 seconds 108328 + proc summary nway data=new; class before_after; id id survey_date; var hours_asleep_w 108329 + start_clocktime_midcent_mean_w; output out=work._stats_ mean= std=/autoname; run; WARNING: A class or frequency variable is missing on every observation. NOTE: There were 80 observations read from the data set WORK.NEW. NOTE: The data set WORK._STATS_ has 0 observations and 9 variables. NOTE: PROCEDURE SUMMARY used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 108329 + proc append base=all_stats new=work._stats_; run; NOTE: Appending WORK._STATS_ to WORK.ALL_STATS. NOTE: There were 0 observations read from the data set WORK._STATS_. NOTE: 0 observations added. NOTE: The data set WORK.ALL_STATS has 56934 observations and 9 variables. NOTE: PROCEDURE APPEND used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 108330 + data new; set SLEEP(where=(id= 376)); survey_date= 22701; if wake_date >= 22671 and wake_date <= 22701 then before_after=1; else if wake_date >= 22702 108331 + and wake_date <= 22731 then before_after=2; format survey_date date9.; run; NOTE: There were 80 observations read from the data set SLEEP. WHERE id=376; NOTE: The data set WORK.NEW has 80 observations and 241 variables. NOTE: DATA statement used (Total process time): real time 0.10 seconds cpu time 0.09 seconds 108331 + proc summary nway data=new; class before_after; id id survey_date; var hours_asleep_w 108332 + start_clocktime_midcent_mean_w; output out=work._stats_ mean= std=/autoname; run; NOTE: There were 80 observations read from the data set WORK.NEW. NOTE: The data set WORK._STATS_ has 1 observations and 9 variables. NOTE: PROCEDURE SUMMARY used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 108332 + proc append base=all_stats new=work._stats_; run; NOTE: Appending WORK._STATS_ to WORK.ALL_STATS. NOTE: There were 1 observations read from the data set WORK._STATS_. NOTE: 1 observations added. NOTE: The data set WORK.ALL_STATS has 56935 observations and 9 variables. NOTE: PROCEDURE APPEND used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 108333 + data new; set SLEEP(where=(id= 376)); survey_date= 22701; if wake_date >= 22671 and wake_date <= 22701 then before_after=1; else if wake_date >= 22702 108334 + and wake_date <= 22731 then before_after=2; format survey_date date9.; run; NOTE: There were 80 observations read from the data set SLEEP. WHERE id=376; NOTE: The data set WORK.NEW has 80 observations and 241 variables. NOTE: DATA statement used (Total process time): real time 0.11 seconds cpu time 0.10 seconds 108334 + proc summary nway data=new; class before_after; id id survey_date; var hours_asleep_w 108335 + start_clocktime_midcent_mean_w; output out=work._stats_ mean= std=/autoname; run; NOTE: There were 80 observations read from the data set WORK.NEW. NOTE: The data set WORK._STATS_ has 1 observations and 9 variables. NOTE: PROCEDURE SUMMARY used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 108335 + proc append base=all_stats new=work._stats_; run; NOTE: Appending WORK._STATS_ to WORK.ALL_STATS. NOTE: There were 1 observations read from the data set WORK._STATS_. NOTE: 1 observations added. NOTE: The data set WORK.ALL_STATS has 56936 observations and 9 variables. NOTE: PROCEDURE APPEND used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 108336 + data new; set SLEEP(where=(id= 376)); survey_date= 22701; if wake_date >= 22671 and wake_date <= 22701 then before_after=1; else if wake_date >= 22702 108337 + and wake_date <= 22731 then before_after=2; format survey_date date9.; run; NOTE: There were 80 observations read from the data set SLEEP. WHERE id=376; NOTE: The data set WORK.NEW has 80 observations and 241 variables. NOTE: DATA statement used (Total process time): real time 0.08 seconds cpu time 0.09 seconds 108337 + proc summary nway data=new; class before_after; id id survey_date; var hours_asleep_w 108338 + start_clocktime_midcent_mean_w; output out=work._stats_ mean= std=/autoname; run; NOTE: There were 80 observations read from the data set WORK.NEW. NOTE: The data set WORK._STATS_ has 1 observations and 9 variables. NOTE: PROCEDURE SUMMARY used (Total process time): real time 0.01 seconds cpu time 0.00 seconds 108338 + proc append base=all_stats new=work._stats_; run; NOTE: Appending WORK._STATS_ to WORK.ALL_STATS. NOTE: There were 1 observations read from the data set WORK._STATS_. NOTE: 1 observations added. NOTE: The data set WORK.ALL_STATS has 56937 observations and 9 variables. NOTE: PROCEDURE APPEND used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 108339 + data new; set SLEEP(where=(id= 376)); survey_date= 22710; if wake_date >= 22680 and wake_date <= 22710 then before_after=1; else if wake_date >= 22711 108340 + and wake_date <= 22740 then before_after=2; format survey_date date9.; run; NOTE: There were 80 observations read from the data set SLEEP. WHERE id=376; NOTE: The data set WORK.NEW has 80 observations and 241 variables. NOTE: DATA statement used (Total process time): real time 0.08 seconds cpu time 0.07 seconds 108340 + proc summary nway data=new; class before_after; id id survey_date; var hours_asleep_w 108341 + start_clocktime_midcent_mean_w; output out=work._stats_ mean= std=/autoname; run; WARNING: A class or frequency variable is missing on every observation. NOTE: There were 80 observations read from the data set WORK.NEW. NOTE: The data set WORK._STATS_ has 0 observations and 9 variables. NOTE: PROCEDURE SUMMARY used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 108341 + proc append base=all_stats new=work._stats_; run; NOTE: Appending WORK._STATS_ to WORK.ALL_STATS. NOTE: There were 0 observations read from the data set WORK._STATS_. NOTE: 0 observations added. NOTE: The data set WORK.ALL_STATS has 56937 observations and 9 variables. NOTE: PROCEDURE APPEND used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 108342 + data new; set SLEEP(where=(id= 376)); survey_date= 22710; if wake_date >= 22680 and wake_date <= 22710 then before_after=1; else if wake_date >= 22711 108343 + and wake_date <= 22740 then before_after=2; format survey_date date9.; run; NOTE: There were 80 observations read from the data set SLEEP. WHERE id=376; NOTE: The data set WORK.NEW has 80 observations and 241 variables. NOTE: DATA statement used (Total process time): real time 0.10 seconds cpu time 0.10 seconds 108343 + proc summary nway data=new; class before_after; id id survey_date; var hours_asleep_w 108344 + start_clocktime_midcent_mean_w; output out=work._stats_ mean= std=/autoname; run; WARNING: A class or frequency variable is missing on every observation. NOTE: There were 80 observations read from the data set WORK.NEW. NOTE: The data set WORK._STATS_ has 0 observations and 9 variables. NOTE: PROCEDURE SUMMARY used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 108344 + proc append base=all_stats new=work._stats_; run; NOTE: Appending WORK._STATS_ to WORK.ALL_STATS. NOTE: There were 0 observations read from the data set WORK._STATS_. NOTE: 0 observations added. NOTE: The data set WORK.ALL_STATS has 56937 observations and 9 variables. NOTE: PROCEDURE APPEND used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 108345 + data new; set SLEEP(where=(id= 376)); survey_date= 22710; if wake_date >= 22680 and wake_date <= 22710 then before_after=1; else if wake_date >= 22711 108346 + and wake_date <= 22740 then before_after=2; format survey_date date9.; run; NOTE: There were 80 observations read from the data set SLEEP. WHERE id=376; NOTE: The data set WORK.NEW has 80 observations and 241 variables. NOTE: DATA statement used (Total process time): real time 0.11 seconds cpu time 0.10 seconds 108346 + proc summary nway data=new; class before_after; id id survey_date; var hours_asleep_w 108347 + start_clocktime_midcent_mean_w; output out=work._stats_ mean= std=/autoname; run; WARNING: A class or frequency variable is missing on every observation. NOTE: There were 80 observations read from the data set WORK.NEW. NOTE: The data set WORK._STATS_ has 0 observations and 9 variables. NOTE: PROCEDURE SUMMARY used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 108347 + proc append base=all_stats new=work._stats_; run; NOTE: Appending WORK._STATS_ to WORK.ALL_STATS. NOTE: There were 0 observations read from the data set WORK._STATS_. NOTE: 0 observations added. NOTE: The data set WORK.ALL_STATS has 56937 observations and 9 variables. NOTE: PROCEDURE APPEND used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
I'd really appreciate any ideas on how to fix these errors. Thank you.
1. The code has run for 5 hours or so, and only 18% of the dates are complete. Is it normal to expect it to take that long?
How many rows in the sleep data set? How many rows in the survey data set? Are there missing values in either data set for the variables under discussion? How many distinct values of ID?
2. Data lines are repeating themselves several times. For example, as you can see in the screenshot - every odd row up until row 29 is identical; as is every even row up until row 30. The number of repeats vary, from twice to 8 times (that I can see). (Note that I changed two of the variable names).
Do you have surveys where there is no corresponding sleep? Or vice versa?
Also
NOTE: The data set WORK.NEW has 80 observations and 241 variables.
You didn't mention that data set SLEEP has 241 variables. Your example showed only a few variables.
Hi!
@PaigeMiller wrote:
1. The code has run for 5 hours or so, and only 18% of the dates are complete. Is it normal to expect it to take that long?
How many rows in the sleep data set? How many rows in the survey data set? Are there missing values in either data set for the variables under discussion? How many distinct values of ID?
- There are 327,275 rows in the survey dataset and 122505 in the sleep dataset. There was originally n=159 rows missing in the sleep dataset, which I just removed (now there are 122505 rows with data, so there were 122664 originally). There are 1036 unique IDs in the survey dataset and 957 unique IDs in the sleep dataset. It is possible there is some missingness. I just created new datasets with only the needed variables, so hopefully it will not take as long. I just left it overnight to run the code with the original datasets (with all the unneeded variables) for ~6.5 hours and it still hadn't completed.
2. Data lines are repeating themselves several times. For example, as you can see in the screenshot - every odd row up until row 29 is identical; as is every even row up until row 30. The number of repeats vary, from twice to 8 times (that I can see). (Note that I changed two of the variable names).
Do you have surveys where there is no corresponding sleep? Or vice versa?
- Yes, this is a possibility. Is there a way to remove duplicate rows in a dataset?
Also
NOTE: The data set WORK.NEW has 80 observations and 241 variables.
You didn't mention that data set SLEEP has 241 variables. Your example showed only a few variables.
- I only showed a sample on the forum; I suppose this could be why it's taking so long? I created new truncated datasets and will re-run the code.
Also, I'm still not sure why some _FREQ_ are 31 when the max should be 30.
Thanks!
This code probably will take a long time on 327,000 records in the survey data set. I have fixed the code I am using to extract more efficiency (less computer time needed), but I can't say how long it should take. This is code that doesn't scale well to hundreds of thousands of records. I have also "fixed" what I think caused the repeating problem.
proc delete data=all_stats;
run;
data sleep1(index=(id_date=(id wake_date)));
set sleep(keep=id wake_date sleep_dur_hrs sleep_start_mc_hrs);
run;
data _null_;
set survey;
thirty_days_before=date-30;
one_day_after=date+1;
thirty_days_after=date+30;
call execute(
"proc delete data=new _stats_; run;
data new;
set sleep1(where=(id=" || id || " and wake_date >= " || thirty_days_before || " and wake_date <= " || thirty_days_after || "));
survey_date=" || date || ";
if wake_date >= " || thirty_days_before || " and wake_date <= " || date || " then before_after=1;
else if wake_date >= " || one_day_after || " and wake_date <= " || thirty_days_after || " then before_after=2;
format survey_date date9.;
run;
proc summary nway data=new;
class before_after;
id id survey_date;
var sleep_dur_hrs sleep_start_mc_hrs;
output out=work._stats_ mean= std=/autoname;
run;
proc append base=all_stats new=work._stats_; run;"
);
run;
If that still isn't fast enough, there is another method in SAS called hash objects. I don't really know if hash objects will help in this case and in any event, I cannot write such code for you. I would recommend you start a new thread with this exact title: "Can hash objects do this merge/analysis?", this will allow experts in hash objects to give their opinions. In this new thread, describe the problem again, and also link to this thread.
Also, I'm still not sure why some _FREQ_ are 31 when the max should be 30.
That might be because of my code
thirty_days_before=date-30;
one_day_after=date+1;
thirty_days_after=date+30;
and you should fix this accordingly.
Hello,
I ran the second code you gave me, and I received an "out of memory" error and SAS shut down. I tried the code on another computer and the same thing happened. This code therefore requires more computing power than I currently have access to.
I instead just re-ran the first code you gave me. It took about 7-8 hours on my machine overnight, but it worked! I used nodup to remove the duplicates. Of course, I'd prefer a quicker solution that didn't require more memory than I have, but this will work for now.
Thank you!
Try removing the index= option.
When there are errors in the log, NEVER EVER describe the errors in your own words or quote a tiny portion of the log. Show us the log for the step that has the error. Please do this every time you have errors in the log from now on.
Since SAS shut down immediately after the error, it was impossible to copy the log . . . it literally just said "ERROR: OUT OF MEMORY," verbatim and then shut down right after. So, I'm not quite sure what I should do if I should never put the log into my own words.
I took away the index portion, and SAS still shut down, this time without any error message. Thanks for trying!
To tell you the truth, I am surprised it crashes when you remove the INDEX= option. What happens if you start a brand new SAS session and try the second SAS code without the INDEX= option?
My other suggestion is to ask in this forum about using hash objects in a new thread. Please follow the instructions I gave.
@PaigeMiller wrote:
My other suggestion is to ask in this forum about using hash objects in a new thread. Please follow the instructions I gave.
@confooseddesi89: I think this would be a very promising approach. A hash object could easily accommodate the relevant sleep data, which could then be retrieved (and written to temporary arrays with 30 elements) by a data step reading the survey data. The descriptive statistics would be computed using the MEAN and STD functions based on the arrays. So this would be a single data step with an estimated run time of several seconds, not hours.
I had tried that a couple of different times, and it still crashed. Just tried again, crashed again.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.