Solved: Re: Custom calculate monthly means and merge datasets by date

confooseddesi89 · Posted 01-30-2023 06:11 PM

Hello,

I have two datasets: One sleep dataset and one survey dataset. Each dataset is long form, with ID numbers repeating by row. Each ID has several months of data in each dataset.

For the sleep dataset, each row represents one night of sleep data, and each variable is represented in a separate column. See below an example of data for ID #1, with variables Sleep_dur_hrs and Sleep_start_MC_hrs (MC means midnight-centered) by wake_date:

user_id	wake_date	Sleep_dur_hrs	Sleep_start_MC_hrs
1	14Nov2011	7.52	-1.25
1	15Nov2011	5.43	-1.47
1	16Nov2011	8.40	-5.46
1	17Nov2011	7.52	-1.53
1	18Nov2011	7.70	-1.47
1	19Nov2011	7.60	-1.52
1	20Nov2011	7.40	-1.82
1	21Nov2011	8.23	-4.89
1	22Nov2011	6.75	-1.57
1	23Nov2011	7.20	-1.10
1	24Nov2011	4.15	2.62
1	25Nov2011	8.10	-1.38
1	26Nov2011	7.77	-1.49
1	27Nov2011	7.48	-1.07
1	28Nov2011	7.73	-.88
1	29Nov2011	7.03	.49
1	30Nov2011	7.45	-1.72
1	01Dec2011	7.62	-1.45
1	02Dec2011	6.62	-1.73
1	03Dec2011	7.67	-1.34
1	04Dec2011	6.92	-3.83
1	05Dec2011	7.27	-1.40
1	06Dec2011	6.40	-1.27
1	07Dec2011	8.60	-1.73
1	08Dec2011	7.18	-1.28
1	09Dec2011	7.18	-.52
1	10Dec2011	8.03	-1.62
1	11Dec2011	6.73	-.78
1	12Dec2011	8.43	-2.58
1	13Dec2011	7.65	-2.00
1	14Dec2011	5.92	-2.72
1	15Dec2011	9.12	-2.78
1	16Dec2011	7.50	-1.54
1	17Dec2011	7.77	-2.19
1	18Dec2011	6.82	-1.93
1	19Dec2011	7.77	-1.57
1	20Dec2011	7.50	-2.20
1	21Dec2011	8.77	-2.49
1	22Dec2011	7.87	-2.67
1	23Dec2011	6.28	-.02
1	24Dec2011	8.40	-1.27
1	25Dec2011	4.98	-1.18
1	26Dec2011	7.63	-1.78
1	27Dec2011	4.03	1.85
1	28Dec2011	7.57	-2.22
1	29Dec2011	2.70	3.27
1	30Dec2011	8.35	-5.27
1	31Dec2011	8.10	-1.17
1	01Jan2012	5.83	-.98
1	02Jan2012	6.93	-2.41
1	03Jan2012	6.38	-2.09

For the survey data, each row represents one survey entry and also a separate variable, with about one survey entry per month per person. Some variables were surveyed at different times of the month even for the same person (for example, Gen Health question was surveyed on Dec 2nd, but Physical Activity was surveyed on Dec 5th, for ID #1). See below an example of the data:

ID	date	Variable	entry_count	answer
1	02Dec2011	Gen_Health	1	Fair
1	01Jan2012	Gen_Health	2	Fair
1	01Feb2012	Gen_Health	3	Fair
1	05Dec2011	Physical_Act	1	10
1	04Jan2012	Physical_Act	2	14
1	04Feb2012	Physical_Act	3	30
1	28Nov2011	Diet	1	4
1	28Dec2011	Diet	2	7
1	28Jan2012	Diet	3	30
2	30Nov2011	Gen_Health	1	Good
2	30Dec2011	Gen_Health	2	Fair
2	29Jan2012	Gen_Health	3	Poor
2	28Nov2011	Physical_Act	1	10
2	28Dec2011	Physical_Act	2	15
2	27Jan2012	Physical_Act	3	5
2	01Dec2011	Diet	1	4
2	31Dec2011	Diet	2	5
2	30Jan2012	Diet	3	8
3	01Jan2012	Gen_Health	1	Poor
3	30Jan2012	Gen_Health	2	Good
3	03Mar2012	Gen_Health	3	Fair
3	06Jan2012	Physical_Act	1	2
3	04Feb2012	Physical_Act	2	4
3	08Mar2012	Physical_Act	3	4
3	25Dec2011	Diet	1	3
3	23Jan2012	Diet	2	7
3	25Feb2012	Diet	3	9

I've given an example of data for 3 IDs here, with 3 different variables repeated across 3 survey administrations ("entry_count"), with the date of survey question administration ("date"), and the answer to that variable ("answer").

What I'd like to obtain is sleep data (mean/SD of sleep duration and sleep start) for 30 days before and 30 days after the date of each unique survey administration per ID. I'd like to merge this sleep data with the survey data. Below are sample data. Note that I've only filled out the first two rows for the first section and the first row for the second section. Also note that 30dBF means 30 days before; and 30dAF means 30 days after. Before the actual sleep variables themselves, I should have variables representing the start date of the "before" interval (29 days before the survey question administration), the end date of the "before" interval (the date of the survey administration); the start date of the "after" interval (1 day after the survey question administration), and the end date of the "after" interval (30 days after the date of the survey administration);

ID

date

Variable

entry_count

answer

SleepBF_date_start

SleepBF_date_end

Sleep_dur_hrs_avgBF

Sleep_dur_hrs_stdBF

Sleep_start_MC_hrs_avgBF

Sleep_start_MC_hrs_stdBF

SleepAF_date_start

SleepAF_date_end

Sleep_dur_hrs_avgAF

Sleep_dur_hrs_stdAF

Sleep_start_MC_hrs_avgAF

Sleep_start_MC_hrs_stdAF

1

02Dec2011

Gen_Health

1

Fair

03Nov2011

02Dec2011

7.25

1.00

-1.51

1.65

03Dec2011

01Jan2012

7.19

1.38

-1.60

1.51

1

01Jan2012

Gen_Health

2

Fair

03Dec2011

01Jan2012

7.17

1.3956345

-1.57

1.513422712

02Jan2012

31Jan2012

1

01Feb2012

Gen_Health

3

Fair

03Jan2012

01Feb2012

02Feb2012

02Mar2012

1

05Dec2011

Physical_Act

1

10

06Nov2011

05Dec2011

06Dec2011

04Jan2012

1

04Jan2012

Physical_Act

2

14

06Dec2011

04Jan2012

05Jan2012

03Feb2012

1

04Feb2012

Physical_Act

3

30

06Jan2012

04Feb2012

05Feb2012

05Mar2012

1

28Nov2011

Diet

1

4

30Oct2011

28Nov2011

29Nov2011

28Dec2011

1

28Dec2011

Diet

2

7

29Nov2011

28Dec2011

29Dec2011

27Jan2012

1

28Jan2012

Diet

3

30

30Dec2011

28Jan2012

29Jan2012

27Feb2012

2

30Nov2011

Gen_Health

1

Good

01Nov2011

30Nov2011

01Dec2011

30Dec2011

2

30Dec2011

Gen_Health

2

Fair

01Dec2011

30Dec2011

31Dec2011

29Jan2012

2

29Jan2012

Gen_Health

3

Poor

31Dec2011

29Jan2012

30Jan2012

28Feb2012

2

28Nov2011

Physical_Act

1

10

30Oct2011

28Nov2011

29Nov2011

28Dec2011

2

28Dec2011

Physical_Act

2

15

29Nov2011

28Dec2011

29Dec2011

27Jan2012

2

27Jan2012

Physical_Act

3

5

29Dec2011

27Jan2012

28Jan2012

26Feb2012

2

01Dec2011

Diet

1

4

02Nov2011

01Dec2011

02Dec2011

31Dec2011

2

31Dec2011

Diet

2

5

02Dec2011

31Dec2011

01Jan2012

30Jan2012

2

30Jan2012

Diet

3

8

01Jan2012

30Jan2012

31Jan2012

29Feb2012

3

01Jan2012

Gen_Health

1

Poor

03Dec2011

01Jan2012

02Jan2012

31Jan2012

3

30Jan2012

Gen_Health

2

Good

01Jan2012

30Jan2012

31Jan2012

29Feb2012

3

03Mar2012

Gen_Health

3

Fair

03Feb2012

03Mar2012

04Mar2012

02Apr2012

3

06Jan2012

Physical_Act

1

2

08Dec2011

06Jan2012

07Jan2012

05Feb2012

3

04Feb2012

Physical_Act

2

4

06Jan2012

04Feb2012

05Feb2012

05Mar2012

3

08Mar2012

Physical_Act

3

4

08Feb2012

08Mar2012

09Mar2012

07Apr2012

3

25Dec2011

Diet

1

3

26Nov2011

25Dec2011

26Dec2011

24Jan2012

3

23Jan2012

Diet

2

7

25Dec2011

23Jan2012

24Jan2012

22Feb2012

3

25Feb2012

Diet

3

9

27Jan2012

25Feb2012

26Feb2012

26Mar2012

This is way beyond any of my available SAS coding knowledge, and I'm really in need of assistance for some analyses that are due soon. Can anyone help? Thanks.

PaigeMiller · Posted 01-31-2023 02:59 PM

So, this gets a little crazy, but I believe the solution I give using CALL EXECUTE produces the information you want, and I leave it up to you to re-arrange it into whatever table is of use to you. I also leave it up to you to check the calculations for one or two data points.

proc delete data=all_stats;
run;
data _null_;
    set survey;
    thirty_days_before=date-30;
    one_day_after=date+1;
    thirty_days_after=date+30;
    call execute(
        "data new;
            set sleep(where=(id=" || id || ")); 
            survey_date=" || date || ";
            if wake_date >= " || thirty_days_before || " and wake_date <= " || date || " then before_after=1; 
            else if wake_date >= " || one_day_after || " and wake_date <= " || thirty_days_after || " then before_after=2;
            format survey_date date9.; 
         run;
         proc summary nway data=new;
             class before_after;
             id id survey_date;
             var sleep_dur_hrs sleep_start_mc_hrs;
             output out=work._stats_ mean= std=/autoname;
         run;
         proc append base=all_stats new=work._stats_;
         run;
        ");
run;

Are there simpler ways to do this? Quite possibly, but this is what I came up with.

--
Paige Miller

View solution in original post

PaigeMiller · Posted 01-31-2023 07:31 AM

Hello @confooseddesi89

Can you please turn these data screen captures into working SAS data step code (instructions)? We can't work with the data in screen captures.

--
Paige Miller

confooseddesi89 · Posted 01-31-2023 02:18 PM

Here is the data step code for the sleep data (sample data):

Data sleep;
Input id wake_date date12. Sleep_dur_hrs Sleep_start_MC_hrs;
datalines;
1	14Nov2011	7.5167	-1.25
1	15Nov2011	5.4333	-1.46667
1	16Nov2011	8.4	-5.45833
1	17Nov2011	7.5167	-1.53333
1	18Nov2011	7.7	-1.46667
1	19Nov2011	7.6	-1.51667
1	20Nov2011	7.4	-1.81667
1	21Nov2011	8.2333	-4.89167
1	22Nov2011	6.75	-1.56667
1	23Nov2011	7.2	-1.1
1	24Nov2011	4.15	2.61667
1	25Nov2011	8.1	-1.38333
1	26Nov2011	7.7667	-1.49167
1	27Nov2011	7.4833	-1.06667
1	28Nov2011	7.7333	-0.88333
1	29Nov2011	7.0333	0.49167
1	30Nov2011	7.45	-1.71667
1	01Dec2011	7.6167	-1.45
1	02Dec2011	6.6167	-1.73333
1	03Dec2011	7.6667	-1.34167
1	04Dec2011	6.9167	-3.825
1	05Dec2011	7.2667	-1.4
1	06Dec2011	6.4	-1.26667
1	07Dec2011	8.6	-1.725
1	08Dec2011	7.1833	-1.275
1	09Dec2011	7.1833	-0.51667
1	10Dec2011	8.0333	-1.61667
1	11Dec2011	6.7333	-0.78333
1	12Dec2011	8.4333	-2.58333
1	13Dec2011	7.65	-2
1	14Dec2011	5.9167	-2.71667
1	15Dec2011	9.1167	-2.775
1	16Dec2011	7.5	-1.54167
1	17Dec2011	7.7667	-2.19167
1	18Dec2011	6.8167	-1.93333
1	19Dec2011	7.7667	-1.56667
1	20Dec2011	7.5	-2.2
1	21Dec2011	8.7667	-2.49167
1	22Dec2011	7.8667	-2.66667
1	23Dec2011	6.2833	-0.01667
1	24Dec2011	8.4	-1.26667
1	25Dec2011	4.9833	-1.18333
1	26Dec2011	7.6333	-1.775
1	27Dec2011	4.0333	1.85
1	28Dec2011	7.5667	-2.21667
1	29Dec2011	2.7	3.26667
1	30Dec2011	8.35	-5.26667
1	31Dec2011	8.1	-1.16667
1	01Jan2012	5.8333	-0.98333
1	02Jan2012	6.9333	-2.40833
1	03Jan2012	6.3833	-2.09444
1	04Jan2012	8.4667	-2.61667
1	05Jan2012	7.45	-1.21667
1	06Jan2012	7.5167	-0.76667
1	07Jan2012	7.0667	-1.15
1	08Jan2012	8.45	-2.43333
1	09Jan2012	7.4667	-0.91667
1	10Jan2012	7.3	-1.03333
1	11Jan2012	8.6667	-2.76667
1	12Jan2012	8.1	-0.33333
1	13Jan2012	5.3333	1.3
1	14Jan2012	7.9333	-1.1
1	15Jan2012	6.35	-3.01389
1	16Jan2012	7.4333	-1.03333
1	17Jan2012	7.3833	-1.55
1	18Jan2012	8.5	-1.35
1	19Jan2012	4.7333	-0.15
1	20Jan2012	7.3667	-1.28333
1	21Jan2012	8.5	-1.36667
1	22Jan2012	7.9	-0.76667
1	23Jan2012	7.8	-0.48333
1	24Jan2012	8.1667	-1.1
1	25Jan2012	7.95	-1.45
1	26Jan2012	7.7333	-1.25
1	27Jan2012	7.3833	-1.05833
1	28Jan2012	7.95	-0.88333
1	29Jan2012	7.3833	-0.98333
1	30Jan2012	5.4833	2.0125
1	31Jan2012	6.6167	-0.13333
1	01Feb2012	7.0667	1.40417
1	02Feb2012	6.7	-0.55833
1	03Feb2012	9.1333	-1.575
1	04Feb2012	7.5167	-1.69167
1	05Feb2012	7.7833	-1.98333
1	06Feb2012	9.65	-1.875
1	07Feb2012	4.0833	0.9
1	08Feb2012	7.2333	-1.58333
1	09Feb2012	7.9167	-0.85
1	10Feb2012	8.1333	-1.65
1	11Feb2012	8.0333	-2.36667
1	12Feb2012	6.8667	-0.66667
1	13Feb2012	7.75	-2.04167
1	14Feb2012	5.6833	-2.10833
1	15Feb2012	5.65	-0.23333
1	16Feb2012	7.9667	-0.53333
1	17Feb2012	7.7167	-1.38333
1	18Feb2012	8.45	-1.55
1	19Feb2012	6.2333	1.78333
1	20Feb2012	8.2167	-0.93333
1	21Feb2012	8.3167	-1.75
1	22Feb2012	8.6	-2.3
1	23Feb2012	8.1667	-1.51667
1	24Feb2012	7.7833	-1.1
1	25Feb2012	7.6167	-0.85
1	26Feb2012	6.6	-1.45
1	27Feb2012	6.1333	-1.13333
1	28Feb2012	8.55	-3.7
1	29Feb2012	8.0667	-2.39167
1	01Mar2012	8.15	-1.23333
1	02Mar2012	8.75	-1.41667
1	03Mar2012	8.1833	-0.79167
1	04Mar2012	8.3333	-1.75833
1	05Mar2012	7.7333	-1.16667
1	06Mar2012	6.3333	0.05
1	07Mar2012	7.9167	-1.51667
1	08Mar2012	7.2	-1.25833
1	09Mar2012	6	2.25833
1	10Mar2012	8.1667	-1.175
1	11Mar2012	6.5833	-0.68333
1	12Mar2012	6.5667	1.575
1	13Mar2012	7.65	-0.25
1	14Mar2012	7.1333	-0.24167
1	15Mar2012	7.3167	-2.475
1	16Mar2012	8.55	-2
1	17Mar2012	6.2333	-0.28333
1	18Mar2012	8.1667	-1.26667
1	19Mar2012	8.7333	-1.68333
1	20Mar2012	7.8167	-1.15833
1	21Mar2012	7.7167	-0.73333
1	22Mar2012	7.7167	-0.43333
1	23Mar2012	7.85	-0.48333
1	24Mar2012	3.9	0.71667
1	25Mar2012	7.8167	-2.1
1	26Mar2012	7.5333	1.54167
1	27Mar2012	7.9167	-1.05
1	28Mar2012	8.8667	-1.35
1	29Mar2012	9.3667	-0.58333
1	30Mar2012	5.1	1.875
1	31Mar2012	7.35	-0.33333
1	01Apr2012	6.5333	-1.26667
1	02Apr2012	6.7	-1.13333
1	03Apr2012	9	-0.53333
1	04Apr2012	7.2	-1.81667
1	05Apr2012	8.1833	3.73333
1	06Apr2012	8.65	-1.23333
1	07Apr2012	7.1667	0.44167
1	08Apr2012	7.5833	-1.21667
2	14Nov2011	7.45	-2.23333
2	15Nov2011	7.3	-2.11667
2	16Nov2011	6.0833	-1.7
2	17Nov2011	6.3333	-2.35
2	18Nov2011	5.3167	-1.14167
2	19Nov2011	6.9	-1.025
2	20Nov2011	6.25	-1.50833
2	21Nov2011	5.2667	1.675
2	22Nov2011	5.9667	-1.775
2	23Nov2011	5.9	-2.98333
2	24Nov2011	6.1167	0.33333
2	25Nov2011	5.3667	1.55833
2	26Nov2011	8.4667	-3.2
2	27Nov2011	5.5667	-1.5
2	28Nov2011	7.05	1.61667
2	29Nov2011	6.1667	-0.65833
2	30Nov2011	5.95	2.0375
2	01Dec2011	6.15	-0.98333
2	02Dec2011	4.6833	-0.55833
2	03Dec2011	3.7167	-0.45
2	04Dec2011	6.1333	-0.7
2	05Dec2011	4.9833	-0.45
2	06Dec2011	7.8833	-1.38333
2	07Dec2011	6.2	0.95
2	08Dec2011	6.4167	0.18333
2	09Dec2011	6.3167	1.25
2	10Dec2011	4.6167	0.86667
2	11Dec2011	3.65	2.56667
2	12Dec2011	7.15	-0.21667
2	13Dec2011	6.8	0.2
2	14Dec2011	2.9667	4.40833
2	15Dec2011	5.5167	2.3
2	16Dec2011	8.25	-0.89167
2	17Dec2011	3.55	1.8
2	18Dec2011	7.0167	0.25833
2	19Dec2011	5.8	2.84583
2	20Dec2011	6.1333	-2.51667
2	21Dec2011	5.2333	-1.425
2	22Dec2011	7.4833	-0.825
2	23Dec2011	6.3667	0.24167
2	24Dec2011	4.6667	-0.35
2	25Dec2011	6.4333	-0.55
2	26Dec2011	6.6667	-0.93333
2	27Dec2011	6.3	-0.88333
2	28Dec2011	5.15	-0.025
2	29Dec2011	4.6833	2.67083
2	30Dec2011	8.1	-2.4
2	31Dec2011	4.4333	-3.80417
2	01Jan2012	7.05	-1.28333
2	02Jan2012	6.7	-0.29167
2	03Jan2012	5.6833	3.17083
2	04Jan2012	6.55	2.34583
2	05Jan2012	6.7667	-0.33056
2	06Jan2012	6.0667	0.64167
2	07Jan2012	5.9667	-0.85833
2	08Jan2012	7.3167	0.175
2	09Jan2012	5.2667	-0.61667
2	10Jan2012	5.7667	1.37083
2	11Jan2012	5.8833	3.27917
2	12Jan2012	4.7167	1.23333
2	13Jan2012	7.4333	1.16667
2	14Jan2012	3.95	-2.275
2	15Jan2012	7.35	-1.25
2	16Jan2012	6.3333	-0.10833
2	17Jan2012	3.9333	0.525
2	18Jan2012	2.9167	4.25
2	19Jan2012	6.15	1.21667
2	20Jan2012	7.2	0.58333
2	21Jan2012	4.5833	1.01667
2	22Jan2012	7.1	0.24167
2	23Jan2012	5.8667	2.40417
2	24Jan2012	6.1667	2.2
2	25Jan2012	6.5	0.95
2	26Jan2012	6.1833	1.19167
2	27Jan2012	7.55	-1.075
2	28Jan2012	4.8833	-0.19167
2	29Jan2012	8.5833	-0.76667
2	30Jan2012	4.95	2.44167
2	31Jan2012	4.05	0.75
2	01Feb2012	8.55	-2.2
2	02Feb2012	7.5	-0.14167
2	03Feb2012	5.55	2.54167
2	04Feb2012	4.8333	0
2	05Feb2012	5.6833	0.38333
2	06Feb2012	6.45	-4.4875
2	07Feb2012	4.8	0.93333
2	08Feb2012	5.8167	-0.86667
2	09Feb2012	6.15	-0.21667
2	10Feb2012	5.3	-0.85
2	11Feb2012	4.8667	-5.90417
2	12Feb2012	7.5167	-1.525
2	13Feb2012	7.2333	0.08333
2	14Feb2012	4.85	3.675
2	15Feb2012	5.9	0.45833
2	16Feb2012	7.45	0.275
2	17Feb2012	4.0333	0.80833
2	18Feb2012	4.5833	0.575
2	19Feb2012	4.45	2.7
2	20Feb2012	7.3333	0.55
2	21Feb2012	5.35	1.25
2	22Feb2012	4.35	-1.025
2	23Feb2012	4.4	-2.66667
2	24Feb2012	8.4667	-2.64167
2	25Feb2012	6.3667	-3.88333
2	26Feb2012	6.5667	1.3625
2	27Feb2012	8.2	-0.59167
2	28Feb2012	5.6333	1.31667
2	29Feb2012	7.2	-0.2
2	01Mar2012	5.5667	-0.26667
2	02Mar2012	4.5667	-0.83333
2	03Mar2012	6.4333	-4.19167
2	04Mar2012	5.6667	0.24167
2	05Mar2012	4.4833	2.71667
2	06Mar2012	5.1833	4.12083
2	07Mar2012	5.8	1.225
2	08Mar2012	5.65	0.71667
2	09Mar2012	8.1167	-0.83333
2	10Mar2012	4.2333	0.56667
2	12Mar2012	6.8667	1.2
2	13Mar2012	7.9833	-0.91667
2	14Mar2012	4.05	3.57917
2	15Mar2012	1.6333	2.53333
2	16Mar2012	6.95	0.89167
2	17Mar2012	5.35	0.11667
2	18Mar2012	6.2333	1.79167
2	19Mar2012	3.7667	2.71667
2	20Mar2012	8.5167	1.685
2	21Mar2012	5.4333	2.45
2	22Mar2012	6.5167	0.425
2	23Mar2012	8.45	-0.73333
2	24Mar2012	7.85	-1.78333
2	25Mar2012	6.9	-0.11667
2	26Mar2012	7.3667	-0.61667
2	27Mar2012	8.6667	-1.64167
2	28Mar2012	7.7333	0.26667
2	29Mar2012	4.2333	-0.16667
2	30Mar2012	8.2	-1.38333
2	31Mar2012	5.85	0.80833
2	01Apr2012	6.1	1.85833
2	02Apr2012	6.6333	0.75
2	03Apr2012	4.5833	1.51667
2	04Apr2012	5.4333	3
2	05Apr2012	7.5167	1.12083
2	06Apr2012	3.7667	-0.6
2	07Apr2012	6.2833	1.49167
2	08Apr2012	7.35	0.33333
3	15Nov2011	7.5667	-2.36667
3	16Nov2011	7.6167	-2.89167
3	17Nov2011	7.5	-2.00833
3	18Nov2011	8.85	-3.71667
3	19Nov2011	8.7167	-3.425
3	20Nov2011	7.5667	-2.35833
3	03Dec2011	8.65	-1.16944
3	04Dec2011	6.0167	-0.00833
3	05Dec2011	8.4833	-2.74167
3	06Dec2011	9.1	-3.39167
3	08Dec2011	7.8667	-2.25
3	09Dec2011	8.4333	-2.63333
3	10Dec2011	8.5167	-3.03333
3	07Jan2012	2.5833	-3.71667
3	08Jan2012	4.7667	0.69167
3	10Jan2012	8.7333	-3.59167
3	11Jan2012	7.7	-2.69167
3	12Jan2012	6.3333	-1.875
3	13Jan2012	8.7	-2.025
3	15Jan2012	3.1667	-1.6375
3	16Jan2012	1.3667	-6.23333
3	19Jan2012	10.4167	-3.21667
3	20Jan2012	2.1333	-5.25833
3	21Jan2012	1.4833	-7.125
3	24Jan2012	1.85	-5.08333
3	25Jan2012	1.3667	7.125
3	27Jan2012	1.85	-3.125
3	28Jan2012	2.5	-7.125
3	29Jan2012	8.0833	-2.26667
3	31Jan2012	1.4167	-5.475
3	01Feb2012	1.5167	-3.1
3	02Feb2012	1.3667	-7.125
3	03Feb2012	1.3667	-4.65
3	04Feb2012	2.6167	-7.125
3	05Feb2012	1.5667	-4.98333
3	09Feb2012	9.95	-5.0375
3	10Feb2012	1.95	-7.125
3	11Feb2012	9.8	-3.86667
3	12Feb2012	7.4333	-2.025
3	13Feb2012	8.1	-1.68333
3	14Feb2012	11.05	-6.5125
3	17Feb2012	9.9833	-1.05
3	19Feb2012	10	-4.54167
3	21Feb2012	9.9667	-5.39167
3	22Feb2012	7.3667	-1.58333
3	23Feb2012	8.7667	-2.80833
3	24Feb2012	7.9833	-2.71667
3	25Feb2012	10.0833	-3.73333
3	26Feb2012	7.4	-1.71667
3	27Feb2012	11.0667	-5.45417
3	28Feb2012	7.1333	-2.125
3	29Feb2012	6.3667	-2.075
3	02Mar2012	9.15	-3.01667
3	03Mar2012	7	-1.425
3	04Mar2012	7.8333	-3.19583
3	05Mar2012	10.3333	-4.26667
3	07Mar2012	7.1667	-1.63333
3	08Mar2012	8.0167	-3.96667
3	09Mar2012	5.5333	-3.39167
3	10Mar2012	1.3667	-7.125
3	12Mar2012	9.8333	-5
3	13Mar2012	6.8333	-1.375
3	14Mar2012	8.8333	-3.4
3	15Mar2012	7.7333	-2.28333
3	16Mar2012	6.9	-2.26667
3	17Mar2012	9.45	-4.54167
3	25Mar2012	8.7	-2.20833
3	26Mar2012	7.5833	-1.26667
3	27Mar2012	6.7167	-1.59167
3	28Mar2012	8.4833	-4.3
3	29Mar2012	7.05	-2.375
3	30Mar2012	8.4333	-4.7875
3	31Mar2012	1.3667	6.64167
3	01Apr2012	7.05	-1.99167
3	05Apr2012	9.9	-5.24167
3	06Apr2012	8.25	-3
3	08Apr2012	8.5167	-3.13333
;
Run;
data sleep; set sleep; format wake_date date9.;run;

Here is the data step code for the survey data (sample data):

Data survey;
Input ID date date12. Variable: $14. entry_count answer $ Sleep_30dBF_date_start date12. Sleep_30dBF_date_end date12. Sleep_dur_hrs_avg_30dBF Sleep_dur_hrs_std_30dBF Sleep_start_MC_hrs_avg_30dBF Sleep_start_MC_hrs_std_30dBF Sleep_30dAF_date_start date12. Sleep_30dAF_date_end date12. Sleep_dur_hrs_avg_30dAF Sleep_dur_hrs_std_30dAF Sleep_start_MC_hrs_avg_30dAF Sleep_start_MC_hrs_std_30dAF;
datalines;
1	02Dec2011	Gen_Health	1	Fair	03Nov2011	02Dec2011	7.25	1.00	-1.51	1.65	03Dec2011	01Jan2012	7.19	1.38	-1.60	1.51
1	01Jan2012	Gen_Health	2	Fair	03Dec2011	01Jan2012	7.17	1.3956345	-1.57	1.513422712	02Jan2012	31Jan2012	.	.	.	.
1	01Feb2012	Gen_Health	3	Fair	03Jan2012	01Feb2012	.	.	.	.	02Feb2012	02Mar2012	.	.	.	.
1	05Dec2011	Physical_Act	1	10	06Nov2011	05Dec2011	.	.	.	.	06Dec2011	04Jan2012	.	.	.	.
1	04Jan2012	Physical_Act	2	14	06Dec2011	04Jan2012	.	.	.	.	05Jan2012	03Feb2012	.	.	.	.
1	04Feb2012	Physical_Act	3	30	06Jan2012	04Feb2012	.	.	.	.	05Feb2012	05Mar2012	.	.	.	.
1	28Nov2011	Diet	1	4	30Oct2011	28Nov2011	.	.	.	.	29Nov2011	28Dec2011	.	.	.	.
1	28Dec2011	Diet	2	7	29Nov2011	28Dec2011	.	.	.	.	29Dec2011	27Jan2012	.	.	.	.
1	28Jan2012	Diet	3	30	30Dec2011	28Jan2012	.	.	.	.	29Jan2012	27Feb2012	.	.	.	.
2	30Nov2011	Gen_Health	1	Good	01Nov2011	30Nov2011	.	.	.	.	01Dec2011	30Dec2011	.	.	.	.
2	30Dec2011	Gen_Health	2	Fair	01Dec2011	30Dec2011	.	.	.	.	31Dec2011	29Jan2012	.	.	.	.
2	29Jan2012	Gen_Health	3	Poor	31Dec2011	29Jan2012	.	.	.	.	30Jan2012	28Feb2012	.	.	.	.
2	28Nov2011	Physical_Act	1	10	30Oct2011	28Nov2011	.	.	.	.	29Nov2011	28Dec2011	.	.	.	.
2	28Dec2011	Physical_Act	2	15	29Nov2011	28Dec2011	.	.	.	.	29Dec2011	27Jan2012	.	.	.	.
2	27Jan2012	Physical_Act	3	5	29Dec2011	27Jan2012	.	.	.	.	28Jan2012	26Feb2012	.	.	.	.
2	01Dec2011	Diet	1	4	02Nov2011	01Dec2011	.	.	.	.	02Dec2011	31Dec2011	.	.	.	.
2	31Dec2011	Diet	2	5	02Dec2011	31Dec2011	.	.	.	.	01Jan2012	30Jan2012	.	.	.	.
2	30Jan2012	Diet	3	8	01Jan2012	30Jan2012	.	.	.	.	31Jan2012	29Feb2012	.	.	.	.
3	01Jan2012	Gen_Health	1	Poor	03Dec2011	01Jan2012	.	.	.	.	02Jan2012	31Jan2012	.	.	.	.
3	30Jan2012	Gen_Health	2	Good	01Jan2012	30Jan2012	.	.	.	.	31Jan2012	29Feb2012	.	.	.	.
3	03Mar2012	Gen_Health	3	Fair	03Feb2012	03Mar2012	.	.	.	.	04Mar2012	02Apr2012	.	.	.	.
3	06Jan2012	Physical_Act	1	2	08Dec2011	06Jan2012	.	.	.	.	07Jan2012	05Feb2012	.	.	.	.
3	04Feb2012	Physical_Act	2	4	06Jan2012	04Feb2012	.	.	.	.	05Feb2012	05Mar2012	.	.	.	.
3	08Mar2012	Physical_Act	3	4	08Feb2012	08Mar2012	.	.	.	.	09Mar2012	07Apr2012	.	.	.	.
3	25Dec2011	Diet	1	3	26Nov2011	25Dec2011	.	.	.	.	26Dec2011	24Jan2012	.	.	.	.
3	23Jan2012	Diet	2	7	25Dec2011	23Jan2012	.	.	.	.	24Jan2012	22Feb2012	.	.	.	.
3	25Feb2012	Diet	3	9	27Jan2012	25Feb2012	.	.	.	.	26Feb2012	26Mar2012	.	.	.	.
;
Run;
data survey; set survey; format date date9. Sleep_30dBF_date_start date9. Sleep_30dBF_date_end date9.  Sleep_30dAF_date_start date9. 
Sleep_30dAF_date_end date9.;run;

Thanks.

PaigeMiller · Posted 01-31-2023 02:59 PM

So, this gets a little crazy, but I believe the solution I give using CALL EXECUTE produces the information you want, and I leave it up to you to re-arrange it into whatever table is of use to you. I also leave it up to you to check the calculations for one or two data points.

proc delete data=all_stats;
run;
data _null_;
    set survey;
    thirty_days_before=date-30;
    one_day_after=date+1;
    thirty_days_after=date+30;
    call execute(
        "data new;
            set sleep(where=(id=" || id || ")); 
            survey_date=" || date || ";
            if wake_date >= " || thirty_days_before || " and wake_date <= " || date || " then before_after=1; 
            else if wake_date >= " || one_day_after || " and wake_date <= " || thirty_days_after || " then before_after=2;
            format survey_date date9.; 
         run;
         proc summary nway data=new;
             class before_after;
             id id survey_date;
             var sleep_dur_hrs sleep_start_mc_hrs;
             output out=work._stats_ mean= std=/autoname;
         run;
         proc append base=all_stats new=work._stats_;
         run;
        ");
run;

Are there simpler ways to do this? Quite possibly, but this is what I came up with.

--
Paige Miller

confooseddesi89 · Posted 01-31-2023 09:58 PM

Hi,

Thanks for this code. There was no way I was coming up with something like this by myself.

I am running into some issues:

1. The code has run for 5 hours or so, and only 18% of the dates are complete. Is it normal to expect it to take that long?

2. Data lines are repeating themselves several times. For example, as you can see in the screenshot - every odd row up until row 29 is identical; as is every even row up until row 30. The number of repeats vary, from twice to 8 times (that I can see). (Note that I changed two of the variable names).

3. The _FREQ_ for before_after=1 is sometimes 31, but each FREQ should be 30 (or less if there is missing data). This results in incorrect values for hours_asleep or start_clocktime. This is not always the case, though - sometimes the _FREQ_ is correctly 30.

The log is way too long for me to include the entire thing, but I have a sample below:

108318 + data new;            set SLEEP(where=(id=         376));
       survey_date=       22713;            if wake_date >=        22683 and wake_date <=
22713 then before_after=1;            else if wake_date >=        22714
108319 + and wake_date <=        22743 then before_after=2;            format survey_date
date9.;         run;

NOTE: There were 80 observations read from the data set SLEEP.
      WHERE id=376;
NOTE: The data set WORK.NEW has 80 observations and 241 variables.
NOTE: DATA statement used (Total process time):
      real time           0.09 seconds
      cpu time            0.09 seconds


108319 +
proc summary nway data=new;             class before_after;             id id survey_date;
       var hours_asleep_w
108320 + start_clocktime_midcent_mean_w;             output out=work._stats_ mean=
std=/autoname;         run;

WARNING: A class or frequency variable is missing on every observation.
NOTE: There were 80 observations read from the data set WORK.NEW.
NOTE: The data set WORK._STATS_ has 0 observations and 9 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


108320 +
proc append base=all_stats new=work._stats_;         run;

NOTE: Appending WORK._STATS_ to WORK.ALL_STATS.
NOTE: There were 0 observations read from the data set WORK._STATS_.
NOTE: 0 observations added.
NOTE: The data set WORK.ALL_STATS has 56934 observations and 9 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


108321 + data new;            set SLEEP(where=(id=         376));
       survey_date=       22713;            if wake_date >=        22683 and wake_date <=
22713 then before_after=1;            else if wake_date >=        22714
108322 + and wake_date <=        22743 then before_after=2;            format survey_date
date9.;         run;

NOTE: There were 80 observations read from the data set SLEEP.
      WHERE id=376;
NOTE: The data set WORK.NEW has 80 observations and 241 variables.
NOTE: DATA statement used (Total process time):
      real time           0.10 seconds
      cpu time            0.11 seconds


108322 +
proc summary nway data=new;             class before_after;             id id survey_date;
       var hours_asleep_w
108323 + start_clocktime_midcent_mean_w;             output out=work._stats_ mean=
std=/autoname;         run;

WARNING: A class or frequency variable is missing on every observation.
NOTE: There were 80 observations read from the data set WORK.NEW.
NOTE: The data set WORK._STATS_ has 0 observations and 9 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


108323 +
proc append base=all_stats new=work._stats_;         run;

NOTE: Appending WORK._STATS_ to WORK.ALL_STATS.
NOTE: There were 0 observations read from the data set WORK._STATS_.
NOTE: 0 observations added.
NOTE: The data set WORK.ALL_STATS has 56934 observations and 9 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


108324 + data new;            set SLEEP(where=(id=         376));
       survey_date=       22713;            if wake_date >=        22683 and wake_date <=
22713 then before_after=1;            else if wake_date >=        22714
108325 + and wake_date <=        22743 then before_after=2;            format survey_date
date9.;         run;

NOTE: There were 80 observations read from the data set SLEEP.
      WHERE id=376;
NOTE: The data set WORK.NEW has 80 observations and 241 variables.
NOTE: DATA statement used (Total process time):
      real time           0.10 seconds
      cpu time            0.09 seconds


108325 +
proc summary nway data=new;             class before_after;             id id survey_date;
       var hours_asleep_w
108326 + start_clocktime_midcent_mean_w;             output out=work._stats_ mean=
std=/autoname;         run;

WARNING: A class or frequency variable is missing on every observation.
NOTE: There were 80 observations read from the data set WORK.NEW.
NOTE: The data set WORK._STATS_ has 0 observations and 9 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


108326 +
proc append base=all_stats new=work._stats_;         run;

NOTE: Appending WORK._STATS_ to WORK.ALL_STATS.
NOTE: There were 0 observations read from the data set WORK._STATS_.
NOTE: 0 observations added.
NOTE: The data set WORK.ALL_STATS has 56934 observations and 9 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


108327 + data new;            set SLEEP(where=(id=         376));
       survey_date=       22713;            if wake_date >=        22683 and wake_date <=
22713 then before_after=1;            else if wake_date >=        22714
108328 + and wake_date <=        22743 then before_after=2;            format survey_date
date9.;         run;

NOTE: There were 80 observations read from the data set SLEEP.
      WHERE id=376;
NOTE: The data set WORK.NEW has 80 observations and 241 variables.
NOTE: DATA statement used (Total process time):
      real time           0.11 seconds
      cpu time            0.11 seconds


108328 +
proc summary nway data=new;             class before_after;             id id survey_date;
       var hours_asleep_w
108329 + start_clocktime_midcent_mean_w;             output out=work._stats_ mean=
std=/autoname;         run;

WARNING: A class or frequency variable is missing on every observation.
NOTE: There were 80 observations read from the data set WORK.NEW.
NOTE: The data set WORK._STATS_ has 0 observations and 9 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


108329 +
proc append base=all_stats new=work._stats_;         run;

NOTE: Appending WORK._STATS_ to WORK.ALL_STATS.
NOTE: There were 0 observations read from the data set WORK._STATS_.
NOTE: 0 observations added.
NOTE: The data set WORK.ALL_STATS has 56934 observations and 9 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


108330 + data new;            set SLEEP(where=(id=         376));
       survey_date=       22701;            if wake_date >=        22671 and wake_date <=
22701 then before_after=1;            else if wake_date >=        22702
108331 + and wake_date <=        22731 then before_after=2;            format survey_date
date9.;         run;

NOTE: There were 80 observations read from the data set SLEEP.
      WHERE id=376;
NOTE: The data set WORK.NEW has 80 observations and 241 variables.
NOTE: DATA statement used (Total process time):
      real time           0.10 seconds
      cpu time            0.09 seconds


108331 +
proc summary nway data=new;             class before_after;             id id survey_date;
       var hours_asleep_w
108332 + start_clocktime_midcent_mean_w;             output out=work._stats_ mean=
std=/autoname;         run;

NOTE: There were 80 observations read from the data set WORK.NEW.
NOTE: The data set WORK._STATS_ has 1 observations and 9 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


108332 +
proc append base=all_stats new=work._stats_;         run;

NOTE: Appending WORK._STATS_ to WORK.ALL_STATS.
NOTE: There were 1 observations read from the data set WORK._STATS_.
NOTE: 1 observations added.
NOTE: The data set WORK.ALL_STATS has 56935 observations and 9 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


108333 + data new;            set SLEEP(where=(id=         376));
       survey_date=       22701;            if wake_date >=        22671 and wake_date <=
22701 then before_after=1;            else if wake_date >=        22702
108334 + and wake_date <=        22731 then before_after=2;            format survey_date
date9.;         run;

NOTE: There were 80 observations read from the data set SLEEP.
      WHERE id=376;
NOTE: The data set WORK.NEW has 80 observations and 241 variables.
NOTE: DATA statement used (Total process time):
      real time           0.11 seconds
      cpu time            0.10 seconds


108334 +
proc summary nway data=new;             class before_after;             id id survey_date;
       var hours_asleep_w
108335 + start_clocktime_midcent_mean_w;             output out=work._stats_ mean=
std=/autoname;         run;

NOTE: There were 80 observations read from the data set WORK.NEW.
NOTE: The data set WORK._STATS_ has 1 observations and 9 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


108335 +
proc append base=all_stats new=work._stats_;         run;

NOTE: Appending WORK._STATS_ to WORK.ALL_STATS.
NOTE: There were 1 observations read from the data set WORK._STATS_.
NOTE: 1 observations added.
NOTE: The data set WORK.ALL_STATS has 56936 observations and 9 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


108336 + data new;            set SLEEP(where=(id=         376));
       survey_date=       22701;            if wake_date >=        22671 and wake_date <=
22701 then before_after=1;            else if wake_date >=        22702
108337 + and wake_date <=        22731 then before_after=2;            format survey_date
date9.;         run;

NOTE: There were 80 observations read from the data set SLEEP.
      WHERE id=376;
NOTE: The data set WORK.NEW has 80 observations and 241 variables.
NOTE: DATA statement used (Total process time):
      real time           0.08 seconds
      cpu time            0.09 seconds


108337 +
proc summary nway data=new;             class before_after;             id id survey_date;
       var hours_asleep_w
108338 + start_clocktime_midcent_mean_w;             output out=work._stats_ mean=
std=/autoname;         run;

NOTE: There were 80 observations read from the data set WORK.NEW.
NOTE: The data set WORK._STATS_ has 1 observations and 9 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


108338 +
proc append base=all_stats new=work._stats_;         run;

NOTE: Appending WORK._STATS_ to WORK.ALL_STATS.
NOTE: There were 1 observations read from the data set WORK._STATS_.
NOTE: 1 observations added.
NOTE: The data set WORK.ALL_STATS has 56937 observations and 9 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


108339 + data new;            set SLEEP(where=(id=         376));
       survey_date=       22710;            if wake_date >=        22680 and wake_date <=
22710 then before_after=1;            else if wake_date >=        22711
108340 + and wake_date <=        22740 then before_after=2;            format survey_date
date9.;         run;

NOTE: There were 80 observations read from the data set SLEEP.
      WHERE id=376;
NOTE: The data set WORK.NEW has 80 observations and 241 variables.
NOTE: DATA statement used (Total process time):
      real time           0.08 seconds
      cpu time            0.07 seconds


108340 +
proc summary nway data=new;             class before_after;             id id survey_date;
       var hours_asleep_w
108341 + start_clocktime_midcent_mean_w;             output out=work._stats_ mean=
std=/autoname;         run;

WARNING: A class or frequency variable is missing on every observation.
NOTE: There were 80 observations read from the data set WORK.NEW.
NOTE: The data set WORK._STATS_ has 0 observations and 9 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


108341 +
proc append base=all_stats new=work._stats_;         run;

NOTE: Appending WORK._STATS_ to WORK.ALL_STATS.
NOTE: There were 0 observations read from the data set WORK._STATS_.
NOTE: 0 observations added.
NOTE: The data set WORK.ALL_STATS has 56937 observations and 9 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


108342 + data new;            set SLEEP(where=(id=         376));
       survey_date=       22710;            if wake_date >=        22680 and wake_date <=
22710 then before_after=1;            else if wake_date >=        22711
108343 + and wake_date <=        22740 then before_after=2;            format survey_date
date9.;         run;

NOTE: There were 80 observations read from the data set SLEEP.
      WHERE id=376;
NOTE: The data set WORK.NEW has 80 observations and 241 variables.
NOTE: DATA statement used (Total process time):
      real time           0.10 seconds
      cpu time            0.10 seconds


108343 +
proc summary nway data=new;             class before_after;             id id survey_date;
       var hours_asleep_w
108344 + start_clocktime_midcent_mean_w;             output out=work._stats_ mean=
std=/autoname;         run;

WARNING: A class or frequency variable is missing on every observation.
NOTE: There were 80 observations read from the data set WORK.NEW.
NOTE: The data set WORK._STATS_ has 0 observations and 9 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


108344 +
proc append base=all_stats new=work._stats_;         run;

NOTE: Appending WORK._STATS_ to WORK.ALL_STATS.
NOTE: There were 0 observations read from the data set WORK._STATS_.
NOTE: 0 observations added.
NOTE: The data set WORK.ALL_STATS has 56937 observations and 9 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


108345 + data new;            set SLEEP(where=(id=         376));
       survey_date=       22710;            if wake_date >=        22680 and wake_date <=
22710 then before_after=1;            else if wake_date >=        22711
108346 + and wake_date <=        22740 then before_after=2;            format survey_date
date9.;         run;

NOTE: There were 80 observations read from the data set SLEEP.
      WHERE id=376;
NOTE: The data set WORK.NEW has 80 observations and 241 variables.
NOTE: DATA statement used (Total process time):
      real time           0.11 seconds
      cpu time            0.10 seconds


108346 +
proc summary nway data=new;             class before_after;             id id survey_date;
       var hours_asleep_w
108347 + start_clocktime_midcent_mean_w;             output out=work._stats_ mean=
std=/autoname;         run;

WARNING: A class or frequency variable is missing on every observation.
NOTE: There were 80 observations read from the data set WORK.NEW.
NOTE: The data set WORK._STATS_ has 0 observations and 9 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


108347 +
proc append base=all_stats new=work._stats_;         run;

NOTE: Appending WORK._STATS_ to WORK.ALL_STATS.
NOTE: There were 0 observations read from the data set WORK._STATS_.
NOTE: 0 observations added.
NOTE: The data set WORK.ALL_STATS has 56937 observations and 9 variables.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

I'd really appreciate any ideas on how to fix these errors. Thank you.

PaigeMiller · Posted 02-01-2023 06:04 AM

1. The code has run for 5 hours or so, and only 18% of the dates are complete. Is it normal to expect it to take that long?

How many rows in the sleep data set? How many rows in the survey data set? Are there missing values in either data set for the variables under discussion? How many distinct values of ID?

2. Data lines are repeating themselves several times. For example, as you can see in the screenshot - every odd row up until row 29 is identical; as is every even row up until row 30. The number of repeats vary, from twice to 8 times (that I can see). (Note that I changed two of the variable names).

Do you have surveys where there is no corresponding sleep? Or vice versa?

Also

NOTE: The data set WORK.NEW has 80 observations and 241 variables.

You didn't mention that data set SLEEP has 241 variables. Your example showed only a few variables.

--
Paige Miller

confooseddesi89 · Posted 02-01-2023 09:08 AM

Hi!

@PaigeMiller wrote:

1. The code has run for 5 hours or so, and only 18% of the dates are complete. Is it normal to expect it to take that long?

How many rows in the sleep data set? How many rows in the survey data set? Are there missing values in either data set for the variables under discussion? How many distinct values of ID?

There are 327,275 rows in the survey dataset and 122505 in the sleep dataset. There was originally n=159 rows missing in the sleep dataset, which I just removed (now there are 122505 rows with data, so there were 122664 originally). There are 1036 unique IDs in the survey dataset and 957 unique IDs in the sleep dataset. It is possible there is some missingness. I just created new datasets with only the needed variables, so hopefully it will not take as long. I just left it overnight to run the code with the original datasets (with all the unneeded variables) for ~6.5 hours and it still hadn't completed.

2. Data lines are repeating themselves several times. For example, as you can see in the screenshot - every odd row up until row 29 is identical; as is every even row up until row 30. The number of repeats vary, from twice to 8 times (that I can see). (Note that I changed two of the variable names).

Do you have surveys where there is no corresponding sleep? Or vice versa?

Yes, this is a possibility. Is there a way to remove duplicate rows in a dataset?

Also
NOTE: The data set WORK.NEW has 80 observations and 241 variables.
You didn't mention that data set SLEEP has 241 variables. Your example showed only a few variables.

I only showed a sample on the forum; I suppose this could be why it's taking so long? I created new truncated datasets and will re-run the code.

Also, I'm still not sure why some _FREQ_ are 31 when the max should be 30.

Thanks!

PaigeMiller · Posted 02-01-2023 09:16 AM

This code probably will take a long time on 327,000 records in the survey data set. I have fixed the code I am using to extract more efficiency (less computer time needed), but I can't say how long it should take. This is code that doesn't scale well to hundreds of thousands of records. I have also "fixed" what I think caused the repeating problem.

proc delete data=all_stats;
run;
data sleep1(index=(id_date=(id wake_date)));
    set sleep(keep=id wake_date sleep_dur_hrs sleep_start_mc_hrs);
run;
data _null_;
    set survey;
    thirty_days_before=date-30;
    one_day_after=date+1;
    thirty_days_after=date+30;
    call execute(
        "proc delete data=new _stats_; run;
         data new;
            set sleep1(where=(id=" || id || " and wake_date >= " || thirty_days_before || " and wake_date <= " || thirty_days_after || ")); 
            survey_date=" || date || ";
            if wake_date >= " || thirty_days_before || " and wake_date <= " || date || " then before_after=1; 
            else if wake_date >= " || one_day_after || " and wake_date <= " || thirty_days_after || " then before_after=2;
            format survey_date date9.; 
         run;
         proc summary nway data=new;
             class before_after;
             id id survey_date;
             var sleep_dur_hrs sleep_start_mc_hrs;
             output out=work._stats_ mean= std=/autoname;
         run;
         proc append base=all_stats new=work._stats_; run;"
    );
run;

If that still isn't fast enough, there is another method in SAS called hash objects. I don't really know if hash objects will help in this case and in any event, I cannot write such code for you. I would recommend you start a new thread with this exact title: "Can hash objects do this merge/analysis?", this will allow experts in hash objects to give their opinions. In this new thread, describe the problem again, and also link to this thread.

Also, I'm still not sure why some _FREQ_ are 31 when the max should be 30.

That might be because of my code

    thirty_days_before=date-30;
    one_day_after=date+1;
    thirty_days_after=date+30;

and you should fix this accordingly.

--
Paige Miller

confooseddesi89 · Posted 02-02-2023 07:08 PM

Hello,

I ran the second code you gave me, and I received an "out of memory" error and SAS shut down. I tried the code on another computer and the same thing happened. This code therefore requires more computing power than I currently have access to.

I instead just re-ran the first code you gave me. It took about 7-8 hours on my machine overnight, but it worked! I used nodup to remove the duplicates. Of course, I'd prefer a quicker solution that didn't require more memory than I have, but this will work for now.

Thank you!

PaigeMiller · Posted 02-02-2023 08:25 PM

Try removing the index= option.

When there are errors in the log, NEVER EVER describe the errors in your own words or quote a tiny portion of the log. Show us the log for the step that has the error. Please do this every time you have errors in the log from now on.

--
Paige Miller

confooseddesi89 · Posted 02-03-2023 01:19 AM

Since SAS shut down immediately after the error, it was impossible to copy the log . . . it literally just said "ERROR: OUT OF MEMORY," verbatim and then shut down right after. So, I'm not quite sure what I should do if I should never put the log into my own words.

I took away the index portion, and SAS still shut down, this time without any error message. Thanks for trying!

PaigeMiller · Posted 02-03-2023 06:00 AM

To tell you the truth, I am surprised it crashes when you remove the INDEX= option. What happens if you start a brand new SAS session and try the second SAS code without the INDEX= option?

My other suggestion is to ask in this forum about using hash objects in a new thread. Please follow the instructions I gave.

--
Paige Miller

FreelanceReinh · Posted 02-03-2023 10:59 AM

@PaigeMiller wrote:

My other suggestion is to ask in this forum about using hash objects in a new thread. Please follow the instructions I gave.

@confooseddesi89: I think this would be a very promising approach. A hash object could easily accommodate the relevant sleep data, which could then be retrieved (and written to temporary arrays with 30 elements) by a data step reading the survey data. The descriptive statistics would be computed using the MEAN and STD functions based on the arrays. So this would be a single data step with an estimated run time of several seconds, not hours.

confooseddesi89 · Posted 02-03-2023 02:17 PM

I had tried that a couple of different times, and it still crashed. Just tried again, crashed again.

SAS Innovate 2025: Register Today!

SAS Training: Just a Click Away