BookmarkSubscribeRSS Feed
sks521
Quartz | Level 8

Hi folks,

 

I wanna create a variable to indicate mean number of health visits by ID for three time points that are; pre-intervention, intervention and post-intervention.

 

I tried 'SUM' function to specify 'ID' in 'By' statement but don't know how to specify time period in addition to ID.

 

Any help?

 

Thanks

 

 

11 REPLIES 11
Sathish_jammy
Lapis Lazuli | Level 10

Could you share any sample dataset on your own? 

The sample set really helpful to share idea in detail.

sks521
Quartz | Level 8

So I have following three dummy variables to indicate three time points;

 

data dummy;
set duplicate;


format EventDate YYMMDD10.;
if EventDate<'01SEP2016'd then preint=1;*pre-intervention;
else preint=0;
if '01SEP2016'd <=EventDate<= '28FEB2017'd then intervention=1;*intervention;
else intervention=0;
if EventDate >= '01MAR2017'd then postint=1;*post-intervention;
else postint=0;

 

The remaining data looks like this (just a few lines)

                         PKid          EventDate   visit           Practice                         Status                     preint   intervention postint1234567

92016-02-291Bingley Medical PracticeEligible100
132016-02-291Saltaire Medical PracticeEligible100
322016-02-291Bingley Medical PracticeEligible100
732016-02-291Bingley Medical PracticeEligible100
1002016-02-291Bingley Medical PracticeEligible100
1042016-02-291Saltaire Medical PracticeInterventionGroup100
1052016-02-291Bingley Medical PracticeEligible100

 

So each ID has multiple visits in each time period and I want average visits by ID in each time period.

 

Thanks

yabwon
Amethyst | Level 16

Hi,

have you considered one variable with different levels, e.g.:

 

data _null_;
set duplicate;

select;
  when (               EventDate < '01SEP2016'd) period = 1;*pre-intervention;
  when ('01SEP2016'd <=EventDate<= '28FEB2017'd) period = 2;*intervention;
  when (               EventDate >= '01MAR2017'd period = 3;*post-intervention;
  otherwise period = -1;
end;

run;

 

all the best

Bart

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



sks521
Quartz | Level 8

Thanks but I do need to create dummy variables of the time period to do my analysis, and from your code I don't know how to create a variable which gives me sum of the number of visits per ID

 Can you elaborate please?

Ta

yabwon
Amethyst | Level 16

Hi,

 

If I understand correctly, you need both: dummy variables in your dataset and also variables (or one variable) which counts(summs up) visits, right? In that case the following code should do both (I added some fake data to test it).

 

all the best

Bart

 

data duplicate;
infile cards dlm = "|";
input id eventDate yymmdd10. visit Practice : $ 50. Status : $ 20.;
format eventDate yymmdd10.;
if id;
cards;
1|2016-02-28|1|Bingley Medical Practice|Eligible|
2|2016-02-28|1|Saltaire Medical Practice|Eligible|
3|2016-02-28|1|Bingley Medical Practice|Eligible|
3|2016-03-28|1|Bingley Medical Practice|Eligible|
1|2016-04-28|1|Bingley Medical Practice|Eligible|
2|2016-04-28|1|Saltaire Medical Practice|InterventionGroup|
2|2016-04-28|1|Bingley Medical Practice|Eligible|
.|.|.| | |
1|2016-10-28|1|Bingley Medical Practice|Eligible|
1|2016-11-28|1|Saltaire Medical Practice|Eligible|
1|2016-12-28|1|Bingley Medical Practice|Eligible|
3|2016-10-28|1|Bingley Medical Practice|Eligible|
1|2017-01-28|1|Bingley Medical Practice|Eligible|
2|2016-10-28|1|Saltaire Medical Practice|InterventionGroup|
2|2016-11-28|1|Bingley Medical Practice|Eligible|
.|.|.| | |
2|2017-02-28|1|Bingley Medical Practice|Eligible|
2|2017-02-28|1|Saltaire Medical Practice|Eligible|
1|2017-02-28|1|Bingley Medical Practice|Eligible|
3|2017-03-28|1|Bingley Medical Practice|Eligible|
1|2017-04-28|1|Bingley Medical Practice|Eligible|
2|2017-04-28|1|Saltaire Medical Practice|InterventionGroup|
2|2017-04-28|1|Bingley Medical Practice|Eligible|
;
run;

proc sort data = duplicate;
by id eventDate;
run; 


data duplicate_1;
  set duplicate;
  
  retain _err_ preint int postint 0;
  array period[0:3] _err_ preint int postint;
 
  select;
    when (               EventDate < '01SEP2016'd) pd = 1;*pre-intervention;
    when ('01SEP2016'd <=EventDate<= '28FEB2017'd) pd = 2;*intervention;
    when (               EventDate>= '01MAR2017'd) pd = 3;*post-intervention;
    otherwise pd = 0;
  end;

  period[pd] = 1;
  output;
  period[pd] = 0;
run;


data duplicate_2;
  do until (last.id);
    set duplicate_1;
    by id;

    array sumperiod[0:3] _err_sum_ sum_preint sum_int sum_postint;

    sumperiod[pd] + 1;
  end;

  do until (last.id);
    set duplicate_1;
    by id;
    all_visits = sum(of sum_:);
    output;
  end;

  call missing(of sumperiod{*}, all_visits);
run;  

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Amethyst | Level 16

I scratched my head and I think it could be done with 1 data step and 1 data reading.

 

All the best

Bart

 

proc sort data = duplicate;
by id eventDate;
run; 


data duplicate_3;

if _N_ = 1 then
do;
  declare hash H(dataset:"duplicate(obs=0)", ordered:"A", multidata:"Y");
  H.defineKey("id","eventDate");
  H.defineData(all:"yes");
  H.defineDone();
  declare hiter I("H");
end;
  H.clear();

do until (last.id);
  set duplicate;
  by id;
  _rc_ = H.add();

  array sumperiod[0:3] _err_sum_ sum_preint sum_int sum_postint;
 
  select;
    when (               EventDate < '01SEP2016'd) pd = 1;*pre-intervention;
    when ('01SEP2016'd <=EventDate<= '28FEB2017'd) pd = 2;*intervention;
    when (               EventDate>= '01MAR2017'd) pd = 3;*post-intervention;
    otherwise pd = 0;
  end;

  sumperiod[pd] + 1;
end;

all_visits = sum(of sum_:);

do while(I.next() = 0);
  retain _err_ preint int postint 0;
  array period[0:3] _err_ preint int postint;

  select;
    when (               EventDate < '01SEP2016'd) pd = 1;*pre-intervention;
    when ('01SEP2016'd <=EventDate<= '28FEB2017'd) pd = 2;*intervention;
    when (               EventDate>= '01MAR2017'd) pd = 3;*post-intervention;
    otherwise pd = 0;
  end;

  period[pd] = 1;
  output;
  period[pd] = 0;
end; 

 call missing(of sumperiod{*}, all_visits);

run;
_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



sks521
Quartz | Level 8

Thanks but if I wanted to have 6 or 7 time periods instead, how will I write the syntax then?

 

Best

Sarwat

sks521
Quartz | Level 8

So I have tried to use the same syntax using more than three time pints and I got this log;

 


885 data SMPsum;
886 set saltaire;
887
888 retain _err_ preint4 preint3 preint2 preint1 int postint1 postint2 postint3 0;
889 array period[0:8] _err_ preint int postint;
ERROR: Too few variables defined for the dimension(s) specified for the array period.
890
891 select;
892 when ( EventDate <'01MAR2016'd) pd = 1;*pre-intervention -4;
893 when ('01MAR2016'd <=EventDate<= '31AUG2016'd) pd=2; *preintrvention -3;
894 when ('01SEP2016'd <=EventDate<= '28FEB2017'd) pd=3; *preintervention -2;
895 when ('01MAR2017'd <=EventDate<= '31AUG2017'd) pd=4; *preintervention 1;
896 when ('01SEP2017'd <=EventDate<= '28FEB2018'd) pd =5;*intervention;
897 when ('01MAR2018'd <=EventDate<= '31AUG2018'd) pd=6; *postintervention 1;
898 when ('01SEP2018'd <=EventDate<= '28FEB2019'd) pd=7; *postintervention 2;
899 when ( EventDate>= '01MAR2019'd) pd =8 ;*post-intervention 3;
900 otherwise pd = 0;
901 end;
902
903 period[pd] = 1;
904 output;
905 period[pd] = 0;
906 run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.SMPSUM may be incomplete. When this step was stopped there were 0
observations and 49 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds

 

 

Ho do I deal with that please?

 

Thanks

S

ballardw
Super User

@sks521 wrote:

So I have following three dummy variables to indicate three time points;

 

data dummy;
set duplicate;


format EventDate YYMMDD10.;
if EventDate<'01SEP2016'd then preint=1;*pre-intervention;
else preint=0;
if '01SEP2016'd <=EventDate<= '28FEB2017'd then intervention=1;*intervention;
else intervention=0;
if EventDate >= '01MAR2017'd then postint=1;*post-intervention;
else postint=0;

 

The remaining data looks like this (just a few lines)

                         PKid          EventDate   visit           Practice                         Status                     preint   intervention postint1234567

9 2016-02-29 1 Bingley Medical Practice Eligible 1 0 0
13 2016-02-29 1 Saltaire Medical Practice Eligible 1 0 0
32 2016-02-29 1 Bingley Medical Practice Eligible 1 0 0
73 2016-02-29 1 Bingley Medical Practice Eligible 1 0 0
100 2016-02-29 1 Bingley Medical Practice Eligible 1 0 0
104 2016-02-29 1 Saltaire Medical Practice InterventionGroup 1 0 0
105 2016-02-29 1 Bingley Medical Practice Eligible 1 0 0

 

So each ID has multiple visits in each time period and I want average visits by ID in each time period.

 

Thanks


COUNTING is different then DUMMY Coding.

If you want to count group levels then you use a single variable or a format to create groups from a single variable.

Something like this perhaps:

proc format library=work;
value intervene
low - <'01SEP2016'd = 'Preintervention'
'01SEP2016'd - '28FEB2017'd = 'Intervention'
'28FEB2017'd< - high = 'Postintervention'
;
run;

proc freq data=duplicate noprint;
   table id* eventdate/out=work.id_eventcount (drop=percent);
   format eventdate intervene.;
run;

proc means data=work.id_eventcount mean;
   class id;
   var eventdate;
   format eventdate intervene.;
run;

of course you haven't supplied any idea of what format the "mean" is supposed to take such as report or data set and whether any other variables go into grouping or not.

sks521
Quartz | Level 8

I did run this code but the output doesn't look right;

Pkid    Nobs     Mean

1          5          20879.60

2          3          20758.33

3          5          20879.60

Looks like it has calculated mean out of Date value while I need mean number of visits per following duration; pre-intervention, intervention and post-intervention.

 

Thanks

yabwon
Amethyst | Level 16

Could you share your code too?

It will be easier to understand your problem.

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2215 views
  • 3 likes
  • 4 in conversation