Hi folks,
I wanna create a variable to indicate mean number of health visits by ID for three time points that are; pre-intervention, intervention and post-intervention.
I tried 'SUM' function to specify 'ID' in 'By' statement but don't know how to specify time period in addition to ID.
Any help?
Thanks
Could you share any sample dataset on your own?
The sample set really helpful to share idea in detail.
So I have following three dummy variables to indicate three time points;
data dummy;
set duplicate;
format EventDate YYMMDD10.;
if EventDate<'01SEP2016'd then preint=1;*pre-intervention;
else preint=0;
if '01SEP2016'd <=EventDate<= '28FEB2017'd then intervention=1;*intervention;
else intervention=0;
if EventDate >= '01MAR2017'd then postint=1;*post-intervention;
else postint=0;
The remaining data looks like this (just a few lines)
PKid EventDate visit Practice Status preint intervention postint1234567
9 | 2016-02-29 | 1 | Bingley Medical Practice | Eligible | 1 | 0 | 0 |
13 | 2016-02-29 | 1 | Saltaire Medical Practice | Eligible | 1 | 0 | 0 |
32 | 2016-02-29 | 1 | Bingley Medical Practice | Eligible | 1 | 0 | 0 |
73 | 2016-02-29 | 1 | Bingley Medical Practice | Eligible | 1 | 0 | 0 |
100 | 2016-02-29 | 1 | Bingley Medical Practice | Eligible | 1 | 0 | 0 |
104 | 2016-02-29 | 1 | Saltaire Medical Practice | InterventionGroup | 1 | 0 | 0 |
105 | 2016-02-29 | 1 | Bingley Medical Practice | Eligible | 1 | 0 | 0 |
So each ID has multiple visits in each time period and I want average visits by ID in each time period.
Thanks
Hi,
have you considered one variable with different levels, e.g.:
data _null_;
set duplicate;
select;
when ( EventDate < '01SEP2016'd) period = 1;*pre-intervention;
when ('01SEP2016'd <=EventDate<= '28FEB2017'd) period = 2;*intervention;
when ( EventDate >= '01MAR2017'd period = 3;*post-intervention;
otherwise period = -1;
end;
run;
all the best
Bart
Thanks but I do need to create dummy variables of the time period to do my analysis, and from your code I don't know how to create a variable which gives me sum of the number of visits per ID
Can you elaborate please?
Ta
Hi,
If I understand correctly, you need both: dummy variables in your dataset and also variables (or one variable) which counts(summs up) visits, right? In that case the following code should do both (I added some fake data to test it).
all the best
Bart
data duplicate;
infile cards dlm = "|";
input id eventDate yymmdd10. visit Practice : $ 50. Status : $ 20.;
format eventDate yymmdd10.;
if id;
cards;
1|2016-02-28|1|Bingley Medical Practice|Eligible|
2|2016-02-28|1|Saltaire Medical Practice|Eligible|
3|2016-02-28|1|Bingley Medical Practice|Eligible|
3|2016-03-28|1|Bingley Medical Practice|Eligible|
1|2016-04-28|1|Bingley Medical Practice|Eligible|
2|2016-04-28|1|Saltaire Medical Practice|InterventionGroup|
2|2016-04-28|1|Bingley Medical Practice|Eligible|
.|.|.| | |
1|2016-10-28|1|Bingley Medical Practice|Eligible|
1|2016-11-28|1|Saltaire Medical Practice|Eligible|
1|2016-12-28|1|Bingley Medical Practice|Eligible|
3|2016-10-28|1|Bingley Medical Practice|Eligible|
1|2017-01-28|1|Bingley Medical Practice|Eligible|
2|2016-10-28|1|Saltaire Medical Practice|InterventionGroup|
2|2016-11-28|1|Bingley Medical Practice|Eligible|
.|.|.| | |
2|2017-02-28|1|Bingley Medical Practice|Eligible|
2|2017-02-28|1|Saltaire Medical Practice|Eligible|
1|2017-02-28|1|Bingley Medical Practice|Eligible|
3|2017-03-28|1|Bingley Medical Practice|Eligible|
1|2017-04-28|1|Bingley Medical Practice|Eligible|
2|2017-04-28|1|Saltaire Medical Practice|InterventionGroup|
2|2017-04-28|1|Bingley Medical Practice|Eligible|
;
run;
proc sort data = duplicate;
by id eventDate;
run;
data duplicate_1;
set duplicate;
retain _err_ preint int postint 0;
array period[0:3] _err_ preint int postint;
select;
when ( EventDate < '01SEP2016'd) pd = 1;*pre-intervention;
when ('01SEP2016'd <=EventDate<= '28FEB2017'd) pd = 2;*intervention;
when ( EventDate>= '01MAR2017'd) pd = 3;*post-intervention;
otherwise pd = 0;
end;
period[pd] = 1;
output;
period[pd] = 0;
run;
data duplicate_2;
do until (last.id);
set duplicate_1;
by id;
array sumperiod[0:3] _err_sum_ sum_preint sum_int sum_postint;
sumperiod[pd] + 1;
end;
do until (last.id);
set duplicate_1;
by id;
all_visits = sum(of sum_:);
output;
end;
call missing(of sumperiod{*}, all_visits);
run;
I scratched my head and I think it could be done with 1 data step and 1 data reading.
All the best
Bart
proc sort data = duplicate;
by id eventDate;
run;
data duplicate_3;
if _N_ = 1 then
do;
declare hash H(dataset:"duplicate(obs=0)", ordered:"A", multidata:"Y");
H.defineKey("id","eventDate");
H.defineData(all:"yes");
H.defineDone();
declare hiter I("H");
end;
H.clear();
do until (last.id);
set duplicate;
by id;
_rc_ = H.add();
array sumperiod[0:3] _err_sum_ sum_preint sum_int sum_postint;
select;
when ( EventDate < '01SEP2016'd) pd = 1;*pre-intervention;
when ('01SEP2016'd <=EventDate<= '28FEB2017'd) pd = 2;*intervention;
when ( EventDate>= '01MAR2017'd) pd = 3;*post-intervention;
otherwise pd = 0;
end;
sumperiod[pd] + 1;
end;
all_visits = sum(of sum_:);
do while(I.next() = 0);
retain _err_ preint int postint 0;
array period[0:3] _err_ preint int postint;
select;
when ( EventDate < '01SEP2016'd) pd = 1;*pre-intervention;
when ('01SEP2016'd <=EventDate<= '28FEB2017'd) pd = 2;*intervention;
when ( EventDate>= '01MAR2017'd) pd = 3;*post-intervention;
otherwise pd = 0;
end;
period[pd] = 1;
output;
period[pd] = 0;
end;
call missing(of sumperiod{*}, all_visits);
run;
Thanks but if I wanted to have 6 or 7 time periods instead, how will I write the syntax then?
Best
Sarwat
So I have tried to use the same syntax using more than three time pints and I got this log;
885 data SMPsum;
886 set saltaire;
887
888 retain _err_ preint4 preint3 preint2 preint1 int postint1 postint2 postint3 0;
889 array period[0:8] _err_ preint int postint;
ERROR: Too few variables defined for the dimension(s) specified for the array period.
890
891 select;
892 when ( EventDate <'01MAR2016'd) pd = 1;*pre-intervention -4;
893 when ('01MAR2016'd <=EventDate<= '31AUG2016'd) pd=2; *preintrvention -3;
894 when ('01SEP2016'd <=EventDate<= '28FEB2017'd) pd=3; *preintervention -2;
895 when ('01MAR2017'd <=EventDate<= '31AUG2017'd) pd=4; *preintervention 1;
896 when ('01SEP2017'd <=EventDate<= '28FEB2018'd) pd =5;*intervention;
897 when ('01MAR2018'd <=EventDate<= '31AUG2018'd) pd=6; *postintervention 1;
898 when ('01SEP2018'd <=EventDate<= '28FEB2019'd) pd=7; *postintervention 2;
899 when ( EventDate>= '01MAR2019'd) pd =8 ;*post-intervention 3;
900 otherwise pd = 0;
901 end;
902
903 period[pd] = 1;
904 output;
905 period[pd] = 0;
906 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.SMPSUM may be incomplete. When this step was stopped there were 0
observations and 49 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds
Ho do I deal with that please?
Thanks
S
@sks521 wrote:
So I have following three dummy variables to indicate three time points;
data dummy;
set duplicate;
format EventDate YYMMDD10.;
if EventDate<'01SEP2016'd then preint=1;*pre-intervention;
else preint=0;
if '01SEP2016'd <=EventDate<= '28FEB2017'd then intervention=1;*intervention;
else intervention=0;
if EventDate >= '01MAR2017'd then postint=1;*post-intervention;
else postint=0;
The remaining data looks like this (just a few lines)
PKid EventDate visit Practice Status preint intervention postint1234567
9 2016-02-29 1 Bingley Medical Practice Eligible 1 0 0 13 2016-02-29 1 Saltaire Medical Practice Eligible 1 0 0 32 2016-02-29 1 Bingley Medical Practice Eligible 1 0 0 73 2016-02-29 1 Bingley Medical Practice Eligible 1 0 0 100 2016-02-29 1 Bingley Medical Practice Eligible 1 0 0 104 2016-02-29 1 Saltaire Medical Practice InterventionGroup 1 0 0 105 2016-02-29 1 Bingley Medical Practice Eligible 1 0 0
So each ID has multiple visits in each time period and I want average visits by ID in each time period.
Thanks
COUNTING is different then DUMMY Coding.
If you want to count group levels then you use a single variable or a format to create groups from a single variable.
Something like this perhaps:
proc format library=work; value intervene low - <'01SEP2016'd = 'Preintervention' '01SEP2016'd - '28FEB2017'd = 'Intervention' '28FEB2017'd< - high = 'Postintervention' ; run; proc freq data=duplicate noprint; table id* eventdate/out=work.id_eventcount (drop=percent); format eventdate intervene.; run; proc means data=work.id_eventcount mean; class id; var eventdate; format eventdate intervene.; run;
of course you haven't supplied any idea of what format the "mean" is supposed to take such as report or data set and whether any other variables go into grouping or not.
I did run this code but the output doesn't look right;
Pkid Nobs Mean
1 5 20879.60
2 3 20758.33
3 5 20879.60
Looks like it has calculated mean out of Date value while I need mean number of visits per following duration; pre-intervention, intervention and post-intervention.
Thanks
Could you share your code too?
It will be easier to understand your problem.
Bart
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: