Hi folks,
I am running the following programme;
To get rid of observations where 'status=dropped out' and month4=0, I write following in import proc;
proc import datafile = 'I:\cYorkshire Project\Supported self-care intervention\Sarwat\Data\SarwatSCP18052019\Control-groups\SMPcontrol.csv'
out = DiffInDiff1 (where=(not (month4=0 or status='dropped out')))
dbms = csv;
run;
Then to sum up observations per id by 'status' and 'month4' I do the following;
proc means data=DiffInDiff1 noprint nway;
class month4 status pkid / mlf;
var visit;
output out=SMPDiff sum(visit)=visit;
run;
Then I want to create new variables from 'month4' and 'status';
DATA DIFF1;
SET SMPDIFF;
IF MONTH4=1 THEN EXPOSED=1;*POST-INTERVENTION PERIOD;
ELSE EXPOSED=0;*PRE-INTERVENTION PERIOD;
IF STATUS="ELIGIBLE" THEN INTERVENTION=0;
ELSE IF STATUS='INTERVENTIONGROUP' THEN INTERVENTION=1;
RUN;
But when I print this data set, 1) it still gives me 'dropped out' , 2) values for 'intervention' are all missing. can't figure out what the problem is!
Can someone help please?
Thanks
S
The mlf option in the class statement is not necessary, as proc import won't use multilabel formats. But it causes month4 to switch to type character, causing an unnecessary NOTE in your data step.
And when you compare strings (character variables/values), they must be EXACTLY the same, and that includes cases. "InterventionGroup" is not equal to "INTERVENTIONGROUP"!
data diffindiff1;
input pkid $ status :$30. age visit month4;
datalines;
1 InterventionGroup 73 1 -1
1 InterventionGroup 73 1 -1
;
proc means data=DiffInDiff1 noprint nway;
class month4 status pkid;
var visit;
output out=SMPDiff sum(visit)=visit;
run;
data diff1;
set smpdiff;
if month4 = 1
then exposed = 1;*POST-INTERVENTION PERIOD;
else exposed = 0;*PRE-INTERVENTION PERIOD;
if status = "Eligible"
then intervention = 0;
else if status = 'InterventionGroup'
then intervention = 1;
run;
Please supply a few lines (including the first) of your csv file.
Also supply the log of your three steps.
Use the {i} button for posting logs and text data.
My csv has a long list of variables so I am pasting only the ones I am using in the analysis;
pkid status Age visit Month4
1 InterventionGroup 73 1 -1
1 InterventionGroup 73 1 -1
When I run this;
proc import datafile = 'I:\cYorkshire Project\Supported self-care intervention\Sarwat\Data\SarwatSCP18052019\Control-groups\SMPcontrol.csv'
out = DiffInDiff1 (where=(not (month4=0 or status='dropped out')))
dbms = csv;
run;
Log is;
{
1 proc import datafile = 'I:\cYorkshire Project\Supported self-care
1 ! intervention\Sarwat\Data\SarwatSCP18052019\Control-groups\SMPcontrol.csv'
2 out = DiffInDiff1 (where=(not (month4=0 or status='dropped out')))
3 dbms = csv;
4 run;
5 /**********************************************************************
6 * PRODUCT: SAS
7 * VERSION: 9.4
8 * CREATOR: External File Interface
9 * DATE: 16JUL19
10 * DESC: Generated SAS Datastep Code
11 * TEMPLATE SOURCE: (None Specified.)
12 ***********************************************************************/
13 data WORK.DIFFINDIFF1 (where=(not (month4=0 or status='dropped out'))) ;
14 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
15 infile 'I:\cYorkshire Project\Supported self-care
15 ! intervention\Sarwat\Data\SarwatSCP18052019\Control-groups\SMPcontrol.csv' delimiter = ','
15 ! MISSOVER DSD lrecl=32767 firstobs=2 ;
16 informat Practice $25. ;
17 informat PKid best32. ;
18 informat Status $17. ;
19 informat Age best32. ;
20 informat ReadCodeID $5. ;
21 informat ReadCodeDesc $27. ;
22 informat EventDate yymmdd10. ;
23 informat EvTime $4. ;
24 informat ConsultationMethod $27. ;
25 informat EventStaffType $29. ;
26 informat ProfessionalContact $5. ;
27 informat GPVisitType $7. ;
28 informat Ethnicity $62. ;
29 informat Marital_status $15. ;
30 informat Religion $4. ;
31 informat Sex $6. ;
32 informat PartPcode $4. ;
33 informat IMDScore best32. ;
34 informat FirstEFICalc best32. ;
35 informat LastEFICalc best32. ;
36 informat TotalDeficitsAtStart best32. ;
37 informat TotalDeficitsAtEnd best32. ;
38 informat FirstEfiScore $3. ;
39 informat LastEfiScore $3. ;
40 informat visit best32. ;
41 informat preint best32. ;
42 informat intervention best32. ;
43 informat postint best32. ;
44 informat month4 best32. ;
45 informat month best32. ;
46 informat month2 best32. ;
47 informat preintervention1 best32. ;
48 informat preintervention best32. ;
49 informat preintervention2 best32. ;
50 informat preintervention3 best32. ;
51 informat intervene best32. ;
52 informat postintervention1 best32. ;
53 informat postintervention best32. ;
54 informat postintervention2 best32. ;
55 informat postintervention3 best32. ;
56 informat month3 best32. ;
57 format Practice $25. ;
58 format PKid best12. ;
59 format Status $17. ;
60 format Age best12. ;
61 format ReadCodeID $5. ;
62 format ReadCodeDesc $27. ;
63 format EventDate yymmdd10. ;
64 format EvTime $4. ;
65 format ConsultationMethod $27. ;
66 format EventStaffType $29. ;
67 format ProfessionalContact $5. ;
68 format GPVisitType $7. ;
69 format Ethnicity $62. ;
70 format Marital_status $15. ;
71 format Religion $4. ;
72 format Sex $6. ;
73 format PartPcode $4. ;
74 format IMDScore best12. ;
75 format FirstEFICalc best12. ;
76 format LastEFICalc best12. ;
77 format TotalDeficitsAtStart best12. ;
78 format TotalDeficitsAtEnd best12. ;
79 format FirstEfiScore $3. ;
80 format LastEfiScore $3. ;
81 format visit best12. ;
82 format preint best12. ;
83 format intervention best12. ;
84 format postint best12. ;
85 format month4 best12. ;
86 format month best12. ;
87 format month2 best12. ;
88 format preintervention1 best12. ;
89 format preintervention best12. ;
90 format preintervention2 best12. ;
91 format preintervention3 best12. ;
92 format intervene best12. ;
93 format postintervention1 best12. ;
94 format postintervention best12. ;
95 format postintervention2 best12. ;
96 format postintervention3 best12. ;
97 format month3 best12. ;
98 input
99 Practice $
100 PKid
101 Status $
102 Age
103 ReadCodeID $
104 ReadCodeDesc $
105 EventDate
106 EvTime $
107 ConsultationMethod $
108 EventStaffType $
109 ProfessionalContact $
110 GPVisitType $
111 Ethnicity $
112 Marital_status $
113 Religion $
114 Sex $
115 PartPcode $
116 IMDScore
117 FirstEFICalc
118 LastEFICalc
119 TotalDeficitsAtStart
120 TotalDeficitsAtEnd
121 FirstEfiScore $
122 LastEfiScore $
123 visit
124 preint
125 intervention
126 postint
127 month4
128 month
129 month2
130 preintervention1
131 preintervention
132 preintervention2
133 preintervention3
134 intervene
135 postintervention1
136 postintervention
137 postintervention2
138 postintervention3
139 month3
140 ;
141 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
142 run;
NOTE: The infile 'I:\cYorkshire Project\Supported self-care
intervention\Sarwat\Data\SarwatSCP18052019\Control-groups\SMPcontrol.csv' is:
Filename=I:\cYorkshire Project\Supported self-care
intervention\Sarwat\Data\SarwatSCP18052019\Control-groups\SMPcontrol.csv,
RECFM=V,LRECL=32767,File Size (bytes)=1609336,
Last Modified=05 July 2019 16:19:00,
Create Time=05 July 2019 16:19:00
NOTE: 5638 records were read from the infile 'I:\cYorkshire Project\Supported self-care
intervention\Sarwat\Data\SarwatSCP18052019\Control-groups\SMPcontrol.csv'.
The minimum record length was 229.
The maximum record length was 328.
NOTE: The data set WORK.DIFFINDIFF1 has 4488 observations and 41 variables.
NOTE: DATA statement used (Total process time):
real time 0.27 seconds
cpu time 0.15 seconds
4488 rows created in WORK.DIFFINDIFF1 from I:\cYorkshire Project\Supported self-care
intervention\Sarwat\Data\SarwatSCP18052019\Control-groups\SMPcontrol.csv.
NOTE: WORK.DIFFINDIFF1 data set was successfully created.
NOTE: The data set WORK.DIFFINDIFF1 has 4488 observations and 41 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.49 seconds
cpu time 0.26 seconds
}
Then I summed the 'visits' by 'pkid' as follows;
proc means data=DiffInDiff1 noprint nway;
class month4 status pkid / mlf;
var visit;
output out=SMPDiff sum(visit)=visit;
run;
log;
{143 proc means data=DiffInDiff1 noprint nway;
144 class month4 status pkid / mlf;
145 var visit;
146 output out=SMPDiff sum(visit)=visit;
147 run;
NOTE: There were 4488 observations read from the data set WORK.DIFFINDIFF1.
NOTE: The data set WORK.SMPDIFF has 443 observations and 6 variables.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds}
then I wanted to create new variables;
DATA DIFF1;
SET SMPDIFF;
IF MONTH4=1 THEN EXPOSED=1;*POST-INTERVENTION PERIOD;
ELSE EXPOSED=0;*PRE-INTERVENTION PERIOD;
IF STATUS="ELIGIBLE" THEN INTERVENTION=0;
ELSE IF STATUS='INTERVENTIONGROUP' THEN INTERVENTION=1;
RUN;
Log
{
148 DATA DIFF1;
149 SET SMPDIFF;
150
151 IF MONTH4=1 THEN EXPOSED=1;*POST-INTERVENTION PERIOD;
152 ELSE EXPOSED=0;*PRE-INTERVENTION PERIOD;
153
154 IF STATUS="ELIGIBLE" THEN INTERVENTION=0;
155 ELSE IF STATUS='INTERVENTIONGROUP' THEN INTERVENTION=1;
156 RUN;
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
151:4
NOTE: There were 443 observations read from the data set WORK.SMPDIFF.
NOTE: The data set WORK.DIFF1 has 443 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
}
but in the end ''INTERVENTION' has missing values for all observations.
Thanks
The mlf option in the class statement is not necessary, as proc import won't use multilabel formats. But it causes month4 to switch to type character, causing an unnecessary NOTE in your data step.
And when you compare strings (character variables/values), they must be EXACTLY the same, and that includes cases. "InterventionGroup" is not equal to "INTERVENTIONGROUP"!
data diffindiff1;
input pkid $ status :$30. age visit month4;
datalines;
1 InterventionGroup 73 1 -1
1 InterventionGroup 73 1 -1
;
proc means data=DiffInDiff1 noprint nway;
class month4 status pkid;
var visit;
output out=SMPDiff sum(visit)=visit;
run;
data diff1;
set smpdiff;
if month4 = 1
then exposed = 1;*POST-INTERVENTION PERIOD;
else exposed = 0;*PRE-INTERVENTION PERIOD;
if status = "Eligible"
then intervention = 0;
else if status = 'InterventionGroup'
then intervention = 1;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.