Write a program that calculates the total number of records (all screening records in the file),
the total number of partial records (Vstatus), the total number of complete records (Vstatus), and the total number
of complete records that are eligible and ineligible (ELIG) for each site (SITE). The idea here is not to create
separate datasets/subsets for each of these, but to produce a single dataset with these totals.
Vstatus is coded as “Complete” or “Partial”
ELIG is coded as 0 (no) or 1 (yes)
SITE is coded as “City A” or “City B”
My professor is very specific about how she wants this done but when I write my code all my values for the "Status" variable change to "Complete" and the output is simply counting 1-50
My code:
PROC SORT DATA = sarah.screener_data; by site; RUN;
DATA TOTVSTATUS;
SET Sarah.screener_data;
COUNT + 1;
BY Site;
IF Vstatus = partial then COUNT=0;
If vstatus = complete then count= 1;
RUN;
DATA TOTVSTATUS2;
SET TOTVSTATUS;
BY Vstatus Elig Site;
IF LAST.Vstatus ;
If Last.elig;
If last.site;
RUN;
PROC SORT DATA = sarah.screener_data; by Vstatus Elig site; RUN;
DATA TOTVSTATUS;
SET Sarah.screener_data;
COUNT + 1;
BY Vstatus Elig Site;
IF FIRST.Vstatus then COUNT=1;
RUN;
DATA TOTVSTATUS2;
SET TOTVSTATUS;
BY Vstatus Elig Site;
IF LAST.Vstatus ;
run;
This is code for counting all, Complete and Partial observations:
proc sort data=screener_data;
by site;
run;
data want;
set screener_data;
by site;
if first.site
then do;
count = 0;
count_complete = 0;
count_partial = 0;
end;
count + 1;
if vstatus = "Complete"
then count_complete + 1;
else count_partial + 1;
if last.site;
keep site count count_complete count_partial;
run;
I leave adding the other counts as an exercise.
Can you explain the logic for the code you posted? What is it you think it is going to do?
Here is an analysis of the first data step you posted.
DATA TOTVSTATUS;
SET Sarah.screener_data;
COUNT + 1;
BY Site;
IF Vstatus = partial then COUNT=0;
If vstatus = complete then count= 1;
RUN;
Why did you insert the sum statement to increment COUNT between the SET statement and the BY statement?In your post, you state this:
Vstatus is coded as “Complete” or “Partial”
But in your code, you do this:
IF Vstatus = partial then COUNT=0;
If vstatus = complete then count= 1;
partial is a variable name, while "partial" is a value. I guess you want to test for values, not other variables.
Read the log (Maxim 2). It will probably alert you to the fact that variables partial and complete are uninitialized.
Pretend you are the SAS data step processor and the input to your data step was this data:
data have;
input site $ vstatus $ elig ;
cards;
HERE partial 0
HERE complete 0
HERE parital 1
HERE complete 1
THERE partial 0
THERE complete 0
THERE parital 1
THERE complete 1
;
What would you WANT as the output?
What would you expect as the output from your data step?
85 DATA TOTVSTATUS; 86 SET have; 87 COUNT + 1; 88 BY Site; 89 IF Vstatus = 'partial' then COUNT=0; 90 If Vstatus = 'complete' then COUNT=0; 91 IF Elig = 0 then count= 1; 92 If Elig = 1 then count=1; 93 put (_all_) (=); 94 RUN; site=HERE vstatus=partial elig=0 COUNT=1 site=HERE vstatus=complete elig=0 COUNT=1 site=HERE vstatus=parital elig=1 COUNT=1 site=HERE vstatus=complete elig=1 COUNT=1 site=THERE vstatus=partial elig=0 COUNT=1 site=THERE vstatus=complete elig=0 COUNT=1 site=THERE vstatus=parital elig=1 COUNT=1 site=THERE vstatus=complete elig=1 COUNT=1
What would you expect the value of COUNT to be after each of those statements?
142 DATA TOTVSTATUS; 143 SET have; 144 put (_n_ site vstatus elig count) (=) @; 145 COUNT + 1; 146 put '->' count @; 147 BY Site; 148 IF Vstatus = 'partial' then COUNT=0; 149 put '->' count @; 150 If Vstatus = 'complete' then COUNT=0; 151 put '->' count @; 152 IF Elig = 0 then count= 1; 153 put '->' count @; 154 If Elig = 1 then count=1; 155 put '->' count @; 156 put; 157 RUN; _N_=1 site=HERE vstatus=partial elig=0 count=0 ->1 ->0 ->0 ->1 ->1 _N_=2 site=HERE vstatus=complete elig=0 count=1 ->2 ->2 ->0 ->1 ->1 _N_=3 site=HERE vstatus=parital elig=1 count=1 ->2 ->2 ->2 ->2 ->1 _N_=4 site=HERE vstatus=complete elig=1 count=1 ->2 ->2 ->0 ->0 ->1 _N_=5 site=THERE vstatus=partial elig=0 count=1 ->2 ->0 ->0 ->1 ->1 _N_=6 site=THERE vstatus=complete elig=0 count=1 ->2 ->2 ->0 ->1 ->1 _N_=7 site=THERE vstatus=parital elig=1 count=1 ->2 ->2 ->2 ->2 ->1 _N_=8 site=THERE vstatus=complete elig=1 count=1 ->2 ->2 ->0 ->0 ->1
This is code for counting all, Complete and Partial observations:
proc sort data=screener_data;
by site;
run;
data want;
set screener_data;
by site;
if first.site
then do;
count = 0;
count_complete = 0;
count_partial = 0;
end;
count + 1;
if vstatus = "Complete"
then count_complete + 1;
else count_partial + 1;
if last.site;
keep site count count_complete count_partial;
run;
I leave adding the other counts as an exercise.
Good answer.
Your indentation style is hard to read through.
Can you tell at glance the difference between these two lines from your program?
if first.site
if last.site;
If you use a more normal indentation style it would be easier.
if first.site then do;
count = 0;
count_complete = 0;
count_partial = 0;
end;
count + 1;
if vstatus = "Complete" then count_complete + 1;
else count_partial + 1;
if last.site;
If you insist on splitting your IF/THEN statements into more than one line you could indent and move the termination of the multiple line statement to the start of a new line to make it easier to see where the multiple line statement ends.
if first.site
then do
;
I always wonder why you are creating a variable named THEN when I first look at that code.
I have a rationale for my way of writing the IF this way.
Conditional execution consists of at least two, often three parts: the condition, the "true" branch, and the "false" branch.
My way of writing it reflects this by having three lines (or three blocks). With a quite complex condition, this then looks like this:
if
<condition>
then do;
<statements>
end;
else do;
<statements>
end;
Still don't get it.
If you don't split the single IF statement into multiple lines then the blocks are still there.
if <condition> then do;
<statements>
end;
else do;
<statements>
end;
or
if <condition> then
<statement>
;
else
<statement>
;
I like the THEN keyword to be on the same column as the ELSE, as they are related. Both start one if the branches.
Look at this:
if first.id
then count = 1;
else count + 1;
Looks better in my eyes than
if first.id then count = 1;
else count + 1;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.