BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
saza
Quartz | Level 8

Write a program that calculates the total number of records (all screening records in the file),
the total number of partial records (Vstatus), the total number of complete records (Vstatus), and the total number
of complete records that are eligible and ineligible (ELIG) for each site (SITE). The idea here is not to create
separate datasets/subsets for each of these, but to produce a single dataset with these totals.

Vstatus is coded as “Complete” or “Partial” 

ELIG is coded as 0 (no) or 1 (yes) 

SITE is coded as “City A” or “City B”

 

My professor is very specific about how she wants this done but when I write my code all my values for the "Status" variable change to "Complete" and the output is simply counting 1-50

My code:

PROC SORT DATA = sarah.screener_data; by site; RUN; 

DATA TOTVSTATUS;
SET Sarah.screener_data;
COUNT + 1;
BY Site;
IF Vstatus = partial then COUNT=0;
If vstatus = complete then count= 1;
RUN;

DATA TOTVSTATUS2;
SET TOTVSTATUS;
BY Vstatus Elig Site;
IF LAST.Vstatus ;
If Last.elig;
If last.site;
RUN;

PROC SORT DATA = sarah.screener_data; by Vstatus Elig site; RUN;

DATA TOTVSTATUS;
SET Sarah.screener_data;
COUNT + 1;
BY Vstatus Elig Site;
IF FIRST.Vstatus then COUNT=1;
RUN;

DATA TOTVSTATUS2;
SET TOTVSTATUS;
BY Vstatus Elig Site;
IF LAST.Vstatus ;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

This is code for counting all, Complete and Partial observations:

proc sort data=screener_data;
by site;
run;

data want;
set screener_data;
by site;
if first.site
then do;
  count = 0;
  count_complete = 0;
  count_partial = 0;
end;
count + 1;
if vstatus = "Complete"
then count_complete + 1;
else count_partial + 1;
if last.site;
keep site count count_complete count_partial;
run;

I leave adding the other counts as an exercise.

View solution in original post

11 REPLIES 11
Tom
Super User Tom
Super User

Can you explain the logic for the code you posted?  What is it you think it is going to do?

 

Here is an analysis of the first data step you posted.

Spoiler
DATA TOTVSTATUS; 
  SET Sarah.screener_data; 
  COUNT + 1;  
  BY Site; 
  IF Vstatus = partial then COUNT=0;
  If vstatus = complete then count= 1; 
RUN; 
Why did you insert the sum statement to increment COUNT between the SET statement and the BY statement?
Why did you include the BY statement when you do not make any use of it?  The BY statement will cause the SET statement to create a populate FIRST.SITE and LAST.SITE boolean flags, but your data step is not using them.
What are the values of PARTIAL and COMPLETE variables you are referencing in your IF statements?  You did not mention such variables in your description.  Since you claim that COUNT just runs from 1 to 50 the values of PARTIAL and COMPLETE must never match the value of VSTATUS or else COUNT would have been reset to 0 or 1 before incrementing again the time the sum statement executed.
saza
Quartz | Level 8
Yes I realize now how ridiculous my code is. I guess what i'm confused about is the count variable and if I can have count=1 for everything that i'm trying to add. for example, the "Vstatus" variable is either 'complete' or 'partial'

So can my code be:
DATA TOTVSTATUS;
SET Sarah.screener_data;
COUNT + 1;
BY Site;
IF Vstatus = partial then COUNT=1;
If vstatus = complete then count= 1;
RUN;

I guess i'm trying to speak to my sas code in english lol
Kurt_Bremser
Super User

In your post, you state this:

Vstatus is coded as “Complete” or “Partial”

But in your code, you do this:

IF Vstatus = partial then COUNT=0;
If vstatus = complete then count= 1; 

partial is a variable name, while "partial" is a value. I guess you want to test for values, not other variables.

Read the log (Maxim 2). It will probably alert you to the fact that variables partial and complete are uninitialized.

saza
Quartz | Level 8
I also realized that when I write partial and complete I need to put them in '' so sas doesn't add them as a variable on my results.


DATA TOTVSTATUS;
SET Sarah.screener_data;
COUNT + 1;
BY Site;
IF Vstatus = 'partial' then COUNT=0;
If Vstatus = 'complete' then COUNT=0;
IF Elig = 0 then count= 1;
If Elig = 1 then count=1;
RUN;

When I run this my count is all messed up but atleast sas isn't making new variables
Tom
Super User Tom
Super User

Pretend you are the SAS data step processor and the input to your data step was this data:

data have;
  input site $ vstatus $ elig ;
cards;
HERE partial 0
HERE complete 0
HERE parital 1
HERE complete 1
THERE partial 0
THERE complete 0
THERE parital 1
THERE complete 1
;

What would you WANT as the output?

What would you expect as the output from your data step?

Spoiler
85    DATA TOTVSTATUS;
86    SET have;
87    COUNT + 1;
88    BY Site;
89    IF Vstatus = 'partial' then COUNT=0;
90    If Vstatus = 'complete' then COUNT=0;
91    IF Elig = 0 then count= 1;
92    If Elig = 1 then count=1;
93    put (_all_) (=);
94    RUN;

site=HERE vstatus=partial elig=0 COUNT=1
site=HERE vstatus=complete elig=0 COUNT=1
site=HERE vstatus=parital elig=1 COUNT=1
site=HERE vstatus=complete elig=1 COUNT=1
site=THERE vstatus=partial elig=0 COUNT=1
site=THERE vstatus=complete elig=0 COUNT=1
site=THERE vstatus=parital elig=1 COUNT=1
site=THERE vstatus=complete elig=1 COUNT=1

What would you expect the value of COUNT to be after each of those statements?

Spoiler
142   DATA TOTVSTATUS;
143   SET have;
144   put (_n_ site vstatus elig count) (=) @;
145   COUNT + 1;
146   put '->' count @;
147   BY Site;
148   IF Vstatus = 'partial' then COUNT=0;
149   put '->' count @;
150   If Vstatus = 'complete' then COUNT=0;
151   put '->' count @;
152   IF Elig = 0 then count= 1;
153   put '->' count @;
154   If Elig = 1 then count=1;
155   put '->' count @;
156   put;
157   RUN;

_N_=1 site=HERE vstatus=partial elig=0 count=0 ->1 ->0 ->0 ->1 ->1
_N_=2 site=HERE vstatus=complete elig=0 count=1 ->2 ->2 ->0 ->1 ->1
_N_=3 site=HERE vstatus=parital elig=1 count=1 ->2 ->2 ->2 ->2 ->1
_N_=4 site=HERE vstatus=complete elig=1 count=1 ->2 ->2 ->0 ->0 ->1
_N_=5 site=THERE vstatus=partial elig=0 count=1 ->2 ->0 ->0 ->1 ->1
_N_=6 site=THERE vstatus=complete elig=0 count=1 ->2 ->2 ->0 ->1 ->1
_N_=7 site=THERE vstatus=parital elig=1 count=1 ->2 ->2 ->2 ->2 ->1
_N_=8 site=THERE vstatus=complete elig=1 count=1 ->2 ->2 ->0 ->0 ->1

 

 

Kurt_Bremser
Super User

This is code for counting all, Complete and Partial observations:

proc sort data=screener_data;
by site;
run;

data want;
set screener_data;
by site;
if first.site
then do;
  count = 0;
  count_complete = 0;
  count_partial = 0;
end;
count + 1;
if vstatus = "Complete"
then count_complete + 1;
else count_partial + 1;
if last.site;
keep site count count_complete count_partial;
run;

I leave adding the other counts as an exercise.

saza
Quartz | Level 8
I was able to add the other variables and like your way of doing this better than the one I was taught! very straightforward. Thanks again.
Tom
Super User Tom
Super User

Good answer.

Your indentation style is hard to read through.

Can you tell at glance the difference between these two lines from your program?

if first.site
if last.site;

If you use a more normal indentation style it would be easier.

if first.site then do;
  count = 0;
  count_complete = 0;
  count_partial = 0;
end;
count + 1;
if vstatus = "Complete" then count_complete + 1;
else count_partial + 1;
if last.site;

If you insist on splitting your IF/THEN statements into more than one line you could indent and move the termination of the multiple line statement to the start of a new line to make it easier to see where the multiple line statement ends.

if first.site
  then do
;

I always wonder why you are creating a variable named THEN when I first look at that code.

Kurt_Bremser
Super User

I have a rationale for my way of writing the IF this way.

Conditional execution consists of at least two, often three parts: the condition, the "true" branch, and the "false" branch.

My way of writing it reflects this by having three lines (or three blocks). With a quite complex condition, this then looks like this:

if
  <condition>
then do;
  <statements>
end;
else do;
  <statements>
end;
Tom
Super User Tom
Super User

Still don't get it. 

If you don't split the single IF statement into multiple lines then the blocks are still there.

if <condition> then do;
  <statements>
end;
else do;
  <statements>
end;

or

if <condition> then 
  <statement>
;
else
  <statement>
;
Kurt_Bremser
Super User

I like the THEN keyword to be on the same column as the ELSE, as they are related. Both start one if the branches.

Look at this:

if first.id
then count = 1;
else count + 1;

Looks better in my eyes than

if first.id then count = 1;
else count + 1;

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 668 views
  • 4 likes
  • 3 in conversation