jessica_join
Obsidian | Level 7

Create a new variable called disease and make it equal to 1 if a person has complaints of heartburns, sickness, and spasm, but no temperature or tiredness.

 

If the person does not have this exact symptom breakdown, make disease equal to 0.

 

Lastly, use PROC FREQ to determine what number and proportion of individuals in the dataset have the disease of interest.

 

I do not know how to do this. Any hints or help? I am studying for an exam and need to understand this program.

 

I have to use if and then statements.

 

This is what I have so far.... it is not working. 

 proc format; 
value symptom_no	1= "heartburns" 
					2= "Sickness"
					3= "Spasm"
					4= "Temperature"
					5= "Tiredness"; 
		
proc sort data=Project3 out= longsort; 
 	by id_no; 
run; 
data new; 
	set longsort; 
	by id_no; 
	Keep id_no sympt1 - sympt5 disease; 
	retain sympt1 - sympt5 disease; 
	disease=0;
	array New_a (1:5) $20 sympt1 - sympt5; 
	If first.id_no then
	do; 
	Do i = 1 to 5; 
		new_a (i) = .; 
		end; 

	new_a (symptom_no) = symptom; 
	if last.id_no then output; 
		run; 
	array New_b (1) disease; 
	If sympt1 ='heartburns' and sympt2='sickness' and sympt3='spasm' then disease='1';
	else disease='0'; 
		end; 
	end; 

		run; 
	proc print data= new; 
	run; 
7 REPLIES
WarrenKuhfeld
Ammonite | Level 13

You need to get rid of the RUN statement in the middle of your data step.  It looks like there is an extra end statement.  This would be a lot easier to look at with reasonable indentation.

 

Enio
Fluorite | Level 6

Hi,

 

The structure of your original data is a little unclear, so I've assumed that it simply has a single numeric symptom column (with values 1 to 5) and multiple rows per ID, one for each symptom. To make the logic more transparent, I've moved away from arrays (for now). I reckon it might look something like:

 

data new; 
  set longsort; 
  by id_no; 
	
  retain heartburn sickness spasm temperature tiredness;
  
  if (first.id_no) then
  do;
    heartburn   =0;
    sickness    =0;
    spasm       =0;
    temperature =0;
    tiredness   =0;
  end;

  if symptom=1 then heartburn  =1; 
  if symptom=2 then sickness   =1; 
  if symptom=3 then spasm      =1;
  if symptom=4 then temperature=1; 
  if symptom=5 then tiredness  =1;

  if (heartburn)    and 
     (sickness)     and 
     (spasm)        and 
     ^(temperature) and 
     ^(tiredness)   then disease =1; else
                         disease =0;
                         
   if (last.id_no);
 run;
 
 proc freq data=new;
   table disease;
 run;
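
To try the step above, a small test dataset helps. The sketch below is only an illustration under the same assumption made in this reply: a numeric symptom column with values 1 to 5 and one row per symptom. The ids and rows are made up.

```sas
/* Hypothetical test data: one row per person per symptom.        */
/* symptom: 1=heartburn 2=sickness 3=spasm 4=temperature          */
/*          5=tiredness                                           */
data longsort;
  input id_no symptom;
  datalines;
1 1
1 2
1 3
2 1
2 2
2 3
2 4
3 2
3 5
;
run;
```

With these rows, only id_no 1 has heartburn, sickness, and spasm without temperature or tiredness, so PROC FREQ should report disease=1 for one of the three people and disease=0 for the other two.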
Reeza
Super User

@Enio This is a homework assignment, she has to use arrays.

Enio
Fluorite | Level 6

You're right, and that's cool. Hopefully the code above helps to explain what the arrays are trying to do. With arrays it would probably look something like this:

 

data new(drop=symptom i); 
  set longsort; 
  by id_no; 
	
  retain sympt1 - sympt5;
  
  array new_a (1:5) sympt1 - sympt5;
  
  do i = 1 to 5 ;
  
    if (first.id_no) then
    do;
      new_a(i) =0;
    end;

    if symptom=i then new_a(i)  =1; 

  end;
  
  if (sympt1)    and 
     (sympt2)    and 
     (sympt3)    and 
     ^(sympt4)   and 
     ^(sympt5)   then disease =1; else
                      disease =0;
                         
   if (last.id_no);
 run;
 
 proc freq data=new;
   table disease;
 run;
Reeza
Super User

@Enio some small changes - noted by Warren earlier I think. That END after the line below should be moved up, or the do loop could be simplified.

 

    if symptom=i then new_a(i)  =1; 

end; *This needs to be moved up;

The i reference isn't correct in this case, because the diagnoses are being moved to specific positions in the array. Those positions are defined by the symptom_no variable.
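
A minimal sketch of that simplification, assuming the position comes from a numeric symptom_no variable with values 1 to 5 (the name used in the original post) and that the sympt variables are numeric 0/1 flags as in Enio's version:

```sas
data new(drop=symptom_no i);
  set longsort;
  by id_no;

  retain sympt1 - sympt5;
  array new_a (1:5) sympt1 - sympt5;

  /* Reset all five flags once, on the first row of each person */
  if first.id_no then do i = 1 to 5;
    new_a(i) = 0;
  end;

  /* Use symptom_no directly as the array index; no loop needed */
  if 1 <= symptom_no <= 5 then new_a(symptom_no) = 1;

  /* Keep one row per person, then derive the 0/1 disease flag */
  if last.id_no;
  disease = (sympt1 and sympt2 and sympt3 and not sympt4 and not sympt5);
run;
```

The range check guards against a missing or out-of-range symptom_no, which would otherwise cause an array subscript error.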

 

Though... symptoms in the previous question were text, but they appear to be character here so I'm slightly confused myself.

 

A previous version of this question is linked to below for your information.

https://communities.sas.com/t5/Base-SAS-Programming/SAS-new-variable/m-p/415698

https://communities.sas.com/t5/Base-SAS-Programming/Array-First-id-retain-last-id/m-p/415699

 

 

WarrenKuhfeld
Ammonite | Level 13

Also note that a common misconception among programmers is that you need if/then/else to assign a binary value to a variable.  NOT TRUE!  The following works just fine and requires only a single statement.

 

variable = boolean-expression

 

A boolean expression (and/or/eq/ne/gt/ge/lt/le etc) resolves to zero or one.  You can assign that value to a variable.
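
As a small illustration of this, with made-up flag variables and data rows:

```sas
/* Hypothetical 0/1 flags, one row per person */
data flags;
  input heartburn sickness spasm temperature tiredness;
  /* The parenthesised expression evaluates to 1 (true) or 0 (false) */
  disease = (heartburn and sickness and spasm
             and not temperature and not tiredness);
  datalines;
1 1 1 0 0
1 1 1 1 0
;
run;
```

Here the first row gets disease=1 and the second gets disease=0, with no if/then/else involved.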

Kurt_Bremser
Super User

This is my preferred visual coding style:

data new; 
set longsort; 
by id_no; 
keep id_no sympt1 - sympt5 disease; 
retain sympt1 - sympt5 disease; 
array New_a (1:5) $20 sympt1 - sympt5;
disease = 0; 
If first.id_no
then do; 
  do i = 1 to 5; 
    new_a (i) = .; 
  end; 
  new_a (symptom_no) = symptom; 
  if last.id_no then output; 
  if sympt1 ='heartburns' and sympt2='sickness' and sympt3='spasm'
  then disease='1';
  else disease='0'; 
end;
run; 

I removed the surplus end (which sticks out like a beacon when the code is properly formatted) and the erroneous run.

Now you can see that your if last. is within the if first. block, and will only be executed if there's only one row per id_no. I guess that's not what you wanted.
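
One possible restructuring, with the last. check outside the first. block, is sketched below. It rests on assumptions that the thread does not confirm: symptom is a character variable, symptom_no is numeric with values 1 to 5, and a person without temperature or tiredness has no rows for symptoms 4 and 5. The character array is cleared with CALL MISSING rather than by assigning a numeric missing value, and LOWCASE makes the comparisons case-insensitive.

```sas
data new;
  set longsort;
  by id_no;
  keep id_no sympt1 - sympt5 disease;
  retain sympt1 - sympt5;
  array new_a (1:5) $20 sympt1 - sympt5;

  /* First row of a person: blank out the five retained slots */
  if first.id_no then call missing(of sympt1 - sympt5);

  /* Every row: store the symptom text in its numbered slot */
  if 1 <= symptom_no <= 5 then new_a(symptom_no) = symptom;

  /* Last row of a person: derive the flag and write one record */
  if last.id_no then do;
    disease = (lowcase(sympt1) = 'heartburns'
               and lowcase(sympt2) = 'sickness'
               and lowcase(sympt3) = 'spasm'
               and missing(sympt4)
               and missing(sympt5));
    output;
  end;
run;
```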

 

Although I'm famous for my notoriously cluttered desk, my code is always neat and tidy.
