BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jessica_join
Obsidian | Level 7

The Excel data file given to you has 18,082 observations with 3 variables: ID_NO, SYMPTOM_NO, and SYMPTOM.

 

The first 20 observations of the excel file look like this

id_no

symptom_no

symptom

1

1

Heartburns

1

2

Sickness

4

5

Tiredness

6

1

Heartburns

7

4

Temperature

8

1

Heartburns

8

5

Tiredness

9

2

Sickness

9

3

Spasm

9

4

Temperature

10

3

Spasm

10

5

Tiredness

11

3

Spasm

11

5

Tiredness

12

4

Temperature

12

5

Tiredness

13

3

Spasm

13

4

Temperature

14

3

Spasm

15

1

Heartburns

 

You must use an ARRAY statement along with FIRST.ID and LAST.ID to reorganize the given set so that instead of having multiple records per person, there should be one record per person and a variable for each possible symptom (see below): Following is the first 15 records of the newly organized data set.

Obs

id_no

sympt1

sympt2

sympt3

sympt4

sympt5

disease

1

1

Heartburns

Sickness

 

 

 

0

2

4

 

 

 

 

Tiredness

0

3

6

Heartburns

 

 

 

 

0

4

7

 

 

 

Temperature

 

0

5

8

Heartburns

 

 

 

Tiredness

0

6

9

 

Sickness

Spasm

Temperature

 

0

7

10

 

 

Spasm

 

Tiredness

0

8

11

 

 

Spasm

 

Tiredness

0

9

12

 

 

 

Temperature

Tiredness

0

10

13

 

 

Spasm

Temperature

 

0

11

14

 

 

Spasm

 

 

0

12

15

Heartburns

 

 

Temperature

 

0

13

16

 

 

 

 

Tiredness

0

14

17

Heartburns

 

 

 

 

0

15

19

 

Sickness

 

 

 

0

 

 

 

I cannot figure this out. This is the code I have....

 

%macro one (v1,v2); 
proc import out  = &v1
            datafile = "\\Client\C$\Users\jessica\Desktop\data\&v2"
            DBMS     = xlsx replace;
            getnames = YES;
run;


%mend one; 
%one (Project3, Project3_f17);

data new; 
	array new[7] id_no sympt1-sympt5;
	retain id_no sympt_no; 
	set Project3; 

by id; 

if first.id then do i=1 to 12549; 
	new[i]= .;
	end; 

	agency [disease] = disease; 
	if last.id then output; 
format sympt1-sympt5 disease.; 
keep id sympt1-sympt5; 
run; 
proc print data= new; 
 run; 

 

Please help!!!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

SAS will define type and length as soon as it sees that you are using it for something.  In your current code you first define the SYMPTxx variables in the ARRAY statement.  Since you did not tell how to define the variables it made them a numeric.  You can dd a length specification to the ARRAY statement.  For example if you want them to be define as character variables with a length of 20 you could code it as.

  array New_a (5) $20 sympt1 - sympt5;

But the best way is to NOT depend on SAS guessing how you want to define a variable. Instead define it yourself using a LENGTH statement before you start using it.

 

length sympt1 - sympt5 $20;
array New_a sympt1 - sympt5;

What does the DISEASE variable represent? If you want to set it to zero everywhere then just add an assignment statement.

disease = 0;

Make sure it is not dropped.

 

View solution in original post

17 REPLIES 17
WarrenKuhfeld
Rhodochrosite | Level 12

You did not provide enough information for us to check what you are doing, but I did notice one thing.  Add these to your RETAIN statement.

sympt1-sympt5

 Your code would be easier to look at if you indented it nicely.

jessica_join
Obsidian | Level 7
It appears indented to me. Perhaps it did not come through that way on your end?
Thank you!
Reeza
Super User

1. Using a macro to import is not very helpful in the long run, its just adding code. 

2. You're really really close. 


Here's a full walk through on how to do this:

https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/

 

You're missing the assignment statement to assign the values based on the count. 

 

new[symptom_no] = symptom;

And you're also missing the variables you're going to keep in the RETAIN statement, symptom1-symptom5

Tom
Super User Tom
Super User

Why would you want to convert it?  If you do then use PROC TRANSPOSE.

proc transpose data=have out=want prefix=sympt ;
  by id_no;
  id symptom_no;
  var symptom ;
run;

Is this a homework problem?

If so

Why did you include ID_NO in your ARRAY?

Where did the variable ID and DISEASE come from?

Do you know how to use the CALL MISSING() function?
Have you heard of DOW loops?  Google it.

 

 

jessica_join
Obsidian | Level 7
It is not a homework problem. It is a study guide question. I thought I had to include ID_NO for it to show up in the overall solution. Disease is a variable he wants us to create. No I do not know how to use the Call Missing() function. Thank you.
Tom
Super User Tom
Super User

Let's look at what you asked SAS to do.

In the first three lines:

data new; 
  array new[7] id_no sympt1-sympt5;
  retain id_no sympt_no; 

You are making a dataset named NEW. 

You defined a array named NEW with 7 elements, but you only listed 6 variables. That should cause an error. Note you don't need to include the [7] if you are listing the actual variable names. SAS can count how many variables you have listed and will define the array to have that many members.

Since this is the first place these variables are mentioned they will all 6 be numeric variables. 

You then marked two variables to be retained across loops. Since you have not mentioned SYMPT_NO before so its type is still not known.  I suspect that you really want to retain are the SYMPT..... variables that you are going to use in your ARRAY.

 

Next you do this:

  set Project3; 
  by id; 

So now you are bring in a dataset named PROJECT3.  So SAS will now know about the variables that are in that dataset. Notice that if any of the variables you have already defined as numeric are in that dataset as character variables you will get an error.

You are assuming that it has a variable named ID and that it is sorted by that variable.

 

  if first.id then do i=1 to 12549; 
    new[i]= .;
  end; 

Now if the current observation is the first for this value of ID you are trying to set 12,549 members of the array NEW to missing. But you only defined the array to have 7 (or 6?) members. So that will generate a run time error when the loop counter I reaches 8.

 

  agency [disease] = disease; 

Now you are referencing an array named AGENCY that you have not defined. That will cause an error when SAS tries to compile this data step.  The logic is wrong since you are just setting the value to the index number.  That is like saying X[1]=1, or X[2]=2.  Probably not what you want.

You using the variable DISEASE, that you have not defined,  so a programmer reading this would assume came from the input dataset. SAS itself will know since it will have looked at the input dataset. So if DISEASE wasn't defined there then it will be created as numeric variable and have a missing value. If it was defined and it is a character variable then you will get another error because the index into an array reference has to be an integer.  If it is missing then (if it ever could run) it would cause an error at run time since missing value is not a valid index into an array.

 

Finally

  if last.id then output; 
  format sympt1-sympt5 disease.; 
  keep id sympt1-sympt5; 
run;

You will only output one observation per value of the BY variable ID. 

You are attaching a format to the variables SYMT1-SYMPT5.  Syntactically this statement looks fine, but you have not defined the format DISEASE. anywhere in this code. 

And you are telling SAS that only 6 variables should be kept in the output dataset. So other variables like I, ID_NO, SYMPTOM_NO and DISEASE that your data step mentions will be dropped.

 

jessica_join
Obsidian | Level 7

Okay thank you for breaking it down. 

 

I just changed it to this...

data new; 
	set longsort; 
	by id_no; 
	Keep id_no sympt1 - sympt5; 
	retain sympt1 - sympt5; 
	array New_a (1:5) sympt1 - sympt5; 
	If first.id_no then
	do; 
	Do i = 1 to 5; 
		new_a (i) = .; 
		end; 
	end; 
	new_a (symptom_no) = symptom; 
	if last.symptom then output; 
		run; 
	proc print data= new; 
	run; 

The error code I am getting...

NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
202:5
NOTE: Variable last.symptom is uninitialized.
NOTE: Invalid numeric data, symptom='Heartburns' , at line 202 column 26.
id_no=1 symptom_no=1 symptom=Heartburns FIRST.id_no=1 LAST.id_no=0 sympt1=. sympt2=. sympt3=.
sympt4=. sympt5=. i=6 last.symptom=0 _ERROR_=1 _N_=1
NOTE: Invalid numeric data, symptom='Sickness' , at line 202 column 26.
id_no=1 symptom_no=2 symptom=Sickness FIRST.id_no=0 LAST.id_no=1 sympt1=. sympt2=. sympt3=.
sympt4=. sympt5=. i=. last.symptom=0 _ERROR_=1 _N_=2
NOTE: Invalid numeric data, symptom='Tiredness' , at line 202 column 26.
id_no=4 symptom_no=5 symptom=Tiredness FIRST.id_no=1 LAST.id_no=1 sympt1=. sympt2=. sympt3=.
sympt4=. sympt5=. i=6 last.symptom=0 _ERROR_=1 _N_=3
NOTE: Invalid numeric data, symptom='Heartburns' , at line 202 column 26.
id_no=6 symptom_no=1 symptom=Heartburns FIRST.id_no=1 LAST.id_no=1 sympt1=. sympt2=. sympt3=.
sympt4=. sympt5=. i=6 last.symptom=0 _ERROR_=1 _N_=4
NOTE: Invalid numeric data, symptom='Temperature' , at line 202 column 26.
id_no=7 symptom_no=4 symptom=Temperature FIRST.id_no=1 LAST.id_no=1 sympt1=. sympt2=. sympt3=.
sympt4=. sympt5=. i=6 last.symptom=0 _ERROR_=1 _N_=5
NOTE: Invalid numeric data, symptom='Heartburns' , at line 202 column 26.
id_no=8 symptom_no=1 symptom=Heartburns FIRST.id_no=1 LAST.id_no=0 sympt1=. sympt2=. sympt3=.
sympt4=. sympt5=. i=6 last.symptom=0 _ERROR_=1 _N_=6
NOTE: Invalid numeric data, symptom='Tiredness' , at line 202 column 26.
id_no=8 symptom_no=5 symptom=Tiredness FIRST.id_no=0 LAST.id_no=1 sympt1=. sympt2=. sympt3=.
sympt4=. sympt5=. i=. last.symptom=0 _ERROR_=1 _N_=7
NOTE: Invalid numeric data, symptom='Sickness' , at line 202 column 26.
id_no=9 symptom_no=2 symptom=Sickness FIRST.id_no=1 LAST.id_no=0 sympt1=. sympt2=. sympt3=.
sympt4=. sympt5=. i=6 last.symptom=0 _ERROR_=1 _N_=8
NOTE: Invalid numeric data, symptom='Spasm' , at line 202 column 26.
id_no=9 symptom_no=3 symptom=Spasm FIRST.id_no=0 LAST.id_no=0 sympt1=. sympt2=. sympt3=. sympt4=.
sympt5=. i=. last.symptom=0 _ERROR_=1 _N_=9
NOTE: Invalid numeric data, symptom='Temperature' , at line 202 column 26.
id_no=9 symptom_no=4 symptom=Temperature FIRST.id_no=0 LAST.id_no=1 sympt1=. sympt2=. sympt3=.
sympt4=. sympt5=. i=. last.symptom=0 _ERROR_=1 _N_=10
NOTE: Invalid numeric data, symptom='Spasm' , at line 202 column 26.
id_no=10 symptom_no=3 symptom=Spasm FIRST.id_no=1 LAST.id_no=0 sympt1=. sympt2=. sympt3=. sympt4=.
sympt5=. i=6 last.symptom=0 _ERROR_=1 _N_=11
NOTE: Invalid numeric data, symptom='Tiredness' , at line 202 column 26.
id_no=10 symptom_no=5 symptom=Tiredness FIRST.id_no=0 LAST.id_no=1 sympt1=. sympt2=. sympt3=.
sympt4=. sympt5=. i=. last.symptom=0 _ERROR_=1 _N_=12
NOTE: Invalid numeric data, symptom='Spasm' , at line 202 column 26.
id_no=11 symptom_no=3 symptom=Spasm FIRST.id_no=1 LAST.id_no=0 sympt1=. sympt2=. sympt3=. sympt4=.
sympt5=. i=6 last.symptom=0 _ERROR_=1 _N_=13
NOTE: Invalid numeric data, symptom='Tiredness' , at line 202 column 26.
id_no=11 symptom_no=5 symptom=Tiredness FIRST.id_no=0 LAST.id_no=1 sympt1=. sympt2=. sympt3=.
sympt4=. sympt5=. i=. last.symptom=0 _ERROR_=1 _N_=14
NOTE: Invalid numeric data, symptom='Temperature' , at line 202 column 26.
id_no=12 symptom_no=4 symptom=Temperature FIRST.id_no=1 LAST.id_no=0 sympt1=. sympt2=. sympt3=.
sympt4=. sympt5=. i=6 last.symptom=0 _ERROR_=1 _N_=15
NOTE: Invalid numeric data, symptom='Tiredness' , at line 202 column 26.
id_no=12 symptom_no=5 symptom=Tiredness FIRST.id_no=0 LAST.id_no=1 sympt1=. sympt2=. sympt3=.
sympt4=. sympt5=. i=. last.symptom=0 _ERROR_=1 _N_=16
NOTE: Invalid numeric data, symptom='Spasm' , at line 202 column 26.
id_no=13 symptom_no=3 symptom=Spasm FIRST.id_no=1 LAST.id_no=0 sympt1=. sympt2=. sympt3=. sympt4=.
sympt5=. i=6 last.symptom=0 _ERROR_=1 _N_=17
NOTE: Invalid numeric data, symptom='Temperature' , at line 202 column 26.
id_no=13 symptom_no=4 symptom=Temperature FIRST.id_no=0 LAST.id_no=1 sympt1=. sympt2=. sympt3=.
sympt4=. sympt5=. i=. last.symptom=0 _ERROR_=1 _N_=18
NOTE: Invalid numeric data, symptom='Spasm' , at line 202 column 26.
id_no=14 symptom_no=3 symptom=Spasm FIRST.id_no=1 LAST.id_no=1 sympt1=. sympt2=. sympt3=. sympt4=.
sympt5=. i=6 last.symptom=0 _ERROR_=1 _N_=19
NOTE: Invalid numeric data, symptom='Heartburns' , at line 202 column 26.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
id_no=15 symptom_no=1 symptom=Heartburns FIRST.id_no=1 LAST.id_no=0 sympt1=. sympt2=. sympt3=.
sympt4=. sympt5=. i=6 last.symptom=0 _ERROR_=1 _N_=20
NOTE: There were 18082 observations read from the data set WORK.LONGSORT.
NOTE: The data set WORK.NEW has 0 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.12 seconds
cpu time 0.07 seconds


205 proc print data= new;
206 run;

NOTE: No observations in data set WORK.NEW.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

 

 

I realize my issue is that I need to indicate that for sympt1-5 they are character variables but I do not know how to do this. 

 

Also, How do I create the new variable for disease that is in the new chart?

jessica_join
Obsidian | Level 7

UPDATE

data new;
  set longsort;
  by id_no;
  Keep id_no sympt1 - sympt5;
  retain sympt1 - sympt5;
  array New_a (1:5) sympt1 - sympt5;
  If first.id_no then do;
    Do i = 1 to 5;
      new_a (i) = .;
    end;
  end;
  new_a (symptom_no) = symptom;
  if last.symptom then output;
run;
proc print data= new; 
run;

I have this now but I need help defining sympt1-5 as character variables while ID is numeric.

 

Also, how do I add disease as a variable as shown in the outcome I am supposed to have. 

Tom
Super User Tom
Super User

SAS will define type and length as soon as it sees that you are using it for something.  In your current code you first define the SYMPTxx variables in the ARRAY statement.  Since you did not tell how to define the variables it made them a numeric.  You can dd a length specification to the ARRAY statement.  For example if you want them to be define as character variables with a length of 20 you could code it as.

  array New_a (5) $20 sympt1 - sympt5;

But the best way is to NOT depend on SAS guessing how you want to define a variable. Instead define it yourself using a LENGTH statement before you start using it.

 

length sympt1 - sympt5 $20;
array New_a sympt1 - sympt5;

What does the DISEASE variable represent? If you want to set it to zero everywhere then just add an assignment statement.

disease = 0;

Make sure it is not dropped.

 

jessica_join
Obsidian | Level 7

Thank you so much with all your help. I am a sas newbie and this class has been way too much for me. I want to make sure I understand this before my exam!

jessica_join
Obsidian | Level 7
data new; 
	set longsort; 
	by id_no; 
	Keep id_no sympt1 - sympt5 disease; 
	retain sympt1 - sympt5; 
	array New_a (1:5) $ 20 sympt1 - sympt5; 
	If first.id_no then
	do; 
	Do i = 1 to 5; 
		new_a (i) = .; 
		disease= 0; 
		end; 
	end; 
	new_a (symptom_no) = symptom; 
	if last.id then output; 
		run; 
	proc print data= new; 
	run; 

It is saying 0 observations

Reeza
Super User

Please include the full log. 

jessica_join
Obsidian | Level 7
%macro one (v1,v2); 
proc import out  = &v1
            datafile = "\\Client\C$\Users\brian\Desktop\data\&v2"
            DBMS     = xlsx replace;
            getnames = YES;
run;


%mend one; 
%one (Project3, Project3_f17);
 proc format; 
value symptom_no	1= "heartburns" 
					2= "Sickness"
					3= "Spasm"
					4= "Temperature"
					5= "Tiredness"; 
		
proc sort data=Project3 out= longsort; 
 	by id_no; 
run; 

data new; 
	set longsort; 
	by id_no; 
	Keep id_no sympt1 - sympt5 disease; 
	retain sympt1 - sympt5; 
	array New_a (1:5) $ 20 sympt1 - sympt5; 
	If first.id_no then
	do; 
	Do i = 1 to 5; 
		new_a (i) = .; 
		disease= 0; 
		end; 
	end; 
	new_a (symptom_no) = symptom; 
	if last.id then output; 
		run; 
	proc print data= new; 
	run; 

NOTE: The import data set has 18082 observations and 3 variables.
NOTE: WORK.PROJECT3 data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 8.72 seconds
cpu time 0.70 seconds


358 proc format;
359 value symptom_no 1= "heartburns"
360 2= "Sickness"
361 3= "Spasm"
362 4= "Temperature"
363 5= "Tiredness";
NOTE: Format SYMPTOM_NO is already on the library WORK.FORMATS.
NOTE: Format SYMPTOM_NO has been output.
364

NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds


365 proc sort data=Project3 out= longsort;
366 by id_no;
367 run;

NOTE: There were 18082 observations read from the data set WORK.PROJECT3.
NOTE: The data set WORK.LONGSORT has 18082 observations and 3 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds


368
369 data new;
370 set longsort;
371 by id_no;
372 Keep id_no sympt1 - sympt5 disease;
373 retain sympt1 - sympt5;
374 array New_a (1:5) $ 20 sympt1 - sympt5;
375 If first.id_no then
376 do;
377 Do i = 1 to 5;
378 new_a (i) = .;
379 disease= 0;
380 end;
381 end;
382 new_a (symptom_no) = symptom;
383 if last.id then output;
384 run;

NOTE: Numeric values have been converted to character values at the places given by:
(Line):(Column).
378:9
NOTE: Variable last.id is uninitialized.
NOTE: There were 18082 observations read from the data set WORK.LONGSORT.
NOTE: The data set WORK.NEW has 0 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.04 seconds


385 proc print data= new;
386 run;

NOTE: No observations in data set WORK.NEW.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

 

Reeza
Super User

Look at the first two notes. 

 

You have a typo, first.id rather than first.id_no

And I think the character array should be $20 -> no space. 


The first note about conversion is saying that it still thinks the array is numeric for some reason, note the number in the error that can be linked to the line itn the log.  

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 3012 views
  • 5 likes
  • 4 in conversation