BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
aespinarey
Obsidian | Level 7

Hi all!

I am having trouble using array, first., and last. to create only one observation and multiple variables per subject. The data set has 18,082 observations with 3 variables: ID_NO, SYMPTOM_NO, and SYMPTOM. I need to keep the id_no variable and lose the symptom_no and symptom variables yet create 5 variables called sympt1 (for heartburns), sympt2 (for sickness), sympt3 (for spasm), sympt4 (for temperature), and sympt5 (for tiredness), and one constant variable disease (which is always = 0)

Data set given with the 1st 15 obs:

id_no

symptom_no

symptom

1

1

Heartburns

1

2

Sickness

4

5

Tiredness

6

1

Heartburns

7

4

Temperature

8

1

Heartburns

8

5

Tiredness

9

2

Sickness

9

3

Spasm

9

4

Temperature

10

3

Spasm

10

5

Tiredness

11

3

Spasm

11

5

Tiredness

12

4

Temperature

12

5

Tiredness

13

3

Spasm

13

4

Temperature

14

3

Spasm

15

1

Heartburns

Goal set up:

Obs

id_no

sympt1

sympt2

sympt3

sympt4

sympt5

disease

1

1

Heartburns

Sickness

 

 

 

0

2

4

 

 

 

 

Tiredness

0

3

6

Heartburns

 

 

 

 

0

4

7

 

 

 

Temperature

 

0

5

8

Heartburns

 

 

 

Tiredness

0

6

9

 

Sickness

Spasm

Temperature

 

0

7

10

 

 

Spasm

 

Tiredness

0

8

11

 

 

Spasm

 

Tiredness

0

9

12

 

 

 

Temperature

Tiredness

0

10

13

 

 

Spasm

Temperature

 

0

11

14

 

 

Spasm

 

 

0

12

15

Heartburns

 

 

Temperature

 

0

13

16

 

 

 

 

Tiredness

0

14

17

Heartburns

 

 

 

 

0

15

19

 

Sickness

 

 

 

0

 

 

 

Here is the code i am using and the log error message.

****1.IMPORT;

options nodate nonumber;
%macro P3 (a, b, c);
proc import out= &a
datafile= "C:\HW5\&b"
dbms=xlsx replace;
getnames=yes;
run;
proc sort data=&a;
by &c;
run;
Proc print data = &a;
Run;
%mend P3;

%P3 (PROJECT3_F17, Project3.xlsx, id_no);
***2. USE ARRAY STATEMENT ALONG WITH FIRST.ID and LAST.ID to reorganize;
data sympt;
set PROJECT3_F17;
retain id_no sympt_no sympt1-sympt5;
length sympt1-sympt5 $20;
array symptoms (5) sympt1-sympt5;
by symptom_no;
do i=1 to 5;
if first.symptom_no then symptoms(i)=sympt1;
else symptoms (i)=.;

end;
drop symptom symptom_no;
if last.symptom_no then output;
disease=0;
format id_no sympt1-sympt5 disease;
run;
proc print data=sympt;
run;

 

LOG:

 

127055 options nodate nonumber;
127056 ****1.IMPORT;
127057 %macro P3 (a, b, c);
127058 proc import out= &a
127059 datafile= "C:\HW5\&b"
127060 dbms=xlsx replace;
127061 getnames=yes;
127062 run;
127063 proc sort data=&a;
127064 by &c;
127065 run;
127066 Proc print data = &a;
127067 Run;
127068 %mend P3;
127069
127070 %P3 (PROJECT3_F17, Project3.xlsx, id_no);

NOTE: The import data set has 18082 observations and 3 variables.
NOTE: WORK.PROJECT3_F17 data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.91 seconds
cpu time 0.79 seconds

 

NOTE: There were 18082 observations read from the data set WORK.PROJECT3_F17.
NOTE: The data set WORK.PROJECT3_F17 has 18082 observations and 3 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.04 seconds
cpu time 0.04 seconds

 

NOTE: There were 18082 observations read from the data set WORK.PROJECT3_F17.
NOTE: PROCEDURE PRINT used (Total process time):
real time 3.11 seconds
cpu time 2.73 seconds


127071 ***2.REORGANIZE variables: USE ARRAY STATEMENT ALONG WITH FIRST.ID and LAST.ID;
127072 data sympt;
127073 set PROJECT3_F17;
127074 retain id_no sympt_no sympt1-sympt5;
127075 length sympt1-sympt5 $20;
127076 array symptoms (5) sympt1-sympt5;
127077 *array v (5) symptom_no;
127078 by symptom_no;
127079 do i=1 to 5;
127080 if first.symptom_no then symptoms(i)=sympt1;
127081 *else symptoms (i) = .;
127082 else symptoms (i)=.;
127083
127084 end;
127085 drop symptom symptom_no;
127086 if last.symptom_no then output;
127087 *if symptoms (i) = Heartburns then symptoms (i)= sympt1;
127088 *if symptoms (i) = Sickness then symptoms (i)= sympt2;
127089 *if symptoms (i) = Spasm then symptoms (i)= sympt3;
127090 *if symptoms (i) = Temperature then symptoms (i)= sympt4;
127091 *if symptoms (i) = Tiredness then symptoms (i)= sympt5;
127092 *else symptoms (i) = .;
127093 *end;
127094 disease=0;
127095 format id_no sympt1-sympt5 disease;
127096 *data sympt2;
127097 *set sympt;
127098 run;

NOTE: Numeric values have been converted to character values at the places given by:
(Line):(Column).
127082:14
NOTE: Variable sympt_no is uninitialized.
ERROR: BY variables are not properly sorted on data set WORK.PROJECT3_F17.
id_no=4 symptom_no=5 symptom=Tiredness sympt1= sympt2= sympt3= sympt4= sympt5=
FIRST.symptom_no=1 LAST.symptom_no=1 i=. disease=. _ERROR_=1 _N_=3
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 4 observations read from the data set WORK.PROJECT3_F17.
WARNING: The data set WORK.SYMPT may be incomplete. When this step was stopped there were 2
observations and 8 variables.
WARNING: Data set WORK.SYMPT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.46 seconds
cpu time 0.09 seconds


127099 proc print data=sympt;
127100 run;

NOTE: No observations in data set WORK.SYMPT.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

 

Please help!

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

I'd do it as follows:

data sympt (drop=symptom_no symptom);
  set PROJECT3_F17;
  retain sympt1-sympt5;
  retain disease 0;
  length sympt1-sympt5 $15;
  array symptoms (5) $15. sympt1-sympt5;
  by id_no;
  if first.id_no then call missing(of symptoms(*));
  symptoms(symptom_no)=symptom;
  if last.id_no then output;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post

5 REPLIES 5
novinosrin
Tourmaline | Level 20

looking only at your sample, makes me want to do it this way:

data have;
input id_no	symptom_no	symptom :$15.;
datalines;
1	1	Heartburns
1	2	Sickness
4	5	Tiredness
6	1	Heartburns
7	4	Temperature
8	1	Heartburns
8	5	Tiredness
9	2	Sickness
9	3	Spasm
9	4	Temperature
10	3	Spasm
10	5	Tiredness
11	3	Spasm
11	5	Tiredness
12	4	Temperature
12	5	Tiredness
13	3	Spasm
13	4	Temperature
14	3	Spasm
15	1	Heartburns
;

proc transpose data=have out=want(drop=_name_) ;
by id_no;
var symptom;
id symptom;
run;

data final_want;
set want;
disease=0;
run;

kiranv_
Rhodochrosite | Level 12

if you have few variables then one way to do this is 

proc sql;
create table want as
select id_no,
max(case when symptom_no = 1 then symptom end) as sympt1,
max(case when symptom_no = 2 then symptom end) as sympt2,
max(case when symptom_no = 3 then symptom end) as sympt3,
max(case when symptom_no = 4 then symptom end) as sympt4,
max(case when symptom_no = 5 then symptom end) as sympt5,
"0" as Disease
from abc
group by id_no;
art297
Opal | Level 21

I'd do it as follows:

data sympt (drop=symptom_no symptom);
  set PROJECT3_F17;
  retain sympt1-sympt5;
  retain disease 0;
  length sympt1-sympt5 $15;
  array symptoms (5) $15. sympt1-sympt5;
  by id_no;
  if first.id_no then call missing(of symptoms(*));
  symptoms(symptom_no)=symptom;
  if last.id_no then output;
run;

Art, CEO, AnalystFinder.com

 

novinosrin
Tourmaline | Level 20

Good afternoon @art297 I did think of your approach however, should the number of symptoms changes, you are gonna have somebody in prod support team to change to the array subsript right? or perhaps

 

would you take a max(count distinct symptoms) into array_num and call it as array subscript to make it dynamic?

 

This is the only concern. Not a big deal but just in case

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2710 views
  • 5 likes
  • 4 in conversation