New to Sas and I am trying to figure out how to format the data from long to wide.
data muscatine_wide; by id;
set hw7.muscatine;
array y[3] o1 o2 o3;
array age[3] a1 a2 a3;
y[occasion] = obese;
retain o1 -o2;
a[occasion] = age;
if first.id = 1 then delete;
run;
This is the error code I am getting. However, I don't know if the sas code (above) is accomplishing what I am trying to do.
1 libname hw7 '\\apporto.com\dfs\GWU\Users\kennedyhinnant_gwu\Downloads\Homework 7';
NOTE: Libref HW7 was successfully assigned as follows:
Engine: V9
Physical Name: \\apporto.com\dfs\GWU\Users\kennedyhinnant_gwu\Downloads\Homework 7
2 run;
3 data glow_combined;
4 set hw7.glow1
5 hw7.glow2
6 hw7.glow3
7 hw7.glow4
8 hw7.glow5
9 hw7.glow6;
10 run;
NOTE: There were 107 observations read from the data set HW7.GLOW1.
NOTE: There were 90 observations read from the data set HW7.GLOW2.
NOTE: There were 65 observations read from the data set HW7.GLOW3.
NOTE: There were 36 observations read from the data set HW7.GLOW4.
NOTE: There were 120 observations read from the data set HW7.GLOW5.
NOTE: There were 82 observations read from the data set HW7.GLOW6.
NOTE: The data set WORK.GLOW_COMBINED has 500 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.15 seconds
cpu time 0.03 seconds
11 data glow_combined;
12 set hw7.glow1(in= glow1)
13 hw7.glow2 (in=glow2)
14 hw7.glow3 (in=glow3)
15 hw7.glow4 (in=glow4)
16 hw7.glow5 (in=glow5)
17 hw7.glow6 (in=glow6);
18 if glow1 then site_id = 1;
19 if glow2 then site_id = 2;
20 if glow3 then site_id = 3;
21 if glow4 then site_id = 4;
22 if glow5 then site_id = 5;
23 if glow6 then site_id = 6;
24 run;
NOTE: There were 107 observations read from the data set HW7.GLOW1.
NOTE: There were 90 observations read from the data set HW7.GLOW2.
NOTE: There were 65 observations read from the data set HW7.GLOW3.
NOTE: There were 36 observations read from the data set HW7.GLOW4.
NOTE: There were 120 observations read from the data set HW7.GLOW5.
NOTE: There were 82 observations read from the data set HW7.GLOW6.
NOTE: The data set WORK.GLOW_COMBINED has 500 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.12 seconds
cpu time 0.03 seconds
25 data glow_cleaned;
26 set glow_combined;
27 array miss[3] age weight height;
28 do i = 1 to 3;
29 if miss [i] = 999 then miss[i] = .;
30 drop i;
31 end;
32 run;
NOTE: There were 500 observations read from the data set WORK.GLOW_COMBINED.
NOTE: The data set WORK.GLOW_CLEANED has 500 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
33 data glow;
34 set glow_cleaned;
35 array miss_char[3] momfrac raterisk priorfrac;
36 do i = 1 to 3;
37 if miss_char[i] in ('NA') then miss_char[i] = " ";
38 end;
39 drop i;
40 run;
NOTE: There were 500 observations read from the data set WORK.GLOW_CLEANED.
NOTE: The data set WORK.GLOW has 500 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
41 data glow;
42 set glow_cleaned;
43 bmi = weight/((height/100)**2);
44 if age<=65 and (bmi<18 or bmi>25) then age_bmicat=1;
45 else if age<=65 and (18 <= bmi <= 25) then age_bmicat = 2;
46 if age>65 and (bmi<25 or bmi>30)then age_bmicat=3;
47 else if age>65 and (25 <= bmi <= 30) then age_bmicat=4;
48 run;
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
81 at 43:13 100 at 43:22
NOTE: There were 500 observations read from the data set WORK.GLOW_CLEANED.
NOTE: The data set WORK.GLOW has 500 observations and 12 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
49 proc sort data=glow; by descending priorfrac descending fracture;
NOTE: There were 500 observations read from the data set WORK.GLOW.
NOTE: The data set WORK.GLOW has 500 observations and 12 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
50 proc freq data=glow order=data; where priorfrac ne 'NA';
51 table priorfrac*fracture/chisq relrisk;
52 run;
NOTE: Writing HTML Body file: sashtml.htm
NOTE: There were 400 observations read from the data set WORK.GLOW.
WHERE priorfrac not = 'NA';
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.43 seconds
cpu time 0.31 seconds
53 proc logistic data=glow descending; where priorfrac ne 'NA';
54 class priorfrac (param=ref ref="No");
55 model fracture = priorfrac/clodds=wald;
56 run;
NOTE: PROC LOGISTIC is modeling the probability that fracture='1'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 400 observations read from the data set WORK.GLOW.
WHERE priorfrac not = 'NA';
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.87 seconds
cpu time 0.40 seconds
57 proc logistic data=glow descending;
58 class age_bmicat (param=ref ref='1');
59 model fracture = age_bmicat/clodds=wald;
NOTE: PROC LOGISTIC is modeling the probability that fracture='1'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 500 observations read from the data set WORK.GLOW.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 41.10 seconds
cpu time 1.70 seconds
60 data muscatine_wide(keep=(id o1 o2 o3 a1 a2 a3 gender));
-
214
23
ERROR 214-322: Variable name ( is not valid.
ERROR 23-7: Invalid value for the KEEP option.
61 by id;
ERROR: No SET, MERGE, UPDATE, or MODIFY statement is present.
62 set hw7.muscatine;
63 array y[3] o1 o2 o3;
-
124
ERROR 124-185: The variable y has already been defined.
64 array age[3] a1 a2 a3;
---
124
ERROR 124-185: The variable age has already been defined.
65 y[occasion] = obese;
ERROR: Undeclared array referenced: Y.
ERROR: Variable Y has not been declared as an array.
66 retain o1 -o2;
67 a[occasion] = age;
ERROR: Undeclared array referenced: a.
ERROR: Variable a has not been declared as an array.
68 if first.id = 1 then delete;
69 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
The log tells you the first things that are wrong. Fix them and repost the log.
60 data muscatine_wide(keep=(id o1 o2 o3 a1 a2 a3 gender)); - 214 23 ERROR 214-322: Variable name ( is not valid. ERROR 23-7: Invalid value for the KEEP option. 61 by id; ERROR: No SET, MERGE, UPDATE, or MODIFY statement is present. 62 set hw7.muscatine;
Without the raw data it's hard to see if the code is correct but you're on the right path with some missing components.
Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/
What do you intend to do with the wide data? If it's for display, PROC REPORT will be the way to go.
There are few benefits and lots of drawbacks to converting a data set from long to wide. Without a good reason, I would avoid doing this.
Especially if you want a report, then do as @Kurt_Bremser says and use PROC REPORT, and then you don't have to convert the data from long to wide (and by avoiding doing this work, you avoid one of the drawbacks of converting long to wide). Example: https://communities.sas.com/t5/SAS-Programming/Proc-transpose-with-multiple-var/m-p/846310#M334585 Example 2: https://communities.sas.com/t5/ODS-and-Base-Reporting/format-all-column-names-that-are-similar/m-p/6...
I have to do this for a class assignment.
So the whole issue is transposing the muscatine dataset. Please supply example data for this dataset in a data step with datalines, and the expected output.
@MisterJenn wrote:
I have to do this for a class assignment.
So, what does the assignment ask you to do with this wide data set, once you create it??
Your teacher is teaching you an inefficient way to do things. Please remember that long data sets are easier to work than wide data sets when the class ends and you have to do this for an actual job.
@MisterJenn wrote:
We are using this data set to conduct a logistic regression analysis
Okay, you still haven't shown us the data set, but this COULD be (but still might not be) one of the few good reasons to convert long into wide.
Please see if this link is what you're trying to do:
https://stats.oarc.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.