BookmarkSubscribeRSS Feed
MisterJenn
Fluorite | Level 6

New to Sas and I am trying to figure out how to format the data from long to wide. 

 

data muscatine_wide; by id; 
set hw7.muscatine; 
array y[3] o1 o2 o3;
array age[3] a1 a2 a3;
y[occasion] = obese;
retain o1 -o2; 
a[occasion] = age;
if first.id = 1 then delete;
run;

This is the error code I am getting. However, I don't know if the sas code (above) is accomplishing what I am trying to do. 

 

1    libname hw7 '\\apporto.com\dfs\GWU\Users\kennedyhinnant_gwu\Downloads\Homework 7';
NOTE: Libref HW7 was successfully assigned as follows:
      Engine:        V9
      Physical Name: \\apporto.com\dfs\GWU\Users\kennedyhinnant_gwu\Downloads\Homework 7
2    run;
3    data glow_combined;
4    set hw7.glow1
5    hw7.glow2
6    hw7.glow3
7    hw7.glow4
8    hw7.glow5
9    hw7.glow6;
10   run;

NOTE: There were 107 observations read from the data set HW7.GLOW1.
NOTE: There were 90 observations read from the data set HW7.GLOW2.
NOTE: There were 65 observations read from the data set HW7.GLOW3.
NOTE: There were 36 observations read from the data set HW7.GLOW4.
NOTE: There were 120 observations read from the data set HW7.GLOW5.
NOTE: There were 82 observations read from the data set HW7.GLOW6.
NOTE: The data set WORK.GLOW_COMBINED has 500 observations and 9 variables.
NOTE: DATA statement used (Total process time):
      real time           0.15 seconds
      cpu time            0.03 seconds


11   data glow_combined;
12   set hw7.glow1(in= glow1)
13   hw7.glow2 (in=glow2)
14   hw7.glow3 (in=glow3)
15   hw7.glow4 (in=glow4)
16   hw7.glow5 (in=glow5)
17   hw7.glow6 (in=glow6);
18   if glow1 then site_id  = 1;
19   if glow2 then site_id  = 2;
20   if glow3 then site_id = 3;
21   if glow4 then site_id = 4;
22   if glow5 then site_id = 5;
23   if glow6 then site_id = 6;
24   run;

NOTE: There were 107 observations read from the data set HW7.GLOW1.
NOTE: There were 90 observations read from the data set HW7.GLOW2.
NOTE: There were 65 observations read from the data set HW7.GLOW3.
NOTE: There were 36 observations read from the data set HW7.GLOW4.
NOTE: There were 120 observations read from the data set HW7.GLOW5.
NOTE: There were 82 observations read from the data set HW7.GLOW6.
NOTE: The data set WORK.GLOW_COMBINED has 500 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.12 seconds
      cpu time            0.03 seconds


25   data glow_cleaned;
26   set glow_combined;
27   array miss[3] age weight height;
28   do i = 1 to 3;
29       if miss [i] = 999 then miss[i] = .;
30       drop i;
31   end;
32   run;

NOTE: There were 500 observations read from the data set WORK.GLOW_COMBINED.
NOTE: The data set WORK.GLOW_CLEANED has 500 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


33   data glow;
34   set glow_cleaned;
35   array miss_char[3] momfrac raterisk priorfrac;
36   do i = 1 to 3;
37    if miss_char[i] in ('NA') then miss_char[i] = " ";
38    end;
39    drop i;
40    run;

NOTE: There were 500 observations read from the data set WORK.GLOW_CLEANED.
NOTE: The data set WORK.GLOW has 500 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


41   data glow;
42   set glow_cleaned;
43   bmi = weight/((height/100)**2);
44   if age<=65 and (bmi<18 or bmi>25) then age_bmicat=1;
45   else if age<=65 and (18 <= bmi <= 25) then age_bmicat = 2;
46   if age>65 and (bmi<25 or bmi>30)then age_bmicat=3;
47   else if age>65 and (25 <= bmi <= 30) then age_bmicat=4;
48   run;

NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      81 at 43:13    100 at 43:22
NOTE: There were 500 observations read from the data set WORK.GLOW_CLEANED.
NOTE: The data set WORK.GLOW has 500 observations and 12 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


49   proc sort data=glow; by descending priorfrac descending fracture;

NOTE: There were 500 observations read from the data set WORK.GLOW.
NOTE: The data set WORK.GLOW has 500 observations and 12 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


50   proc freq data=glow order=data; where priorfrac ne 'NA';
51   table priorfrac*fracture/chisq relrisk;
52   run;

NOTE: Writing HTML Body file: sashtml.htm
NOTE: There were 400 observations read from the data set WORK.GLOW.
      WHERE priorfrac not = 'NA';
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.43 seconds
      cpu time            0.31 seconds


53   proc logistic data=glow descending; where priorfrac ne 'NA';
54   class priorfrac (param=ref ref="No");
55   model fracture = priorfrac/clodds=wald;
56   run;

NOTE: PROC LOGISTIC is modeling the probability that fracture='1'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 400 observations read from the data set WORK.GLOW.
      WHERE priorfrac not = 'NA';
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.87 seconds
      cpu time            0.40 seconds


57   proc logistic data=glow descending;
58   class age_bmicat (param=ref ref='1');
59   model fracture = age_bmicat/clodds=wald;

NOTE: PROC LOGISTIC is modeling the probability that fracture='1'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 500 observations read from the data set WORK.GLOW.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           41.10 seconds
      cpu time            1.70 seconds


60   data muscatine_wide(keep=(id o1 o2 o3 a1 a2 a3 gender));
                              -
                              214
                              23
ERROR 214-322: Variable name ( is not valid.

ERROR 23-7: Invalid value for the KEEP option.

61   by id;
ERROR: No SET, MERGE, UPDATE, or MODIFY statement is present.
62   set hw7.muscatine;
63   array y[3] o1 o2 o3;
           -
           124
ERROR 124-185: The variable y has already been defined.

64   array age[3] a1 a2 a3;
           ---
           124
ERROR 124-185: The variable age has already been defined.

65   y[occasion] = obese;
ERROR: Undeclared array referenced: Y.
ERROR: Variable Y has not been declared as an array.
66   retain o1 -o2;
67   a[occasion] = age;
ERROR: Undeclared array referenced: a.
ERROR: Variable a has not been declared as an array.
68   if first.id = 1 then delete;
69   run;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
9 REPLIES 9
Reeza
Super User

The log tells you the first things that are wrong. Fix them and repost the log.

 

60   data muscatine_wide(keep=(id o1 o2 o3 a1 a2 a3 gender));
                              -
                              214
                              23
ERROR 214-322: Variable name ( is not valid.

ERROR 23-7: Invalid value for the KEEP option.

61   by id;
ERROR: No SET, MERGE, UPDATE, or MODIFY statement is present.
62   set hw7.muscatine;
  • The KEEP variable list should not be in parenthesis within the KEEP data set option. Remove the extra parenthesis
  • There is no SET statement so there is no input data set

 

Without the raw data it's hard to see if the code is correct but you're on the right path with some missing components. 

 

Spoiler
Missing retain for the age variables

Use FIRST. and CALL MISSING to reset variables at the start of each ID group

Use LAST. to output only the last record for each ID group, keeping the wide version of variables


Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/

PaigeMiller
Diamond | Level 26

There are few benefits and lots of drawbacks to converting a data set from long to wide. Without a good reason, I would avoid doing this.

 

Especially if you want a report, then do as @Kurt_Bremser says and use PROC REPORT, and then you don't have to convert the data from long to wide (and by avoiding doing this work, you avoid one of the drawbacks of converting long to wide). Example: https://communities.sas.com/t5/SAS-Programming/Proc-transpose-with-multiple-var/m-p/846310#M334585 Example 2: https://communities.sas.com/t5/ODS-and-Base-Reporting/format-all-column-names-that-are-similar/m-p/6...

--
Paige Miller
MisterJenn
Fluorite | Level 6

I have to do this for a class assignment. 

Kurt_Bremser
Super User

So the whole issue is transposing the muscatine dataset. Please supply example data for this dataset in a data step with datalines, and the expected output.

PaigeMiller
Diamond | Level 26

@MisterJenn wrote:

I have to do this for a class assignment. 


So, what does the assignment ask you to do with this wide data set, once you create it??

 

Your teacher is teaching you an inefficient way to do things. Please remember that long data sets are easier to work than wide data sets when the class ends and you have to do this for an actual job.

--
Paige Miller
MisterJenn
Fluorite | Level 6
We are using this data set to conduct a logistic regression analysis
PaigeMiller
Diamond | Level 26

@MisterJenn wrote:
We are using this data set to conduct a logistic regression analysis

Okay, you still haven't shown us the data set, but this COULD be (but still might not be) one of the few good reasons to convert long into wide.

--
Paige Miller
tarheel13
Rhodochrosite | Level 12

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1332 views
  • 1 like
  • 5 in conversation