Please, I need help converting data from wide to long but some of the variables in the array statement may have missing values as indicated by the error message.
How do, I write a code that converts this data from long to wide and excludes variables with missing values. The following is my code.
DATA _ds2cd4d; SET _ds2cd4c2 ; Pthirty=p30; BY Pthirty; KEEP ID avisitn2 pchg avgb Pthirty1-Pthirty17 lbdt1-lbdt17; RETAIN Pthirty1-Pthirty17 lbdt1-lbdt17; ARRAY aPthirty(1:17) Pthirty0-Pthirty17; ARRAY albdt(1:17) lbdt0-lbdt17; IF first.ID THEN DO; DO i = 1 to 17 ; aPthirty( i ) = 0 ; albdt( i ) = 0 ; END; END; aPthirty( avisitn2 ) = Pthirty ; albdt( avisitn2 ) = lbdt ; IF last.ID THEN OUTPUT ; RUN;
It appears you are making data WIDE not long.
Your error messages are because you define the ARRAY with 17 elements and then provide 18 variables
ARRAY aPthirty(1:17) Pthirty0-Pthirty17;
The 1:17 says the index values to use will be 1 to 17, meaning a max of 17 variables. Pthirty0 - Pthirty17 would be 18 variables so the variable names provided do not match the Array size definition.
How about provide example actual data that you have. From the confusing stuff I see in your code I am not sure exactly what you expect.
Wide to long would look more like:
data example; set have; array v (*) <list of variables that you already have goes here>; do i=1 to dim(v); if not missing (v[i]) then value = v[i]; output; end; drop <your list of variables> i ; run;
Which will provide all of the variables other than those on the array for each observation in the output.
IF you have two sets of variables and they are paired in some form you may use two arrays and output them together BUT your check for not missing values likely means something quite different because of the cases where one of the pair might be missing but the other is not. So would need more rules as to what the output would look like in that case.
You appear to want to convert from LONG to WIDE.
There are two main issues in the code.
One is that you seem confused about how many pairs of variables you need. Do you want 17 as indicated by your manual setting of the index range from 1 to 17. Or do you want 18 as indicated by using the 18 variables with numeric suffixes from 0 to 17?
The second is that If some of the observations do not exist in the LONG version then you need some type of INDEX variable that says which element in the array the current observation represents. What does the index into the array represent ? Is it a visit number? A relative number of days or other time interval?
Note if any of the arrays are character then you should also set the storage length for the new variables you are creating with the ARRAY statements.
Let's assume your existing dataset has a variable named AVISITN2 that indicates which target variable should get the value(s). Let's also assume your index values start from zero instead of one so we can demonstrate the use of the option index range in the ARRAY statement.
So to convert from TALL to WIDE your code could look like this:
data wide; set tall ; by ID ; array aPthirty [0:17] Pthirty0-Pthirty17; array albdt [0:17] lbdt0-lbdt17; IF first.ID THEN DO index=0 to 17; aPthirty[index] = 0 ; albdt[index] = 0 ; end; if avisitn2 in [0:17] then do; aPthirty[avisitn2] = Pthirty ; albdt[avisitn2] = lbdt ; end; else put 'WARNING: Invalid AVISITN2 in ' (_n_ id avisitn2 pthirty ibdt) (=); if last.id; keep ID avisitn2 pchg avgb Pthirty1-Pthirty17 lbdt1-lbdt17; retain Pthirty0-Pthirty17 lbdt0-lbdt17; run;
Note you don't mention show PCHG and AVGB variables anywhere other than the KEEP statement. So we have to assume they have the same value for all observations for the same ID value. What they will get in the output is the value on the last observation for the ID group.
To go the other way from WIDE to TALL you just need to do something like:
data tall ; set tall ; array aPthirty [0:17] Pthirty0-Pthirty17; array albdt [0:17] lbdt0-lbdt17; do avisitn2=0 to 17; if 0 < n(of aPthirty[avisitn2] albdt[avisitn2]) then do; Pthirty = aPthirty[avisitn2]; lbdt = albdt[avisitn2]; end; end; drop Pthirty0-Pthirty17 lbdt0-lbdt17; run;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.