Please, I need help converting character variables from wide to long. The MED1-MED8 variables are names of medication. I tried the following code but it it did not work. I only get the number of the medication but no MED variable. Please. kindly help.
Below are my SAS code and the output.
Thank you
DATA long_subset;
SET subset;
ARRAY aMED(1:8) MED1 - MED8 ;
DO number = 1 to 8 ;
MED = aMED(number);
OUTPUT;
END;
DROP MED1 - MED8 ;
RUN;
MED1-MED8 already have the format applied, that is shown in your PROC CONTENTS output.
You want to apply the format to variable MED
Show the LOG from running that code. Copy the text of the entire data step from the log along with all the notes and messages, open a text box on the forum using the </> icon above the message window and paste the text.
713686 DATA long;
713687 SET nine_years;
713688 ARRAY aMED(1:8) MED1 - MED8 ;
713689 DO number = 1 to 8 ;
713690 MED = aMED(number);
713691 OUTPUT;
713692 END;
713693 DROP MED1 - MED8 ;
713694 RUN;
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
713690:11
NOTE: There were 206167 observations read from the data set WORK.NINE_YEARS.
NOTE: The data set WORK.LONG has 1649336 observations and 1157 variables.
NOTE: DATA statement used (Total process time):
real time 21.54 seconds
cpu time 2.79 seconds
DATA long;
SET nine_years;
ARRAY aMED(1:8) MED1 - MED8 ;
DO number = 1 to 8 ;
MED = aMED(number);
OUTPUT;
END;
DROP MED1 - MED8 ;
RUN;
First, you obfuscated a bit in the original post about the output. You said and showed that NUMBER was the only output. The text of the output says 1157 variables, so you we short by about 1156 variables.
I would bet a short stack of $$$ that your existing data set Nine_years already has a variable named just MED and that it is a numeric variable. That would be the source of the message about "character values have been converted to numeric".
Try (after making sure that you do not have an existing variable with the name Medname. If you do use something before the = sign that does not exist in the source data set)
DATA long;
SET nine_years;
ARRAY aMED(1:8) MED1 - MED8 ;
DO number = 1 to 8 ;
MEDname = aMED(number);
OUTPUT;
END;
DROP MED1 - MED8 ;
RUN;
Note: when you use the names of the variables you do not have to provide the index for the array unless you are doing something odd with an offset:
Array amed(*) med1-med8;
will suffice.
Also you can use the DIM function, which returns the number of elements in an array, for the limit of the loop counter:
Do Number=1 to dim(amed);
These two minor statements make maintaining code a bit simpler. If you need to run it with a different set that had 25 Med variables the only change you would make would be to replace Med8 with Med25 on the array statement.
Or if you wanted to add other variables to processing just adding them to the array definition would work.
I created a subset data with year and Med1-Med8. I repeated the analysis but got a Med variable with only numbers.
I also simplified the code as you suggested. Please find below the SAS code and log output
Data Subset;
set nine_years;
keep year MED1 - MED8;
run;
DATA long_subset;
length MED $ 100;
SET subset;
ARRAY aMED(*) MED1 - MED8 ;
DO Number=1 to dim(amed);
MED = aMED(number);
OUTPUT;
END;
DROP MED1 - MED8 ;
RUN;
Data Subset;
set nine_years;
keep year MED1 - MED8;
run;
DATA long_subset;
length MED $ 100;
SET subset;
ARRAY aMED(*) MED1 - MED8 ;
DO Number=1 to dim(amed);
MED = aMED(number);
OUTPUT;
END;
DROP MED1 - MED8 ;
RUN;
@UcheOkoro wrote:
I created a subset data with year and Med1-Med8. I repeated the analysis but got a Med variable with only numbers.
I also simplified the code as you suggested. Please find below the SAS code and log output
Data Subset; set nine_years; keep year MED1 - MED8; run; DATA long_subset; length MED $ 100; SET subset; ARRAY aMED(*) MED1 - MED8 ; DO Number=1 to dim(amed); MED = aMED(number); OUTPUT; END; DROP MED1 - MED8 ; RUN;Data Subset; set nine_years; keep year MED1 - MED8; run; DATA long_subset; length MED $ 100; SET subset; ARRAY aMED(*) MED1 - MED8 ; DO Number=1 to dim(amed); MED = aMED(number); OUTPUT; END; DROP MED1 - MED8 ; RUN;
Your variables Med1 through Med8 do not contain "names of medications". It is possible that they contain numeric codes, that when displayed with a custom format display the meaning of the numeric code.
Also bet that -9 is supposed to mean "no name recorded" or similar in the original data.
What FORMAT is assigned to the variables Med1 - Med8? You may have to share Proc Content results so we don't have to guess or ask about what is in you actual data set.
Below is the result of proc contents.
So MED1-MED8 are character strings, so that's what you have in the new variable MED. You probably want to assign the format named $MEDCODF. to MED, that may be the missing link in what you are looking for.
Thank you for your response.
Is this the right way to assign the format?
DATA long_subset;
length MED $ 100;
SET subset;
FORMAT MED1 - MED8 $MEDCODF. ;
ARRAY aMED(*) MED1 - MED8 ;
DO Number=1 to dim(amed);
MED = aMED(number);
OUTPUT;
END;
DROP MED1 - MED8 ;
RUN;
MED1-MED8 already have the format applied, that is shown in your PROC CONTENTS output.
You want to apply the format to variable MED
It worked. Thank you so much!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.