Hi,
I transposed a data and i got a numeric column data that looks like this below:
For my next data step, I tried to keep them in a data step but it didn't work. Here is my idea
data test2 (keep=_2023 _2024);
set test1;
run;
Apparently SAS did not recognize them as _2023. I want to rename them as Y2023 or rename them as _2023 so I can use them for % calculation but again SAS is not recognizing them as a variable i think because they are just like a number.
Sorry I don't have a sample code because the one i created also did not work except I put Y infront of it. e.g., Y2023.
Please any help will be appreciated.
Assuming you used PROC TRANSPOSE, you can add the PREFIX option to have it added to the variable name.
Ensure that validvarname is also set to V7 to help avoid name literals.
options validvarname=v7;
proc transpose data=have out=want prefix=Y;
where ...;
var count;
id year;
run;
@CathyVI wrote:
@PaigeMiller @maguiremq Thanks for your response. This is the table before I transpose it
After I transposed I got this: @PaigeMiller How do I calculate each year which is my next step if i dont transpose?
So I want to have the count of each year 2023, 2024, find the total so I can calculate percentage using this next data set:
data test4 (drop= _NAME_ T_2023 T_2024); set test3; percent23 = (_2023/T_2021)*100; percent24 = (_2024/T_2022)*100; run;
@maguiremq I don't think I can follow your step since I did not just create the data. There are multiple steps before I got here. Except I did not understand your logic. Please help out if i am wrong.
Actually, I think the mistake was using TRANSPOSE in the first place. Maxim 19
Why do you need to do this transpose? What will the next analysis or programming done on this data? Normally transposing just makes the next steps harder.
All of it can be done in PROC TRANSPOSE.
You can also do it in a DATA step.
But best practice would be to get familiar with PROC DATASETS as it doesn't have to read in the entire file.
For this case, it is probably ok to not use PROC DATASETS, but if your files start to get much larger, then PROC DATASETS will be much faster. It doesn't have to read in the entire data set from your SET statement. It will modify the variable names on the data set directly.
data have;
input _name_ $ "2023"n "2024"n;
datalines;
count 1 2
;
run;
proc datasets library = work memtype = data nolist;
modify have;
rename "2023"n = _2023
"2024"n = _2024
;
run;
quit;
data want;
set have (rename = ("2023"n = _2023 "2024"n = _2024));
run;
@PaigeMiller @maguiremq Thanks for your response. This is the table before I transpose it
After I transposed I got this: @PaigeMiller How do I calculate each year which is my next step if i dont transpose?
So I want to have the count of each year 2023, 2024, find the total so I can calculate percentage using this next data set:
data test4 (drop= _NAME_ T_2023 T_2024);
set test3;
percent23 = (_2023/T_2021)*100;
percent24 = (_2024/T_2022)*100;
run;
@maguiremq I don't think I can follow your step since I did not just create the data. There are multiple steps before I got here. Except I did not understand your logic. Please help out if i am wrong.
This looks like PROC MEANS or PROC SUMMARY output. Is it? If yes, can you show us the code? If yes, can you show us the columns to the left of YEAR? If yes, can you show us a portion of the data that was used by PROC MEANS or PROC SUMMARY?
Asking again as I did in your previous threads: What is the numerator and what is the denominator? Okay, the numerator seems very clear. But we don't know what the denominator is, we don't know what T_2021 or T_2022 is, there is no such thing in this data.
I suspect, but can't say for sure, that there is no need for TRANSPOSE here, and in fact TRANSPOSE is actually a mistake here. Once you provide the information requested, I may be able to say with certainty.
@PaigeMiller This is a portion of the data before the proc summary
Yes it is proc summary and this is the code:
proc summary data=test1;
class sex age race service component year;
var count;
output out=test2 sum= count;
The denominator was calculated using this code after the transpose:
data test3 (keep= _NAME_ _2023 _2024 rename=( _2023 = T_2023 _2024= T_2024));
set test2;
if varname='total';
run;
This code was extracted and the person was able to make their 2023 in to _2023 but I wasn't after following all the steps. Thanks
All the pieces are not fitting together. Now you show
rename=( _2023 = T_2023 _2024= T_2024)
whereas in your previous post you were trying to use different years T_2021 and T_2022. So which years are you wanting to use for the denominator? And where do we find these T_2021 or T_2022 or T_2023 or T_2024 in the data you have shown??? You have not told us.
In addition, the output from that PROC SUMMARY contains counts by sex, age, race, service, component and year; and it contains counts of all two-way combinations of those variables; and it contains counts of all three-way combinations of those variables; and so on, up to all six way combinations of those variables. All of that is in there, most of which doesn't seem relevant to the question you asked originally, as far as I can tell, which was just about YEAR.
So, we are back to not knowing what the numerator should be (which rows of the PROC SUMMARY output data set) and we do not know what the denominator is (which rows of the PROC SUMMARY output data set).
Can you describe in words (and without using SAS code and without showing us screen captures) the entire problem, start to finish: what is the question you want to answer?
By the way, you should be describing the problem in words, start to finish, in every post, regardless. It is now obvious to me that this is not a problem about renaming columns after a transpose, and we have spent a lot of time trying to get to the bottom of it that we would not have spent if we had this problem description in words start to finish. After providing this description, then show us (a portion of) the original data and your proposed code.
@PaigeMiller @Reeza @ballardw @maguiremq
Sorry I copied the data and was changing the year to fit my year 2023 and 2024.
In words: I have a data set which i want a count of the variables e.g. age group, sex, by year '2023, 2024). I have create the counts and the year. I transpose it so I can have each variable by year count and calculate the percentage for each year.
I encountered an issue when the year (I transposed) was not recognized by SAS as a numeric variable. The code I was coping did the same thing. They got _2023, _2024 (numerator) when they transposed and they used the Total to find T_2023 which was the denominator. Then they calculated the percentage with a data step. This is what am hoping to do. Thanks
@CathyVI wrote:
@PaigeMiller @Reeza @ballardw @maguiremq
Sorry I copied the data and was changing the year to fit my year 2023 and 2024.
In words: I have a data set which i want a count of the variables e.g. age group, sex, by year '2023, 2024). I have create the counts and the year. I transpose it so I can have each variable by year count and calculate the percentage for each year.
I encountered an issue when the year (I transposed) was not recognized by SAS as a numeric variable. The code I was coping did the same thing. They got _2023, _2024 (numerator) when they transposed and they used the Total to find T_2023 which was the denominator. Then they calculated the percentage with a data step. This is what am hoping to do. Thanks
Reiterate: provide example data of before you did the "transpose" and provide what you want.
Obviously it is highly unlikely to be "a count of the variables e.g. age group, sex". That would be 2. I suspect you mean count of levels of these variables by year. And transpose is not needed in any way. Plus this still sounds like an intermediate step. Provide a description of the final result, not one step.
And again, actual working data that behaves like your data goes a long way to avoid repeated question/partial answer, question/incomplete answer, question/not relevant answer cycles.
I transpose it so I can have each variable by year count and calculate the percentage for each year.
This is simply not necessary, and makes your programming more difficult. Please let's stop asking about transposing and renaming. But, one more time, you have many rows in the data set output from PROC SUMMARY, which are the numerators, and which are the denominators? Please be specific. In the data you showed,
please give us an example: for one such desired percent, which row contains the numerator and which row contains the denominator?
Until this is cleared up, this is an absolute showstopper. I can not even try to write code for this, and this has nothing to do with SAS, I could not tell you how to do this pencil and paper either.
@CathyVI wrote:
@PaigeMiller @maguiremq Thanks for your response. This is the table before I transpose it
After I transposed I got this: @PaigeMiller How do I calculate each year which is my next step if i dont transpose?
So I want to have the count of each year 2023, 2024, find the total so I can calculate percentage using this next data set:
data test4 (drop= _NAME_ T_2023 T_2024); set test3; percent23 = (_2023/T_2021)*100; percent24 = (_2024/T_2022)*100; run;
@maguiremq I don't think I can follow your step since I did not just create the data. There are multiple steps before I got here. Except I did not understand your logic. Please help out if i am wrong.
To get _type_ with Proc Means/Summary there have to be at least 3 CLASS variables (and would be several more types involved)
Suggestion: Instead of showing intermediate data sets, that you seem not quite sure what you want to do yet, provide examples of the RAW data and describe the desired output. If you can't provide some example data as a data step use on of the SAS supplied data sets like SASHELP.Class (is small enough to do manual calculations for examples) or SASHELP.Cars that has quite a few observations and multiple class and numeric variables.
You data step manipulating this likely needs some things set conditionally based on the _type_. Do you understand what that _type_ variable tells you? _Type_=0 is the summary over the entire data set. So that would have the "count" of everything. _type_=1 is going to be the summary of ONE level of one of the class variables. _Type_=2 is going to be summary of the second Class variable (which your picture shows as removed, which is likely going to be hard to use) . _type_=3 if there are exactly 2 Class variables will have the combinations of the values of the two class variables.
Assuming you used PROC TRANSPOSE, you can add the PREFIX option to have it added to the variable name.
Ensure that validvarname is also set to V7 to help avoid name literals.
options validvarname=v7;
proc transpose data=have out=want prefix=Y;
where ...;
var count;
id year;
run;
@CathyVI wrote:
@PaigeMiller @maguiremq Thanks for your response. This is the table before I transpose it
After I transposed I got this: @PaigeMiller How do I calculate each year which is my next step if i dont transpose?
So I want to have the count of each year 2023, 2024, find the total so I can calculate percentage using this next data set:
data test4 (drop= _NAME_ T_2023 T_2024); set test3; percent23 = (_2023/T_2021)*100; percent24 = (_2024/T_2022)*100; run;
@maguiremq I don't think I can follow your step since I did not just create the data. There are multiple steps before I got here. Except I did not understand your logic. Please help out if i am wrong.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.