BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Paul_NYS
Obsidian | Level 7

Hi Everyone

I have a data set where I would like to make one column into a row. I understand that transpose moves a row into a column, but I am having trouble wrapping my head around how to transpose a column into a record.

Below is a link to a data set that has the following columns:

cnty_name

startyear

exit

agecat4

exitMonth

cohortyeartotal

cohortyeartotalage

distributionnumber

distributionpercent

cumulativenumber

cumulativepercent

outofhomecare

http://www.nycourts.gov/surveys/cwcip/transpose-issue.zip

What I would like to do is to make the outofhomecare column into a record. The current value that is in the outofhomecare column would be the cumulativenumber value and the value for the exit column would be "outofhomecare". The values for most of the other columns would be as they are.or blank (for calculations).

Does transpose correctly work for mapping a column to a record?

Paul

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You seem to be using COLUMN and ROW in different ways than I expect. Perhaps if we switch to VARIABLES and OBSERVATIONS instead?

The variable (_NAME_) that you see with the text string 'OutofHomeCare' on every observations is just put the automatic variable that PROC TRANSPOSE creates to indicate where the values for that observation (row) came from.   If we had used two variable names in the VAR statement then the two variables (columns) would have been transposed into two observations (rows) in the output dataset. Then the value of the _NAME_ variable could help you see which variable (column) was use to create that observation (row) in the output dataset.  The ID statement was what caused it to use the value of the EXIT variable to name the variables it created to store the multiple observations that it saw for each BY group.  Without the ID statement those variables would instead just be named COL1, COL2, ...., COLn, where n is the maximum number of observations (rows) over all of the BY groups.

You have a variable in your dataset called OutofHomeCare that appears to be numeric. It appears to have a different value for each observation in the dataset.  I do not know what you mean by converting it to a ROW if you did not want to rotate the values up into multiple variables. 

Perhaps if you posted an example with just five or ten observations showing what you want to change?

It might also help if you could explain how you want to use the newly formatted dataset.  Perhaps there is a better structure for your intended use.

View solution in original post

12 REPLIES 12
art297
Opal | Level 21

Paul, I don't understand.  Do you want one row with 2721 values for outofhomecare?

Paul_NYS
Obsidian | Level 7

No, what I need to do is to create one outofhomecare record for each unique instance of cnty_name, startyear, agecat4 exitmonthcategory. To make it easier to see, I changed the below example file to include only one county's worth of data. So there are 2040 records in the amended file (for only one county). There are 6 different values of startyear, 5 different values of agecat4 and 17 different value of exitmonthcategory. This would be 510 total records of outofhomecare for this one county. 'Exit' would equal the actual word 'outofhomecare', 'cnty_name' would be county-ABC, 'startyear' would be each of the start years (2006-2011), agecat4 would be each of the categories (0, 1 to 5, 6 to 12, etc.) and cumulativenumber would be the current value under the outofhomecare column. The other categories are blank for now.

That is what I am trying to do.

http://www.nycourts.gov/surveys/cwcip/transpose-issue.zip

Paul

art297
Opal | Level 21

Still not sure what you want.  Does the following come close?

proc transpose data=have (keep=cnty_name

                          startyear

                          agecat4

                          exitmonthcategory

                          OutOfHomeCare)

  out=want;

  by cnty_name startyear agecat4 exitmonthcategory;

run;

Paul_NYS
Obsidian | Level 7

Hi Art

This behaves the same way as Tom's transpose--it creates new columns in addition to moving the outofhomecare column to records. I need to take a closer look to determine if the new columns created contain the correct values for the new outofhomecare records.

Paul

Manu_Jain
Calcite | Level 5

Hi

Do tell me if I am wrong but as far as I get to know what you want is making outofhomecare to be in records not in columns.

I am not writing the sas code right now because 1st I want to be sure what u are actually asking for.transpose-issue.png

I have done something in excel to show you ,what i really get from your problem. Image shows the how for each cnty_name, startyear, agecat4 exitmonthcategory, the outofhomecare is divided.Is this you want sas?

Paul_NYS
Obsidian | Level 7

Hi Manu, can you post that spreadsheet so I can take a closer look at it?

Paul

Manu_Jain
Calcite | Level 5

Hi Paul
here it is...do let me know if i get you correctly.

Tom
Super User Tom
Super User

If looks like your data is grouped by

cnty_name

startyear

agecat4

exitMonth

cohortyeartotal

cohortyeartotalage

Do you want to use the value of EXIT as the name of the variable that contains the value from OutOfHomeCare?

proc transpose data=have out=want ;

  by

cnty_name

startyear

agecat4

exitMonth

cohortyeartotal

cohortyeartotalage

;

id exit;

var OutOfHomeCare;

run;

What do you want to do with the columns

distributionnumber

distributionpercent

cumulativenumber

cumulativepercent

That also seem to have different values for each EXIT value?

Paul_NYS
Obsidian | Level 7

Thanks Tom. I ran the above transform. It definitely put the outofhomecare column into rows under the _NAME_ column. However it also moved all the rows that were previously under Exit into columns, with each value being a different column. Is there a way to keep the existing values under the Exit column?

art297
Opal | Level 21
Tom
Super User Tom
Super User

You seem to be using COLUMN and ROW in different ways than I expect. Perhaps if we switch to VARIABLES and OBSERVATIONS instead?

The variable (_NAME_) that you see with the text string 'OutofHomeCare' on every observations is just put the automatic variable that PROC TRANSPOSE creates to indicate where the values for that observation (row) came from.   If we had used two variable names in the VAR statement then the two variables (columns) would have been transposed into two observations (rows) in the output dataset. Then the value of the _NAME_ variable could help you see which variable (column) was use to create that observation (row) in the output dataset.  The ID statement was what caused it to use the value of the EXIT variable to name the variables it created to store the multiple observations that it saw for each BY group.  Without the ID statement those variables would instead just be named COL1, COL2, ...., COLn, where n is the maximum number of observations (rows) over all of the BY groups.

You have a variable in your dataset called OutofHomeCare that appears to be numeric. It appears to have a different value for each observation in the dataset.  I do not know what you mean by converting it to a ROW if you did not want to rotate the values up into multiple variables. 

Perhaps if you posted an example with just five or ten observations showing what you want to change?

It might also help if you could explain how you want to use the newly formatted dataset.  Perhaps there is a better structure for your intended use.

Paul_NYS
Obsidian | Level 7

The below transpose may work with some modifications, thank you. While it did not come out the way I was envisioning, I'll report back on the result.

Paul

proc transpose data=have out=want ;

  by

cnty_name

startyear

agecat4

exitMonth

cohortyeartotal

cohortyeartotalage

;

id exit;

var OutOfHomeCare;

run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2662 views
  • 3 likes
  • 4 in conversation