Solved: Re: Proc Transpose

Paul_NYS · Posted 11-26-2012 05:38 PM

Hi Everyone

I have a data set where I would like to make one column into a row. I understand that transpose moves a row into a column, but I am having trouble wrapping my head around how to transpose a column into a record.

Below is a link to a data set that has the following columns:

cnty_name

startyear

exit

agecat4

exitMonth

cohortyeartotal

cohortyeartotalage

distributionnumber

distributionpercent

cumulativenumber

cumulativepercent

outofhomecare

http://www.nycourts.gov/surveys/cwcip/transpose-issue.zip

What I would like to do is to make the outofhomecare column into a record. The current value that is in the outofhomecare column would be the cumulativenumber value and the value for the exit column would be "outofhomecare". The values for most of the other columns would be as they are.or blank (for calculations).

Does transpose correctly work for mapping a column to a record?

Paul

Tom · Posted 11-27-2012 09:26 PM

You seem to be using COLUMN and ROW in different ways than I expect. Perhaps if we switch to VARIABLES and OBSERVATIONS instead?

The variable (_NAME_) that you see with the text string 'OutofHomeCare' on every observations is just put the automatic variable that PROC TRANSPOSE creates to indicate where the values for that observation (row) came from. If we had used two variable names in the VAR statement then the two variables (columns) would have been transposed into two observations (rows) in the output dataset. Then the value of the _NAME_ variable could help you see which variable (column) was use to create that observation (row) in the output dataset. The ID statement was what caused it to use the value of the EXIT variable to name the variables it created to store the multiple observations that it saw for each BY group. Without the ID statement those variables would instead just be named COL1, COL2, ...., COLn, where n is the maximum number of observations (rows) over all of the BY groups.

You have a variable in your dataset called OutofHomeCare that appears to be numeric. It appears to have a different value for each observation in the dataset. I do not know what you mean by converting it to a ROW if you did not want to rotate the values up into multiple variables.

Perhaps if you posted an example with just five or ten observations showing what you want to change?

It might also help if you could explain how you want to use the newly formatted dataset. Perhaps there is a better structure for your intended use.

View solution in original post

art297 · Posted 11-26-2012 05:45 PM

Paul, I don't understand. Do you want one row with 2721 values for outofhomecare?

Paul_NYS · Posted 11-26-2012 06:30 PM

No, what I need to do is to create one outofhomecare record for each unique instance of cnty_name, startyear, agecat4 exitmonthcategory. To make it easier to see, I changed the below example file to include only one county's worth of data. So there are 2040 records in the amended file (for only one county). There are 6 different values of startyear, 5 different values of agecat4 and 17 different value of exitmonthcategory. This would be 510 total records of outofhomecare for this one county. 'Exit' would equal the actual word 'outofhomecare', 'cnty_name' would be county-ABC, 'startyear' would be each of the start years (2006-2011), agecat4 would be each of the categories (0, 1 to 5, 6 to 12, etc.) and cumulativenumber would be the current value under the outofhomecare column. The other categories are blank for now.

That is what I am trying to do.

http://www.nycourts.gov/surveys/cwcip/transpose-issue.zip

Paul

art297 · Posted 11-26-2012 06:44 PM

Still not sure what you want. Does the following come close?

proc transpose data=have (keep=cnty_name

startyear

agecat4

exitmonthcategory

OutOfHomeCare)

out=want;

by cnty_name startyear agecat4 exitmonthcategory;

run;

Paul_NYS · Posted 11-27-2012 03:18 PM

Hi Art

This behaves the same way as Tom's transpose--it creates new columns in addition to moving the outofhomecare column to records. I need to take a closer look to determine if the new columns created contain the correct values for the new outofhomecare records.

Paul

Manu_Jain · Posted 11-27-2012 03:44 AM

Hi

Do tell me if I am wrong but as far as I get to know what you want is making outofhomecare to be in records not in columns.

I am not writing the sas code right now because 1st I want to be sure what u are actually asking for.

I have done something in excel to show you ,what i really get from your problem. Image shows the how for each cnty_name, startyear, agecat4 exitmonthcategory, the outofhomecare is divided.Is this you want sas?

Paul_NYS · Posted 11-27-2012 03:14 PM

Hi Manu, can you post that spreadsheet so I can take a closer look at it?

Paul

Manu_Jain · Posted 11-28-2012 12:05 AM

Hi Paul
here it is...do let me know if i get you correctly.

Tom · Posted 11-26-2012 10:55 PM

If looks like your data is grouped by

cnty_name

startyear

agecat4

exitMonth

cohortyeartotal

cohortyeartotalage

Do you want to use the value of EXIT as the name of the variable that contains the value from OutOfHomeCare?

proc transpose data=have out=want ;

by

cnty_name

startyear

agecat4

exitMonth

cohortyeartotal

cohortyeartotalage

;

id exit;

var OutOfHomeCare;

run;

What do you want to do with the columns

distributionnumber

distributionpercent

cumulativenumber

cumulativepercent

That also seem to have different values for each EXIT value?

Paul_NYS · Posted 11-27-2012 03:13 PM

Thanks Tom. I ran the above transform. It definitely put the outofhomecare column into rows under the _NAME_ column. However it also moved all the rows that were previously under Exit into columns, with each value being a different column. Is there a way to keep the existing values under the Exit column?

art297 · Posted 11-27-2012 03:17 PM

Paul: You could always use the copy statement (see: http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000063666.htm )

Tom · Posted 11-27-2012 09:26 PM

You seem to be using COLUMN and ROW in different ways than I expect. Perhaps if we switch to VARIABLES and OBSERVATIONS instead?

The variable (_NAME_) that you see with the text string 'OutofHomeCare' on every observations is just put the automatic variable that PROC TRANSPOSE creates to indicate where the values for that observation (row) came from. If we had used two variable names in the VAR statement then the two variables (columns) would have been transposed into two observations (rows) in the output dataset. Then the value of the _NAME_ variable could help you see which variable (column) was use to create that observation (row) in the output dataset. The ID statement was what caused it to use the value of the EXIT variable to name the variables it created to store the multiple observations that it saw for each BY group. Without the ID statement those variables would instead just be named COL1, COL2, ...., COLn, where n is the maximum number of observations (rows) over all of the BY groups.

You have a variable in your dataset called OutofHomeCare that appears to be numeric. It appears to have a different value for each observation in the dataset. I do not know what you mean by converting it to a ROW if you did not want to rotate the values up into multiple variables.

Perhaps if you posted an example with just five or ten observations showing what you want to change?

It might also help if you could explain how you want to use the newly formatted dataset. Perhaps there is a better structure for your intended use.

Paul_NYS · Posted 11-29-2012 12:50 PM

The below transpose may work with some modifications, thank you. While it did not come out the way I was envisioning, I'll report back on the result.

Paul

proc transpose data=have out=want ;

by

cnty_name

startyear

agecat4

exitMonth

cohortyeartotal

cohortyeartotalage

;

id exit;

var OutOfHomeCare;

run;

Registration is open

SAS Training: Just a Click Away