Solved: Re: Rearrange Variables in the same Column

luvscandy27 · Posted 10-17-2023 08:28 AM

Hi All,

I am in need of your assistance please. I have data like this:

Chararacteristic
White only:
Below poverty
At or above poverty
Black or African American only:
Below poverty
At or above poverty
Hispanic or Latino:
Below poverty
At or above poverty

But I am trying to get my data like below. Any suggestions would be greatly appreciated:

White only:	Below Poverty
White only:	At or above poverty
Black or African American only:	Below Poverty
Black or African American only:	At or above poverty
Hispanic or Latino:	Below Poverty
Hispanic or Latino:	At or above poverty

AhmedAl_Attar · Posted 10-17-2023 10:45 AM

Hi @luvscandy27

Stealing the part of the sample code from @A_Kh, here is one way how you can get what you looking for

/* Create sample data */
data have;
infile cards dlm=',' truncover;
length Chararacteristic $100;
input Chararacteristic;
cards; 
White only:,	 
Below poverty,	 
At or above poverty,	 
Black or African American only:,	 
    Below poverty,	 
    At or above poverty,	 
Hispanic or Latino:, 	 
   Below poverty,	 
  At or above poverty
;
run;

/* Create a Format that covers all sub-categories of interest regardless of their main characteristics */
proc formats lib=work;
	value $subs
	'Below poverty' = '*' 
	'At or above poverty' = '*'
	;
run;

data want (DROP=Chararacteristic);
	length main_char $40 sub_char $20;
	retain main_char;
	set have;
	if (PUT(Chararacteristic,$subs.)='*') then sub_char = Chararacteristic;
	else main_char = Chararacteristic;
	if sub_char ne ''; /* Remove header row (Main_char only) */
run;

Hope this helps

View solution in original post

A_Kh · Posted 10-17-2023 09:15 AM

One way to do so is creating two temporary variables and mapping values from the source variable there using RETAIN;

eg;

data have;
infile cards dlm=',' truncover;
length Chararacteristic $100;
input Chararacteristic;
cards; 
White only:,	 
Below poverty,	 
At or above poverty,	 
Black or African American only:,	 
    Below poverty,	 
    At or above poverty,	 
Hispanic or Latino:, 	 
   Below poverty,	 
  At or above poverty
;
proc print; run; 

data want;
	set have;
	length temp1 temp2 $100;
	retain temp1;
	if find(Chararacteristic, ':') gt 0 then temp1= Chararacteristic;
	else temp2=Chararacteristic;
	if temp2 ne '';
	drop Chararacteristic;
proc print;run;

PaigeMiller · Posted 10-17-2023 09:31 AM

I'm sure the solution from @A_Kh works, but question for @luvscandy27: are you 100% sure that there will always be a colon at the end of the text that you want to stay in column 1 and never a colon in the text you want to move to column 2?

--
Paige Miller

luvscandy27 · Posted 10-17-2023 10:03 AM

There will not always be a colon at the end. This is just a portion of the dataset. The dataset has several situations

like the one I posted. I was hoping if i got a solution, I would be able to modify the code and apply the code to similar situations

in the dataset.

PaigeMiller · Posted 10-17-2023 10:05 AM

Please explain the structure of the data to us, so we can help provide a solution. What is the proper indication in the text that indicates text that is to stay in column 1, or that indicates text that should be moved to column 2? Please provide sample data that is more realistic to your actual problem.

--
Paige Miller

luvscandy27 · Posted 10-17-2023 10:12 AM

Below please see the a sample of the table:

Source in PDF: https://www.cdc.gov/nchs/data/hus/2020-2021/VaxCh.pdf

Source in Excel: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/Health_US/hus20-21tables/VaxCh.xlsx

There is not an identification variable which complicates thing even more.

luvscandy27 · Posted 10-17-2023 10:17 AM

The complete table is segmented by types of vaccinations:

Combined 7-vaccine series

DT\PD\TDTap (4 doses or more)

Hepatitis A (2 doses or more)

Hepatitis B (3 doses or more)

Hib (full series)

Measles, mumps, rubella (1 dose or more)

PCV (4 doses or more)

Polio (3 doses or more)

Rotavirus vaccine

Varicella (1 dose or more)

PaigeMiller · Posted 10-17-2023 10:19 AM

@luvscandy27 wrote:

The complete table is segmented by types of vaccinations:

Combined 7-vaccine series

DT\PD\TDTap (4 doses or more)

Hepatitis A (2 doses or more)

Hepatitis B (3 doses or more)

Hib (full series)

Measles, mumps, rubella (1 dose or more)

PCV (4 doses or more)

Polio (3 doses or more)

Rotavirus vaccine

Varicella (1 dose or more)

I don't see how this relates to the problem you asked about. Please explain this in more detail.

--
Paige Miller

PaigeMiller · Posted 10-17-2023 10:18 AM

Perhaps you can come up with some rules, in words, that identify what text remains in column 1 and what text should move to column 2 (and is there a column 3 here?)

Also, I think we would need to see how this full table gets re-arranged, including the birth year columns.

The more you can tell us about what you want, the better chances we will have to create such code.

--
Paige Miller

Reeza · Posted 10-17-2023 12:21 PM

FYI - your name is in screenshot - may want to remove in future.

luvscandy27 · Posted 10-17-2023 12:25 PM

Thank you. Do you know of anyway I can remove?

Reeza · Posted 10-17-2023 12:31 PM

I removed it for you - just take slightly different screenshots in the future. Honestly, for this, I would just clean up the file manually and the excel is messy so you'd probably need to reference the PDF which is formatted correctly anyways 😞

PS. please double check I have the links correctly and the right file attached. If not, you can click the side of the post and edit it as needed.

luvscandy27 · Posted 10-17-2023 01:50 PM

Thank you!

Tom · Posted 10-17-2023 10:19 AM

That looks like a report. Do you have it as a text file you are reading in?

Do you really have it as a SAS dataset?

Or do you have some other dataset that you used to produce that report? If so what does that dataset look like. What code did you use to produce that report?

AhmedAl_Attar · Posted 10-17-2023 10:45 AM

Hi @luvscandy27

Stealing the part of the sample code from @A_Kh, here is one way how you can get what you looking for

/* Create sample data */
data have;
infile cards dlm=',' truncover;
length Chararacteristic $100;
input Chararacteristic;
cards; 
White only:,	 
Below poverty,	 
At or above poverty,	 
Black or African American only:,	 
    Below poverty,	 
    At or above poverty,	 
Hispanic or Latino:, 	 
   Below poverty,	 
  At or above poverty
;
run;

/* Create a Format that covers all sub-categories of interest regardless of their main characteristics */
proc formats lib=work;
	value $subs
	'Below poverty' = '*' 
	'At or above poverty' = '*'
	;
run;

data want (DROP=Chararacteristic);
	length main_char $40 sub_char $20;
	retain main_char;
	set have;
	if (PUT(Chararacteristic,$subs.)='*') then sub_char = Chararacteristic;
	else main_char = Chararacteristic;
	if sub_char ne ''; /* Remove header row (Main_char only) */
run;

Hope this helps

Register Today!

SAS Training: Just a Click Away