BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
luvscandy27
Quartz | Level 8

Hi All, 

 

I am in need of your assistance please. I have data like this: 

Chararacteristic  
White only:  
    Below poverty  
    At or above poverty  
Black or African American only:  
    Below poverty  
    At or above poverty  
Hispanic or Latino:   
   Below poverty  
  At or above poverty  

 

But I am trying to get my data like below. Any suggestions would be greatly appreciated:

White only: Below Poverty
White only: At or above poverty
Black or African American only: Below Poverty
Black or African American only: At or above poverty
Hispanic or Latino:  Below Poverty
Hispanic or Latino:  At or above poverty
1 ACCEPTED SOLUTION

Accepted Solutions
AhmedAl_Attar
Ammonite | Level 13

Hi @luvscandy27 

Stealing the part of the sample code from @A_Kh, here is one way how you can get what you looking for

/* Create sample data */
data have;
infile cards dlm=',' truncover;
length Chararacteristic $100;
input Chararacteristic;
cards; 
White only:,	 
Below poverty,	 
At or above poverty,	 
Black or African American only:,	 
    Below poverty,	 
    At or above poverty,	 
Hispanic or Latino:, 	 
   Below poverty,	 
  At or above poverty
;
run;

/* Create a Format that covers all sub-categories of interest regardless of their main characteristics */
proc formats lib=work;
	value $subs
	'Below poverty' = '*' 
	'At or above poverty' = '*'
	;
run;

data want (DROP=Chararacteristic);
	length main_char $40 sub_char $20;
	retain main_char;
	set have;
	if (PUT(Chararacteristic,$subs.)='*') then sub_char = Chararacteristic;
	else main_char = Chararacteristic;
	if sub_char ne ''; /* Remove header row (Main_char only) */
run;

Hope this helps 

View solution in original post

20 REPLIES 20
A_Kh
Barite | Level 11

One way to do so is creating two temporary variables and mapping values from the source variable there using RETAIN; 

eg; 

data have;
infile cards dlm=',' truncover;
length Chararacteristic $100;
input Chararacteristic;
cards; 
White only:,	 
Below poverty,	 
At or above poverty,	 
Black or African American only:,	 
    Below poverty,	 
    At or above poverty,	 
Hispanic or Latino:, 	 
   Below poverty,	 
  At or above poverty
;
proc print; run; 

data want;
	set have;
	length temp1 temp2 $100;
	retain temp1;
	if find(Chararacteristic, ':') gt 0 then temp1= Chararacteristic;
	else temp2=Chararacteristic;
	if temp2 ne '';
	drop Chararacteristic;
proc print;run; 
PaigeMiller
Diamond | Level 26

I'm sure the solution from @A_Kh works, but question for @luvscandy27: are you 100% sure that there will always be a colon at the end of the text that you want to stay in column 1 and never a colon in the text you want to move to column 2? 

--
Paige Miller
luvscandy27
Quartz | Level 8

There will not always be a colon at the end. This is just a portion of the dataset. The dataset has several situations 

like the one I posted. I was hoping if i got a solution, I would be able to modify the code and apply the code to similar situations 

in the dataset. 

PaigeMiller
Diamond | Level 26

Please explain the structure of the data to us, so we can help provide a solution. What is the proper indication in the text that indicates text that is to stay in column 1, or that indicates text that should be moved to column 2? Please provide sample data that is more realistic to your actual problem.

--
Paige Miller
luvscandy27
Quartz | Level 8

Below please see the a sample of the table: 

Reeza_0-1697560229822.png

 

 

 

Source in PDF: https://www.cdc.gov/nchs/data/hus/2020-2021/VaxCh.pdf

Source in Excel: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/Health_US/hus20-21tables/VaxCh.xlsx

There is not an identification variable which complicates thing even more. 

luvscandy27
Quartz | Level 8

The complete table is segmented by types of vaccinations:

Combined 7-vaccine series

DT\PD\TDTap (4 doses or more)

Hepatitis A (2 doses or more)

Hepatitis B (3 doses or more)

Hib (full series)

Measles, mumps, rubella (1 dose or more)

PCV (4 doses or more)

Polio (3 doses or more)

Rotavirus vaccine

Varicella (1 dose or more)

PaigeMiller
Diamond | Level 26

@luvscandy27 wrote:

The complete table is segmented by types of vaccinations:

Combined 7-vaccine series

DT\PD\TDTap (4 doses or more)

Hepatitis A (2 doses or more)

Hepatitis B (3 doses or more)

Hib (full series)

Measles, mumps, rubella (1 dose or more)

PCV (4 doses or more)

Polio (3 doses or more)

Rotavirus vaccine

Varicella (1 dose or more)


I don't see how this relates to the problem you asked about. Please explain this in more detail.

 

 

--
Paige Miller
PaigeMiller
Diamond | Level 26

Perhaps you can come up with some rules, in words, that identify what text remains in column 1 and what text should move to column 2 (and is there a column 3 here?)

 

Also, I think we would need to see how this full table gets re-arranged, including the birth year columns.

 

The more you can tell us about what you want, the better chances we will have to create such code. 

--
Paige Miller
Reeza
Super User
FYI - your name is in screenshot - may want to remove in future.
luvscandy27
Quartz | Level 8

Thank you. Do you know of anyway I can remove?

 

Reeza
Super User

I removed it for you - just take slightly different screenshots in the future. Honestly, for this, I would just clean up the file manually and the excel is messy so you'd probably need to reference the PDF which is formatted correctly anyways 😞

 

PS. please double check I have the links correctly and the right file attached. If not, you can click the side of the post and edit it as needed.

luvscandy27
Quartz | Level 8

Thank you!

Tom
Super User Tom
Super User

That looks like a report.  Do you have it as a text file you are reading in?

Do you really have it as a SAS dataset?

 

Or do you have some other dataset that you used to produce that report?  If so what does that dataset look like.  What code did you use to produce that report?

AhmedAl_Attar
Ammonite | Level 13

Hi @luvscandy27 

Stealing the part of the sample code from @A_Kh, here is one way how you can get what you looking for

/* Create sample data */
data have;
infile cards dlm=',' truncover;
length Chararacteristic $100;
input Chararacteristic;
cards; 
White only:,	 
Below poverty,	 
At or above poverty,	 
Black or African American only:,	 
    Below poverty,	 
    At or above poverty,	 
Hispanic or Latino:, 	 
   Below poverty,	 
  At or above poverty
;
run;

/* Create a Format that covers all sub-categories of interest regardless of their main characteristics */
proc formats lib=work;
	value $subs
	'Below poverty' = '*' 
	'At or above poverty' = '*'
	;
run;

data want (DROP=Chararacteristic);
	length main_char $40 sub_char $20;
	retain main_char;
	set have;
	if (PUT(Chararacteristic,$subs.)='*') then sub_char = Chararacteristic;
	else main_char = Chararacteristic;
	if sub_char ne ''; /* Remove header row (Main_char only) */
run;

Hope this helps 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 20 replies
  • 4921 views
  • 3 likes
  • 6 in conversation