BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Wub_SAS
Obsidian | Level 7

Hi,

I need to rename some categories of character variables in my dataset to improve consistency and clean data entry errors. I couldn't figure it out why my below code works for some of the variables but not for the others. Also attached is a sample of the input dataset. Thank you!

data WORK.CONSISTENT;
infile datalines dsd truncover;
input p1q1_nowk_mnths:$8. p1q1_activity_pl:$8. p1q1_nowk_yrs:$8. p1q1_ptwkhxs1:$8. p1q1_ptwkhxe1:$8. p1q1_spouse_activity_pl:$8. p1q1_spouse_activity_upl:$8. p1q1_spouse_nowkhx:$8. p1q1_spouse_nowkhxs1:$8. p1q1_spouse_nowkhxe1:$8. record_id:$6.;
datalines;
0 4 0 16 22 10 NA 12007c
4 90 0 42 44 0 0 16-18 16 18 12024c
0 0 21 24 ASKU 12025c
0 15 0 12032c
0 9 NA NA 4 NA NA NA 12033c
3 30 0 16 21 ASKU ASKU 12043c
9 44 44 0 0 16-18 16 18 12044
0 90 0 12045c
12 NRG 18 26 21 7 16-25 16 25 12048c
NRG 70 1 12049c
9 180 0 15 18 0 200 NA 12314c
0 18 21 12328c
5 NA 0 30 0 12493c
0 0 0 0 0 12507c
6 60 0 16 22 12514c
0 30 0 43 43 0 30 12527c
10 4 0 38 39 22001c
2 1 23 26 18 30-32 30 32 22002c
9 10 0 19 23 NA 22006c
0 40 0 16 22 20 60 0 0 0 22007c
;;;;
Data Consistent2;	
	length p1q1_nowk_mnths p1q1_activity_pl p1q1_nowk_yrs p1q1_ptwkhxs1 p1q1_ptwkhxe1 p1q1_spouse_activity_pl p1q1_spouse_activity_upl
			p1q1_spouse_nowkhx p1q1_spouse_nowkhxs1 p1q1_spouse_nowkhxe1 $8.; 
	Set Consistent (keep=record_id p1q1_nowk_mnths p1q1_activity_pl p1q1_nowk_yrs p1q1_ptwkhxs1 p1q1_ptwkhxe1 p1q1_spouse_activity_pl p1q1_spouse_activity_upl
			p1q1_spouse_nowkhx p1q1_spouse_nowkhxs1 p1q1_spouse_nowkhxe1);
	if p1q1_ptwkhxs1='16 ye' then p1q1_ptwkhxs1= '16';
	if p1q1_ptwkhxs1='44-49' then p1q1_ptwkhxs1= '44';
	If p1q1_spouse_activity_pl='3 weeks' then p1q1_spouse_activity_pl='21';
	If p1q1_spouse_activity_pl='5 days' then p1q1_spouse_activity_pl='5';
	If p1q1_spouse_activity_upl='1 week' then p1q1_spouse_activity_upl='7';

*To increase consistency of missing data representing accronyms;
	If strip(p1q1_nowk_mnths) = "NAS" then p1q1_nowk_mnths = "NASK";
	If p1q1_activity_pl = "ASK"  then p1q1_activity_pl = "ASKU"; 
	If p1q1_nowk_yrs = "NAS"  then p1q1_nowk_yrs = "NASK"; 
	If p1q1_ptwkhx = "N/A" then p1q1_ptwkhx = "NA"; 
	If p1q1_ptwkhxe1 = "AS" then p1q1_ptwkhxe1 = "ASKU";
	If p1q1_spouse_nowkhxs1 = "AS" then p1q1_spouse_nowkhxs1 = "ASKU"; 
	If p1q1_spouse_nowkhxe1 = "AS" then p1q1_spouse_nowkhxs1 = "ASKU"; 
	If p1q1_spouse_nowkhx = "N/A" then p1q1_spouse_nowkhx = "NA"; 
	If p1q1_spouse_ptwkhx = "N/A" then p1q1_spouse_ptwkhx = "NA"; 
	If p1q1_otherdebt_spec="0" then p1q1_otherdebt_spec="No other debt";
	Run;

	Proc freq data=Consistent; table p1q1_nowk_mnths p1q1_activity_pl p1q1_nowk_yrs p1q1_ptwkhxs1 p1q1_ptwkhxe1 p1q1_spouse_activity_pl p1q1_spouse_activity_upl
			p1q1_spouse_nowkhx p1q1_spouse_nowkhxs1 p1q1_spouse_nowkhxe1; Run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

When you have multiple variables that may need the same "fix" or things like the values that you want numeric that have occasional characters that make them not numeric values that you can fix then you might consider custom informats.

 

I'm not sure that having at least 3 different values such as "NASK" "NA" and "ASKU" are improving consistency of "missing" values though.

Your example data step is just so wrong I'm not going to attempt to demonstrate.

Place a simple . in the position of any value not actually there.

Or copy lines exactly as they appear from your source file. I doubt if it actually has single spaces but what you posted does.

 

If you mean change type of variable with the same name SAS will not let you do that. You can create a new variable, drop the old one and rename the new one.

This shows an example of reading a variable as one type, creating another variable of the other type and renaming on the output data set options:

data junk (rename=(anum=a bchar=b));
   input a $ b;
   anum= input(a,f3.);
   bchar= put(b,best5. -L);
   drop a b;
datalines;
123 456
;

or from an existing data set

data morejunk ;
   set junk (rename=(a=anum b=bchar));
   a= put(anum,f3. -L);
   b= input(bchar,f5.);
   drop anum bchar;
run;

View solution in original post

4 REPLIES 4
Oligolas
Barite | Level 11

Hi,

first check your import dataset, I doubt you want to import it the way it is currently set up.

second detail your problem, what 'categories' are you talking about, what do you have and what do you want to achieve?

________________________

- Cheers -

Astounding
PROC Star

Check your data for stray characters that don't belong.  These characters won't appear in the example data, since you re-typed the data with no special characters.  But the original source of the data might, for example, include tab characters introduced by the data entry process.

 

To check, print using the $hex16. format.

ballardw
Super User

When you have multiple variables that may need the same "fix" or things like the values that you want numeric that have occasional characters that make them not numeric values that you can fix then you might consider custom informats.

 

I'm not sure that having at least 3 different values such as "NASK" "NA" and "ASKU" are improving consistency of "missing" values though.

Your example data step is just so wrong I'm not going to attempt to demonstrate.

Place a simple . in the position of any value not actually there.

Or copy lines exactly as they appear from your source file. I doubt if it actually has single spaces but what you posted does.

 

If you mean change type of variable with the same name SAS will not let you do that. You can create a new variable, drop the old one and rename the new one.

This shows an example of reading a variable as one type, creating another variable of the other type and renaming on the output data set options:

data junk (rename=(anum=a bchar=b));
   input a $ b;
   anum= input(a,f3.);
   bchar= put(b,best5. -L);
   drop a b;
datalines;
123 456
;

or from an existing data set

data morejunk ;
   set junk (rename=(a=anum b=bchar));
   a= put(anum,f3. -L);
   b= input(bchar,f5.);
   drop anum bchar;
run;
Wub_SAS
Obsidian | Level 7

Thank you very much; that was helpful advice!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 922 views
  • 0 likes
  • 4 in conversation