BookmarkSubscribeRSS Feed
smashing
Calcite | Level 5
Hi there,

I am looking to create a new variable that is a composite of the first three characters of a string variable (they are names) with an underscore and then an associated numeric variable. I was planning on using the trim function but cannot figure out how to crop the first three letters off of the string variable. any suggestions?

Thanks very much!
17 REPLIES 17
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

Taking a guess at your data (as you have not put test data in the form of a datastep here so that we have something to work with):

data want;
  charvar="XYZDEFGH";
  numvar=43256;
  result=catx("_",substr(charvar,1,3),put(numvar,best.));
run;

Substr is the function to get the characters you want, put() is the function to show numbers as text, and catx concatenates text with a delimiter, in this case "_".

smashing
Calcite | Level 5
Hey,

Thanks! Full disclosure I'm a student new to SAS so forgive my confusion.
In your example, is XYZDEFGH the name of the variable? I tried it but it
told me the numerical variable was uninitialized.

Thanks!

##- Please type your reply above this line. Simple formatting, no
attachments. -##
ballardw
Super User

Right of an = is the value, left is the name of the variable.

So the statement

charvar="XYZDEFGH";

 

Creates a varaible named Charvar and assigns the value of XYZDEFGH.

ballardw
Super User

What will your rule be if the name only has 1 or 2 characters? Do you need  _ to make a fixed length or will 1 or 2 characters followed by a single _ suffice?

smashing
Calcite | Level 5
Hey,

Upon inspection, none of the string variables have fewer than 3 characters
so a fixed number should work I think!

Thanks!

##- Please type your reply above this line. Simple formatting, no
attachments. -##
Reeza
Super User

@smashing If you have an answer please mark the question solved. If you still need help please post some samples of what your observation looks like and what you want as output. 

 

There are probably several methods to do this so if you want alternate solutions you should post sample data anyways. 

smashing
Calcite | Level 5

Ah, OK...was envisioning not copying all of the values in as I worried it would be unyielding. Here's my data--I'm looking to get the first 3 letters of name and lifeex76 together:

Obs name lifeex76 urpop76 indeat70 t1rnd
1 Algeria 55 40.3 150 40
2 Angola 39 17.8 186 18
3 Benin 41 21.5 160 22
4 Botswana 52 12.0 110 12
5 Burkina Faso 42 6.3 185 6
6 Burundi 45 3.0 140 3
7 Cameroon 46 26.9 136 27
8 Cape Verde . 30.2 120 30
9 Central African Rep 42 34.2 160 34
10 Chad 40 15.2 179 15
11 Comoros . 21.3 115 21
12 Congo 44 35.8 110 36
13 Cote d'Ivoire 47 32.2 143 32
14 Djibouti . 68.5 166 69
15 Egypt 54 43.5 170 44
16 Equatorial Guinea . 46.6 173 47
17 Ethiopia 42 9.5 162 10
18 Gabon 46 30.6 147 31
19 Gambia . 16.6 193 17
20 Ghana 51 29.8 117 30
21 Guinea-Bissau . 20.8 173 21
22 Guinea 37 16.3 192 16
23 Kenya 52 12.9 108 13
24 Lesotho 50 10.8 140 11
25 Liberia 49 30.4 132 30
26 Libya 54 46.8 130 47
27 Madagascar 48 16.3 195 16
28 Malawi 42 7.7 197 8
29 Mali 39 16.2 206 16
30 Mauritania 41 19.6 173 20
31 Mauritius 64 43.6 67 44
32 Morocco 54 37.8 138 38
33 Mozambique, People's Rep 43 8.6 175 9
34 Namibia 50 . . .
35 Niger 40 10.6 176 11
36 Nigeria 46 23.4 172 23
37 Rwanda 45 4.0 140 4
38 Senegal 41 34.2 168 34
39 Sierra Leone 36 21.1 204 21
40 Somalia 42 25.6 162 26
41 South Africa 55 50.5 120 51
42 Sudan 44 18.9 156 19
43 Swaziland . 14.0 147 14
44 Tanzania 48 10.1 135 10
45 Togo 47 15.8 141 16
46 Tunisia 58 47.6 138 48
47 Uganda 48 8.3 118 8
48 Zaire 47 32.2 137 32
49 Zambia 48 36.3 115 36
50 Zimbabwe 53 19.4 101 19
51 Barbados . 38.6 33 39
52 Canada 74 75.6 21 76
53 Costa Rica 69 42.2 66 42
54 Cuba 72 64.2 49 64
55 Dominican Rep 61 45.3 105 45
56 El Salvador 58 40.4 112 40
57 Guatemala 55 37.1 108 37
58 Haiti 50 22.1 172 22
59 Honduras 56 32.3 123 32
60 Jamaica 68 44.1 45 44
61 Mexico 64 62.8 79 63
62 Nicaragua 55 50.3 115 50
63 Panama 68 49.1 52 49
64 Trinidad and Tobago 67 48.4 41 48
65 United States 72 73.6 22 74
66 Argentina 68 80.6 56 81
67 Bolivia 48 41.5 157 42
68 Brazil 61 61.8 100 62
69 Chile 65 78.3 95 78
70 Colombia 61 60.8 74 61
71 Ecuador 60 42.4 107 42
72 Guyana . 29.6 56 30
73 Paraguay 66 39.0 67 39
74 Peru 56 61.4 126 61
75 Suriname . 44.8 55 45
76 Uruguay 69 83.0 48 83
77 Venezuela 67 77.8 60
78

Thanks for all your help!
Show quoted text
Reeza
Super User

You didn't post what you wanted as output, but a CATT should work.

 

 

want = CATT(substr(name, 1, 3), lifeex76);

 

CATT converts numerics to character and SUBSTR takes the first 3 characters of the name.

smashing
Calcite | Level 5

Hey,

 

That almost works but I need an underscore between the two values and adding a , '_' , doesn't seem to work. It says substr has too many arguments.  Any suggestions?

 

Thanks!

Reeza
Super User

This is why you show the output you want...

 

CATX instead.

 

CATX("_", substr..., othervariable)

ballardw
Super User

Your data has missing values for lifeex76 for observations 8, 11, 14, 16 and others. What is the rule for those? Your current rule would possibly yield something like Cap_. for Cape Verde.

Also your observation 56 would yield "El _58" with a space betwee the "l" and the "_". Is that going to be the desired result?

 

You may want to provide some examples of how the constructed variable is to be used to get better suggestions.

 

 

smashing
Calcite | Level 5
Good catch with the El...yeah, that's not what I want.

In terms of the missing, I'm converting missing values to NA with:

if lifeex76 eq . then l
lifeex76='NA';

(It's what is requested in my homework)

Thanks!

##- Please type your reply above this line. Simple formatting, no
attachments. -##
smashing
Calcite | Level 5

Incidentially, my attempt to replace . with NA doesn't seem to work--getting this outpu:

 

name=Guinea-Bissau lifeex76=. urpop76=20.8 indeat70=173 _ERROR_=1 _N_=21
NOTE: Invalid numeric data, 'NA' , at line 175 column 32.
name=Swaziland lifeex76=. urpop76=14 indeat70=147 _ERROR_=1 _N_=43
NOTE: Invalid numeric data, 'NA' , at line 175 column 32.
name=Barbados lifeex76=. urpop76=38.6 indeat70=33 _ERROR_=1 _N_=51
NOTE: Invalid numeric data, 'NA' , at line 175 column 32.
name=Guyana lifeex76=. urpop76=29.6 indeat70=56 _ERROR_=1 _N_=72
NOTE: Invalid numeric data, 'NA' , at line 175 column 32.
name=Suriname lifeex76=. urpop76=44.8 indeat70=55 _ERROR_=1 _N_=75

 

Any suggestions?

 

Thanks!

ballardw
Super User

SAS will not allow character values for numeric variables. So "NA" being letters do not work. However SAS does provide for the display of special text for a missing value by use of a custom display format.

run this:

Proc format library=work;

value Lifeex

 . = 'NA'

;

run;

 

In your data step add the line:

Format Lifeex76 Lifeex. ;

 

Then the value NA will appear for missing values. You will either need to run the Proc Format code every session or dig into the field of pernament libraries, format catalogs and format search paths.

 

You should provide what you want for the EL Salavador data.

It may be you are looking for something like

result = catx('_',substr(scan(name,1),1,3),put(lifeex76,lifeex.));

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 4982 views
  • 0 likes
  • 4 in conversation