- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am looking to create a new variable that is a composite of the first three characters of a string variable (they are names) with an underscore and then an associated numeric variable. I was planning on using the trim function but cannot figure out how to crop the first three letters off of the string variable. any suggestions?
Thanks very much!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Taking a guess at your data (as you have not put test data in the form of a datastep here so that we have something to work with):
data want; charvar="XYZDEFGH"; numvar=43256; result=catx("_",substr(charvar,1,3),put(numvar,best.)); run;
Substr is the function to get the characters you want, put() is the function to show numbers as text, and catx concatenates text with a delimiter, in this case "_".
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks! Full disclosure I'm a student new to SAS so forgive my confusion.
In your example, is XYZDEFGH the name of the variable? I tried it but it
told me the numerical variable was uninitialized.
Thanks!
##- Please type your reply above this line. Simple formatting, no
attachments. -##
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Right of an = is the value, left is the name of the variable.
So the statement
charvar="XYZDEFGH";
Creates a varaible named Charvar and assigns the value of XYZDEFGH.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What will your rule be if the name only has 1 or 2 characters? Do you need _ to make a fixed length or will 1 or 2 characters followed by a single _ suffice?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Upon inspection, none of the string variables have fewer than 3 characters
so a fixed number should work I think!
Thanks!
##- Please type your reply above this line. Simple formatting, no
attachments. -##
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@smashing If you have an answer please mark the question solved. If you still need help please post some samples of what your observation looks like and what you want as output.
There are probably several methods to do this so if you want alternate solutions you should post sample data anyways.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ah, OK...was envisioning not copying all of the values in as I worried it would be unyielding. Here's my data--I'm looking to get the first 3 letters of name and lifeex76 together:
Obs name lifeex76 urpop76 indeat70 t1rnd
1 Algeria 55 40.3 150 40
2 Angola 39 17.8 186 18
3 Benin 41 21.5 160 22
4 Botswana 52 12.0 110 12
5 Burkina Faso 42 6.3 185 6
6 Burundi 45 3.0 140 3
7 Cameroon 46 26.9 136 27
8 Cape Verde . 30.2 120 30
9 Central African Rep 42 34.2 160 34
10 Chad 40 15.2 179 15
11 Comoros . 21.3 115 21
12 Congo 44 35.8 110 36
13 Cote d'Ivoire 47 32.2 143 32
14 Djibouti . 68.5 166 69
15 Egypt 54 43.5 170 44
16 Equatorial Guinea . 46.6 173 47
17 Ethiopia 42 9.5 162 10
18 Gabon 46 30.6 147 31
19 Gambia . 16.6 193 17
20 Ghana 51 29.8 117 30
21 Guinea-Bissau . 20.8 173 21
22 Guinea 37 16.3 192 16
23 Kenya 52 12.9 108 13
24 Lesotho 50 10.8 140 11
25 Liberia 49 30.4 132 30
26 Libya 54 46.8 130 47
27 Madagascar 48 16.3 195 16
28 Malawi 42 7.7 197 8
29 Mali 39 16.2 206 16
30 Mauritania 41 19.6 173 20
31 Mauritius 64 43.6 67 44
32 Morocco 54 37.8 138 38
33 Mozambique, People's Rep 43 8.6 175 9
34 Namibia 50 . . .
35 Niger 40 10.6 176 11
36 Nigeria 46 23.4 172 23
37 Rwanda 45 4.0 140 4
38 Senegal 41 34.2 168 34
39 Sierra Leone 36 21.1 204 21
40 Somalia 42 25.6 162 26
41 South Africa 55 50.5 120 51
42 Sudan 44 18.9 156 19
43 Swaziland . 14.0 147 14
44 Tanzania 48 10.1 135 10
45 Togo 47 15.8 141 16
46 Tunisia 58 47.6 138 48
47 Uganda 48 8.3 118 8
48 Zaire 47 32.2 137 32
49 Zambia 48 36.3 115 36
50 Zimbabwe 53 19.4 101 19
51 Barbados . 38.6 33 39
52 Canada 74 75.6 21 76
53 Costa Rica 69 42.2 66 42
54 Cuba 72 64.2 49 64
55 Dominican Rep 61 45.3 105 45
56 El Salvador 58 40.4 112 40
57 Guatemala 55 37.1 108 37
58 Haiti 50 22.1 172 22
59 Honduras 56 32.3 123 32
60 Jamaica 68 44.1 45 44
61 Mexico 64 62.8 79 63
62 Nicaragua 55 50.3 115 50
63 Panama 68 49.1 52 49
64 Trinidad and Tobago 67 48.4 41 48
65 United States 72 73.6 22 74
66 Argentina 68 80.6 56 81
67 Bolivia 48 41.5 157 42
68 Brazil 61 61.8 100 62
69 Chile 65 78.3 95 78
70 Colombia 61 60.8 74 61
71 Ecuador 60 42.4 107 42
72 Guyana . 29.6 56 30
73 Paraguay 66 39.0 67 39
74 Peru 56 61.4 126 61
75 Suriname . 44.8 55 45
76 Uruguay 69 83.0 48 83
77 Venezuela 67 77.8 60
78
Thanks for all your help!
Show quoted text
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You didn't post what you wanted as output, but a CATT should work.
want = CATT(substr(name, 1, 3), lifeex76);
CATT converts numerics to character and SUBSTR takes the first 3 characters of the name.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hey,
That almost works but I need an underscore between the two values and adding a , '_' , doesn't seem to work. It says substr has too many arguments. Any suggestions?
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is why you show the output you want...
CATX instead.
CATX("_", substr..., othervariable)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your data has missing values for lifeex76 for observations 8, 11, 14, 16 and others. What is the rule for those? Your current rule would possibly yield something like Cap_. for Cape Verde.
Also your observation 56 would yield "El _58" with a space betwee the "l" and the "_". Is that going to be the desired result?
You may want to provide some examples of how the constructed variable is to be used to get better suggestions.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In terms of the missing, I'm converting missing values to NA with:
if lifeex76 eq . then l
lifeex76='NA';
(It's what is requested in my homework)
Thanks!
##- Please type your reply above this line. Simple formatting, no
attachments. -##
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Incidentially, my attempt to replace . with NA doesn't seem to work--getting this outpu:
name=Guinea-Bissau lifeex76=. urpop76=20.8 indeat70=173 _ERROR_=1 _N_=21
NOTE: Invalid numeric data, 'NA' , at line 175 column 32.
name=Swaziland lifeex76=. urpop76=14 indeat70=147 _ERROR_=1 _N_=43
NOTE: Invalid numeric data, 'NA' , at line 175 column 32.
name=Barbados lifeex76=. urpop76=38.6 indeat70=33 _ERROR_=1 _N_=51
NOTE: Invalid numeric data, 'NA' , at line 175 column 32.
name=Guyana lifeex76=. urpop76=29.6 indeat70=56 _ERROR_=1 _N_=72
NOTE: Invalid numeric data, 'NA' , at line 175 column 32.
name=Suriname lifeex76=. urpop76=44.8 indeat70=55 _ERROR_=1 _N_=75
Any suggestions?
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
SAS will not allow character values for numeric variables. So "NA" being letters do not work. However SAS does provide for the display of special text for a missing value by use of a custom display format.
run this:
Proc format library=work;
value Lifeex
. = 'NA'
;
run;
In your data step add the line:
Format Lifeex76 Lifeex. ;
Then the value NA will appear for missing values. You will either need to run the Proc Format code every session or dig into the field of pernament libraries, format catalogs and format search paths.
You should provide what you want for the EL Salavador data.
It may be you are looking for something like
result = catx('_',substr(scan(name,1),1,3),put(lifeex76,lifeex.));