BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Guerraje
Quartz | Level 8

I am working on a dataset in which I have just transposed the data from wide to long. My next step is to delete a word. I want to get rid of the word "county" and remove the underscores. I was told the tranwrd and scan function can assist but I am not sure how to build that code. I was wondering if anyone has used these functions in this way before? 

Attached is what my output looks like that I want to change and what my code looks like. 

 

Thank you, 

Screen Shot 2021-11-11 at 5.55.51 PM.png

Screen Shot 2021-11-11 at 5.55.40 PM.png

1 ACCEPTED SOLUTION

Accepted Solutions
Guerraje
Quartz | Level 8
OKAY! it looks great. I finally have the data long instead of wide and have dropped the word county from all of the names. The last thing I am unsure of how to add is how to drop the underscore that is inside some of the county names such as "RIO_GRANDE"

View solution in original post

16 REPLIES 16
Reeza
Super User

Since it's the word county at the start, SUBSTR is easier in this case.
If you leave the third parameter of the SUBSTR function it goes to the end by default. So starting it at 7, truncates out the word county. It may be 6...test it.

CountyName  = substr(_name_, 7);

@Guerraje wrote:

I am working on a dataset in which I have just transposed the data from wide to long. My next step is to delete a word. I want to get rid of the word "county" and remove the underscores. I was told the tranwrd and scan function can assist but I am not sure how to build that code. I was wondering if anyone has used these functions in this way before? 

Attached is what my output looks like that I want to change and what my code looks like. 

 

Thank you, 

Screen Shot 2021-11-11 at 5.55.51 PM.png

Screen Shot 2021-11-11 at 5.55.40 PM.png




Guerraje
Quartz | Level 8

Screen Shot 2021-11-11 at 7.03.40 PM.png

When running that code, this is the error I get. The variable name is _Name_ should that be before the = and county be inside of the parenthesis?  

Tom
Super User Tom
Super User

You have to include the assignment statement in a data step.

What is the name of the dataset that has the variable _NAME_?  What is the name of the dataset you want to create where the values of _NAME_ are changed?  Where do you want to store the value of the new variables names?  Back into the original _NAME_ variable?  Somewhere else?

 

Perhaps:

data want;
  set have;
  if upcase(_name_)=: 'COUNTY' then new_name=substr(_name_,7);
  else new_name=_name_;
run;

 

Perhaps if you explained the whole problem you would get a better answer.  For example it sounds to me like you might want to actual change the NAME of some variables.  Not just change the values stored in some character variable.

Guerraje
Quartz | Level 8

my dataset name is: coimpt.Insurance 

 

so if I make a data step would like be correct? Because I get this error. 

 

Data CountyNames;

Set coimpt.Insurance; 

If upcase (_name_) = 'COUNTY' then Countynew = substr (_name_7);

else Countynew = _name_; 

run; 

Screen Shot 2021-11-11 at 7.55.54 PM.png

I will want to remove the name "county" that is in front of all the county names in the output, and remove the underscores listed within the county names (I pasted what the output Im trying to change looks like above).

Tom
Super User Tom
Super User

Posting text as text instead of photographs makes it much easier to review and respond to your issues.

 

You did not include the comma between the two arguments to the SUBSTR() function.  In addition to the error SAS is reporting (that you do not have a enough arguments) it will also fail because you probably do not have a variable named _NAME_7.  Probably you variable is named _NAME_.

Guerraje
Quartz | Level 8

Screen Shot 2021-11-11 at 8.13.05 PM.png

Screen Shot 2021-11-11 at 8.13.22 PM.png

 After fixing the code, I no longer have errors, however, my output went back to what it was before. Wide instead of long and the word county is still in front of all the actually county names. 

thank you, 

Tom
Super User Tom
Super User

Please show the whole program.

The latest snapshot you posted does not appear to have anything to do with the code you posted before.  Where is the variable called _NAME_?  

 

Guerraje
Quartz | Level 8

PROC TRANSPOSE DATA = coimpt.Insurance OUT= county;


run;
Data CountyNames;

Set coimpt.Insurance; 

If upcase (_NAME_) = 'COUNTY' then Countynew = substr(_NAME_,7);

else Countynew = _NAME_; 
run;

 

I first transpose and that is where the variable _NAME_ is observed. Now I am trying to get rid of the word "coutny" in front of all the actually county names (Adams, Alamosa.. so on). After running the code above that we have been using it, it changes my code back to running wide not long and the variable names change back to what you see in the last photo

Screen Shot 2021-11-11 at 8.56.36 PM.png

andreas_lds
Jade | Level 19

You are using the wrong dataset in the set statement, you should use the result of proc transpose.

Guerraje
Quartz | Level 8

Thank you so much. That fixed my last problem. Now I have created another column. So I have 3 columns of all of the same information now. I am still unable to get rid of the word "county" in front of all of the county names. Thank you for your patience. My updated code after fixing the set statement looks like this:

PROC TRANSPOSE DATA = coimpt.Insurance OUT= county;

run;

 

Data CountyNames;

Set county; 

If upcase (_NAME_) = 'COUNTY' then Countynew = substr(_NAME_,7);

else Countynew = _NAME_; 
run;

Screen Shot 2021-11-12 at 6.44.46 AM.png

Tom
Super User Tom
Super User

You keep posting snippets of your code.  This time you posted text as text instead of a picture of it.  Next learn to use the Insert SAS code button on the menu bar to get a box that will preserve the formatting.

 

Your IF statement is testing if the name is exactly COUNTY instead of testing if it starts with COUNTY.

The colon modifier on the comparison operator will tell SAS to compare only up to the length of the shorter of the two strings.

data CountyNames;
  set county; 
  if upcase(_NAME_) =: 'COUNTY' then Countynew = substr(_NAME_,7);
  else Countynew = _NAME_; 
run;

 

ballardw
Super User

Spelling counts. You are using ='COUNTY' but the values you keep showing pictures of has 'county'. To a computer 'COUNTY' is not the same as 'county' as the letters are different. Spell the text you search for as it appears in your data.

 


@Guerraje wrote:

Thank you so much. That fixed my last problem. Now I have created another column. So I have 3 columns of all of the same information now. I am still unable to get rid of the word "county" in front of all of the county names. Thank you for your patience. My updated code after fixing the set statement looks like this:

PROC TRANSPOSE DATA = coimpt.Insurance OUT= county;

run;

 

Data CountyNames;

Set county; 

If upcase (_NAME_) = 'COUNTY' then Countynew = substr(_NAME_,7);

else Countynew = _NAME_; 
run;

Screen Shot 2021-11-12 at 6.44.46 AM.png


 

Guerraje
Quartz | Level 8

When making "COUNTY" "county" 

my output remains the same. 

Tom
Super User Tom
Super User

If the goal is to make a version of coimpt.Insurance with the names of the variables changed then it might just be easier to generate the code needed for a RENAME statement or RENAME= dataset option.

 

Here is a step by step way to do that.

First get a dataset with just the variable names.  You could use PROC TRANSPOSE for that.

proc transpose data=coimpt.Insurance(obs=0) out=names;
  var _all_;
run;

Note that the OBS=0 dataset option will make transpose just create a list of variables.  The variable names will stored in a variable named _NAME_.  (You could use the NAME= option of PROC TRANSPOSE to use another name for this variable.) The VAR statement with the _ALL_ variable list will insure that the character variables are included.  The default for PROC TRANSPOSE is to only operate on the numeric variables.

 

Now use that list to generate a series of OLD=NEW rename pairs. We can use PROC SQL to generate that into a macro variable.  Let's call the macro variable RENAME.  Let's restrict it just the variables whose name starts with 'county'.  The NLITERAL() function calls are just in case any of your variables do not follow standard SAS naming conventions, for example names that start with a number or have embedded spaces.

proc sql noprint;
select catx('=',nliteral(_name_),nliteral(substr(_name_,7)))
  into :rename separated by ' '
  from names
  where _name_ like 'county%'
;
quit;

You can use code like this to see what the new macro variable has. Also check the automatic macro variable SQLOBS to see how many names were found that started with lowercase county.

%put &=sqlobs;
%put &=rename;

 

Now let's make a new dataset that has the data from the original dataset, but changes the names of those variables.  For the purpose of this example let's call the new dataset WANT.  We can use the RENAME statement in this data step.

data want;
  set coimpt.Insurance ;
  rename &rename;
run;

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 16 replies
  • 2158 views
  • 1 like
  • 5 in conversation