BookmarkSubscribeRSS Feed
geneshackman
Pyrite | Level 9

Hi all

 

I hope everyone is healthy!

 

Here is part of my data set

Not Hispanic or Latino Albany County, NY (36001) American Indian or Alaska Native 1,370 300
Not Hispanic or Latino Albany County, NY (36001) Asian 62,742 4,520
Not Hispanic or Latino Albany County, NY (36001) Black or African American 116,750 11,993
Not Hispanic or Latino Albany County, NY (36001) Native Hawaiian or Other Pacific Islander 452 12
Not Hispanic or Latino Albany County, NY (36001) White 652,841 137,877
Not Hispanic or Latino Albany County, NY (36001) More than one race 22,848 1,135
Not Hispanic or Latino Albany County, NY (36001) Total ##### #####
Not Hispanic or Latino Allegany County, NY (36003) American Indian or Alaska Native 295 62
Not Hispanic or Latino Allegany County, NY (36003) Asian 1,701 115
Not Hispanic or Latino Allegany County, NY (36003) Black or African American 1,709 79
Not Hispanic or Latino Allegany County, NY (36003) Native Hawaiian or Other Pacific Islander 38 3
Not Hispanic or Latino Allegany County, NY (36003) White 130,012 26,328
Not Hispanic or Latino Allegany County, NY (36003) More than one race 1,632 121
Not Hispanic or Latino Allegany County, NY (36003) Total ##### 26,708

 

I want to make this part "Albany County, NY (countynumber") into two variables

county name (e.g. "Albany")

and 

county number (e.g. 36001)

 

Is that possible?

 

Second, one row is "Native Hawaiian or Other Pacific Islander" and another row is "Asian".

I'd like to add them together to make a row "Asian/Pacific Islander", and then remove the two component rows.

 

Thanks

 

Gene

 

 

8 REPLIES 8
Reeza
Super User

Not sure if this will scale but based on your post:

 

CountyName = scan(variable, 1, ",");
CountyNumber = scan(variable, 2, "()");

race_combined = race;
if race_combined in ('Asian', 'Native ...') then race_combined = 'Asian/Pacific Islander';

To combine your rows, you'd need to summarize your data - you can use PROC MEANS/SUMMARY for that. The summary stat is most likely sum, but may depend on the data. You didn't include variable names so I made guesses.

geneshackman
Pyrite | Level 9
Thanks Reeza, I forgot. Yes, variable names would be Ethnicity, County, Race, Total, Age65Plus
Reeza
Super User

I'm going to assume you can modify the code with the correct variable names. 

If you have issues, post your code, log and explain the issue.

ballardw
Super User

It might be nice to include your existing variable names.

The county bits can be solved with the SCAN function mostly. The key is setting the exact characters used for delimiters.

Tranwrd removes the word County from the result. I hope you don't have a "County County, NY ".

data example;
   County="Albany County, NY (36001)";
   length countyname $ 20;
   countyname = tranwrd(scan(county,1,','),'County','');
   countynum  = scan(county,2,'()');
run;

For your second bit about the race coding my preference would be to leave the values as they are and use a custom format to create text for the combined group.

Proc format;
value $myracecode
"Asian","Native Hawaiian or Other Pacific Islander" ="Asian/Pacific Islander"
;
run;

The use the format with the value, example:

Proc freq data=have;
   tables race;
   format race $myracecode. ;
run;

Why a format you may ask. First as soon as you remove or replace the values someone will ask for a report or analysis separating the two, especially as these are standard, more or less, values. So if you remove them then you get to go back in your code and an undo that. With the format just remove the format (or use a different format).

The groups created by a format will be honored by reporting, analysis and graphing procedures.

 

A comment: Inclusion of "Total" or similar summary rows inside a data set is not a good idea for many purposes. It is extremely easy to accidentally include the "total" is some other summary, such as a statewide total and get very incorrect results.

I would almost say this is the result of another procedure's output and would suggest looking if that is the case what the underlying data is.

geneshackman
Pyrite | Level 9
Ballardw, your solution to splitting the county name works. I'll try the remaining part tomorrow. Reeza your part almost works. I'm not clear if I indicated that I didn't want the word "County" in the output, just the name of the county.

Thanks both.
Reeza
Super User

If you don't want the word COUNTY you'll want to use mine and then substr it to remove the COUNTY. Many counties in my experience do have two or three names, not just one. 

Or map the county number to the name using one of the datasets from SASMAPS library. 

geneshackman
Pyrite | Level 9

For separating out the county name and ID, Ballardw's method worked for me

countyname = tranwrd(scan(county,1,','),'County','');

 

I'll look at the other part now, collapsing two groups.

 

Thanks

andreas_lds
Jade | Level 19

Posting data in usable form makes it much easier to suggest solutions. The following code implements the idea of @Reeza to use a map dataset:

data work.CountyFmt;
   set mapsgfk.us_all_attr(keep= id idname rename=(idname = Label));
   length Start $ 10;
   retain FmtName '$CountyName';

   Start = compress(id,, 'kd');

   drop id;
run;

proc format cntlin=work.CountyFmt;
run;

data work.want;
   set work.have;
   
   length CountyName $ 55 CountyNumber $ 10;

   CountyNumber = compress(County,, 'kd');
   CountyName = put(CountyNumber, $CountyName.);
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1426 views
  • 4 likes
  • 4 in conversation