Hi all
I hope everyone is healthy!
Here is part of my data set
Not Hispanic or Latino | Albany County, NY (36001) | American Indian or Alaska Native | 1,370 | 300 |
Not Hispanic or Latino | Albany County, NY (36001) | Asian | 62,742 | 4,520 |
Not Hispanic or Latino | Albany County, NY (36001) | Black or African American | 116,750 | 11,993 |
Not Hispanic or Latino | Albany County, NY (36001) | Native Hawaiian or Other Pacific Islander | 452 | 12 |
Not Hispanic or Latino | Albany County, NY (36001) | White | 652,841 | 137,877 |
Not Hispanic or Latino | Albany County, NY (36001) | More than one race | 22,848 | 1,135 |
Not Hispanic or Latino | Albany County, NY (36001) | Total | ##### | ##### |
Not Hispanic or Latino | Allegany County, NY (36003) | American Indian or Alaska Native | 295 | 62 |
Not Hispanic or Latino | Allegany County, NY (36003) | Asian | 1,701 | 115 |
Not Hispanic or Latino | Allegany County, NY (36003) | Black or African American | 1,709 | 79 |
Not Hispanic or Latino | Allegany County, NY (36003) | Native Hawaiian or Other Pacific Islander | 38 | 3 |
Not Hispanic or Latino | Allegany County, NY (36003) | White | 130,012 | 26,328 |
Not Hispanic or Latino | Allegany County, NY (36003) | More than one race | 1,632 | 121 |
Not Hispanic or Latino | Allegany County, NY (36003) | Total | ##### | 26,708 |
I want to make this part "Albany County, NY (countynumber") into two variables
county name (e.g. "Albany")
and
county number (e.g. 36001)
Is that possible?
Second, one row is "Native Hawaiian or Other Pacific Islander" and another row is "Asian".
I'd like to add them together to make a row "Asian/Pacific Islander", and then remove the two component rows.
Thanks
Gene
Not sure if this will scale but based on your post:
CountyName = scan(variable, 1, ",");
CountyNumber = scan(variable, 2, "()");
race_combined = race;
if race_combined in ('Asian', 'Native ...') then race_combined = 'Asian/Pacific Islander';
To combine your rows, you'd need to summarize your data - you can use PROC MEANS/SUMMARY for that. The summary stat is most likely sum, but may depend on the data. You didn't include variable names so I made guesses.
I'm going to assume you can modify the code with the correct variable names.
If you have issues, post your code, log and explain the issue.
It might be nice to include your existing variable names.
The county bits can be solved with the SCAN function mostly. The key is setting the exact characters used for delimiters.
Tranwrd removes the word County from the result. I hope you don't have a "County County, NY ".
data example; County="Albany County, NY (36001)"; length countyname $ 20; countyname = tranwrd(scan(county,1,','),'County',''); countynum = scan(county,2,'()'); run;
For your second bit about the race coding my preference would be to leave the values as they are and use a custom format to create text for the combined group.
Proc format; value $myracecode "Asian","Native Hawaiian or Other Pacific Islander" ="Asian/Pacific Islander" ; run;
The use the format with the value, example:
Proc freq data=have; tables race; format race $myracecode. ; run;
Why a format you may ask. First as soon as you remove or replace the values someone will ask for a report or analysis separating the two, especially as these are standard, more or less, values. So if you remove them then you get to go back in your code and an undo that. With the format just remove the format (or use a different format).
The groups created by a format will be honored by reporting, analysis and graphing procedures.
A comment: Inclusion of "Total" or similar summary rows inside a data set is not a good idea for many purposes. It is extremely easy to accidentally include the "total" is some other summary, such as a statewide total and get very incorrect results.
I would almost say this is the result of another procedure's output and would suggest looking if that is the case what the underlying data is.
If you don't want the word COUNTY you'll want to use mine and then substr it to remove the COUNTY. Many counties in my experience do have two or three names, not just one.
Or map the county number to the name using one of the datasets from SASMAPS library.
For separating out the county name and ID, Ballardw's method worked for me
countyname = tranwrd(scan(county,1,','),'County','');
I'll look at the other part now, collapsing two groups.
Thanks
Posting data in usable form makes it much easier to suggest solutions. The following code implements the idea of @Reeza to use a map dataset:
data work.CountyFmt;
set mapsgfk.us_all_attr(keep= id idname rename=(idname = Label));
length Start $ 10;
retain FmtName '$CountyName';
Start = compress(id,, 'kd');
drop id;
run;
proc format cntlin=work.CountyFmt;
run;
data work.want;
set work.have;
length CountyName $ 55 CountyNumber $ 10;
CountyNumber = compress(County,, 'kd');
CountyName = put(CountyNumber, $CountyName.);
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.