BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wlierman
Lapis Lazuli | Level 10

I need to classify employment by sector: Services or Goods. I need to fill in the NAICS_sector field by using word or words / phrases in the Place of employment field.  The code is below.  It "works" if it is classifying on one word like "disabled"  or a phrase like  "Not_working" in the Place of employment field.  I used that to fill in the Other_categories field.

 

As I mentioned, I am starting to classify the responses in the Place of Employment field into either Services or Goods.   I decided to start out with

Place of Employment responses related to Education.  One thing to keep in mind is that this data is from contact tracing so if a high school student had covid or quarantined for covid the tracer would put Eagle Crest High School as shown in the following example as Place of Employment.  Obviously high school isn't the place of employment for the person but it does relate back to the service sector.  It is the sector that is being measured.  I've shown some other variations under Place of Employment relating to different types of Educational settings.

 

The headers are

 

NAICS_sector       Person_ID          Place of Employment         Sector_Type       Type_Firm       Other_categories

                              123456             Eagle Crest High School

                              555555             Little Preschool

                              332425             UO

                              1111111             Central Elementary School    

                               345267            Student

 

 

and so on.

 

Rather than classifying manually I have been using coding like

Data SASCDC_2.Arias_NAICS_Classify_A;
   Length NAICS_Sector $20;
   Informat NAICS_Sector $20.;
   Format NAICS_Sector $20.;
   Set SASCDC_2.Arias_NAICS_Classify;
   If _n_ ge 3003; 
   If Place_of_Employment in ("UNEMPLOYED", "NOT WORKING", "NO JOB", "NOT EMPLOYED") Then Other_Categories = "Unemployed___Not_working";  
   If Place_of_Employment in ("RETIRED", "REITRED") Then Other_Categories = "Retired";
   If Place_of_Employment in ("CHILD", "MINOR CHILD", "MINOR", "INFANT", "TODDLER", "BABY", "(MINOR)") Then Other_Categories = "Child_minor";
   If Place_of_Employment in ("WORK FROM HOME", "WORKS FROM HOME", "REMOTE", "AT HOME", "WORKING FROM HOME", "STAY AT HOME", "STAYS HOME", "WORKS REMOTELY", "STAY AT HOME",
                               "STAY AT HOME MOM") Then Other_Categories = "Work_from_home_remotely";
   If Place_of_Employment in ("SELF EMPLOYED", "SELF EMPLOYED(WORKS FROM HOME)", "SELF-EMPLOYED") Then NAICS_Sector = "Other_Services"; 
   If NAICS_Sector in ("Other_Services") Then Sector_Type = "Self_employed"; 
   If Place_of_Employment in ("DISABLED", "ON DISABILITY", "DISABILITY")  Then Other_Categories = "Disabled";
   If Place_of_Employment in ("SCHOOL", "HIGH SCHOOL", "ELEMENTARY", "STUDENT", "EDUCATION", "PRESCHOOL", "ELEMENTARY SCHOOL", "PRIMARY", "HS", "CHRISTIAN SCHOOL",
                              "ACADEMY", "UO", "OSU", "COLLEGE", "UNIVERSITY", "LEARNING", "CHARTER SCHOOL") Then NAICS_Sector = "Services";
run;

This is will do okay if the match is on one word like Student.  However, no matches result on phrases like:  "Central Elementary School" even though Elementary and School are listed in the in clause. 

 

How should the code be modified to capture the Place of Employment that are phrases  like   "Western State College" etc and not just one 

word like "Student" or "UO"?

 

Thanks.

 

 

 

          

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

When looking for values where one word is often combined with others, such as "School" in "High School" "Grade School" "Elementary School" and such you may want to consider if FINDW or INDEX are more to your need:

 

if findw(Place_of_Employment,'SCHOOL','i')> 0 then NAICS_Sector = "Services";

for example.

And don't try to force all comparisons into a single statement. That way lies madness. When your IF statement values might overlap you may want to use "else if". Example a school named "Doctor Johnson Elementary School" could be confused with a search on "Doctor". You would likely not want to assign the category "Doctor" might be in when in a compound with School as above. Order of comparisons is important in such.

 

Having worked with data where people are referencing schools you may find a very iterative approach needed as they don't always spell out "school", may imply it with works like elementary,  i.e. Lincoln Elementary, "junior high","senior high" or any number of abbreviations: HS, JHS, MS just a few.

View solution in original post

2 REPLIES 2
ballardw
Super User

When looking for values where one word is often combined with others, such as "School" in "High School" "Grade School" "Elementary School" and such you may want to consider if FINDW or INDEX are more to your need:

 

if findw(Place_of_Employment,'SCHOOL','i')> 0 then NAICS_Sector = "Services";

for example.

And don't try to force all comparisons into a single statement. That way lies madness. When your IF statement values might overlap you may want to use "else if". Example a school named "Doctor Johnson Elementary School" could be confused with a search on "Doctor". You would likely not want to assign the category "Doctor" might be in when in a compound with School as above. Order of comparisons is important in such.

 

Having worked with data where people are referencing schools you may find a very iterative approach needed as they don't always spell out "school", may imply it with works like elementary,  i.e. Lincoln Elementary, "junior high","senior high" or any number of abbreviations: HS, JHS, MS just a few.

wlierman
Lapis Lazuli | Level 10

Thank you.  The approach that you explain does help.

Appreciate it. I may have some other questions before this step is complete.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 474 views
  • 0 likes
  • 2 in conversation