BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
kcvaldez98
Obsidian | Level 7

Hi SAS coders,

 

I am working on this dataset where I am trying to extract the county estimates from a variable string. I've tried using all the extraction functions (SCAN, SUBSTR, etc.) and when I do extract the estimate, it always shows some counties including "Esti" instead of the percentage. 

 

 

Screen Shot 2023-10-27 at 10.08.26 PM.png

 

Here is a screenshot to show what I mean. I want to be able to extract all the estimates without this error. I know it's because there are extra spaces between the county names, but I don't know how to extract without the spaces ruining the output.

 

Here is a dataset that has county names and estimates and with the counties with the extra space.

*This string is named Regional*

LARIMER County/Regional Estimate 14% (95% C.I.: 11.9 - 16.2)
LAS ANIMASCounty/Regional Estimate 16.7% (95% C.I.: 9.6 - 23.8)
FREMONT County/Regional Estimate 15.5% (95% C.I.: 10.3 - 20.7)
CONEJOS County/Regional Estimate 9.9% (95% C.I.: 2.6 - 17.1)
EAGLE County/Regional Estimate 14.5% (95% C.I.: 9.9 - 19.1)
LA PLATA County/Regional Estimate 25.1% (95% C.I.: 19.7 - 30.5)

The code I have used to show that specific output is 

DATA want;
    SET Have;

LENGTH     CountyEstimate $4.;
CountyEstimate       =COMPRESS(SCAN(Regional,5, 'e  %'));
RUN;

 

Please advise.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Your code is always extracting the 5th word which of course can't work if county names can consist of one or several words.

This has already been dealt with in proposed answer in your last question here and variations of the answers provided should also work for the case at hand.

 

If you would deal with a string where the sub-strings you're after are always on the same position (like always the second word) then your current code would work.

 

If you use the scan() function with counting words from end to start then you're actually dealing with such a consistent structure. For your current case the following should work:

data have;
  infile datalines truncover;
  input regional $200.;
  datalines;
LARIMER County/Regional Estimate 14% (95% C.I.: 11.9 - 16.2)
LAS ANIMASCounty/Regional Estimate 16.7% (95% C.I.: 9.6 - 23.8)
FREMONT County/Regional Estimate 15.5% (95% C.I.: 10.3 - 20.7)
CONEJOS County/Regional Estimate 9.9% (95% C.I.: 2.6 - 17.1)
EAGLE County/Regional Estimate 14.5% (95% C.I.: 9.9 - 19.1)
LA PLATA County/Regional Estimate 25.1% (95% C.I.: 19.7 - 30.5)
;

data want;
  set have;
  length CountyEstimate 8.;
  format CountyEstimate percent8.1;
  CountyEstimate = input(scan(strip(regional),-6,' '),percent.);
run;

proc print data=want;
run;

 

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

A two step process:

 

  1. Use the FINDW() function to find the location of the word Estimate.
  2. Use the SCAN and SUBSTR function to find first "word" after the location found above plus 9 characters. This assumes there is always a space after the word Estimate.

Example:

 

data have;
    string='LA PLATA County/Regional Estimate 25.1% (95% C.I.: 19.7 - 30.5)';
run;

data want;
    set have;
    where=findw(string,'Estimate',' ');
    estimate=scan(substr(string,where+9),1,' %');
run;

 

Feel free to combine the two steps into one, if you would like. Feel free to turn the answer into a numeric variable, if you want.

--
Paige Miller
Patrick
Opal | Level 21

Your code is always extracting the 5th word which of course can't work if county names can consist of one or several words.

This has already been dealt with in proposed answer in your last question here and variations of the answers provided should also work for the case at hand.

 

If you would deal with a string where the sub-strings you're after are always on the same position (like always the second word) then your current code would work.

 

If you use the scan() function with counting words from end to start then you're actually dealing with such a consistent structure. For your current case the following should work:

data have;
  infile datalines truncover;
  input regional $200.;
  datalines;
LARIMER County/Regional Estimate 14% (95% C.I.: 11.9 - 16.2)
LAS ANIMASCounty/Regional Estimate 16.7% (95% C.I.: 9.6 - 23.8)
FREMONT County/Regional Estimate 15.5% (95% C.I.: 10.3 - 20.7)
CONEJOS County/Regional Estimate 9.9% (95% C.I.: 2.6 - 17.1)
EAGLE County/Regional Estimate 14.5% (95% C.I.: 9.9 - 19.1)
LA PLATA County/Regional Estimate 25.1% (95% C.I.: 19.7 - 30.5)
;

data want;
  set have;
  length CountyEstimate 8.;
  format CountyEstimate percent8.1;
  CountyEstimate = input(scan(strip(regional),-6,' '),percent.);
run;

proc print data=want;
run;

 

kcvaldez98
Obsidian | Level 7

Thank you, that makes a lot more sense. Also to add, how do I format it without the % at the end?

 

Thank you!

PaigeMiller
Diamond | Level 26

@kcvaldez98 wrote:

Thank you, that makes a lot more sense. Also to add, how do I format it without the % at the end?


Remove or change the FORMAT statement.

--
Paige Miller
Patrick
Opal | Level 21

@kcvaldez98 wrote:

Thank you, that makes a lot more sense. Also to add, how do I format it without the % at the end?

 

Thank you!


Below some options. Variable option_2 is the same than in the previous answer just without the percent format applied.

Given your source values are in percent I consider option_2 as "correct" - but that's of course just how I see things.

data want;
  set have;
  length option_1 $6;
  option_1 = scan(strip(regional),-6,' %');

  /*format option_2 percent8.1;*/
  option_2 = input(scan(strip(regional),-6,' '),percent.);

  option_3 = option_2*100;

  option_4 = input(scan(strip(regional),-6,' %'),best32.);
run;

proc print data=want;
run;

Patrick_0-1698878850034.png

 

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

From SAS Users blog
Want more? Visit our blog for more articles like these.
5 Steps to Your First Analytics Project Using SAS

For SAS newbies, this video is a great way to get started. James Harroun walks through the process using SAS Studio for SAS OnDemand for Academics, but the same steps apply to any analytics project.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 585 views
  • 2 likes
  • 3 in conversation