DATA Step, Macro, Functions and more

help with picking out numeric codes from a text field

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 91
Accepted Solution

help with picking out numeric codes from a text field

e dSAS folks-

I’ve been picking at this awhile, but I'm sure there is a better way.

I have a text field that is composed of concatenated numeric codes that indicate the quality of data.  The individual codes are separated by asterisks, and there is no order.  I am trying to pick out the highest code, and keep or eliminate rows based on it (<220).

Any insight you can give me would be much appreciated.

WendyT

DATA FINAL ; SET HAVE ;

HICODE = 1 ;

  NCODES=COUNTW(CAT_CODES,'*')+1 ;

    DO I = 1 TO NCODES ;

     NEWCODE = (SCAN(CAT_CODES, I, '*')) ;

     HICODE  = MAX(NEWCODE, HICODE) ;

    END ;

IF HICODE GE 220 THEN DELETE ;

DROP I NCODES NEWCODE HICODE CAT_CODES ;

RUN ;

I have something like this

LOCDATEVALCAT_CODES
AAA1-Jul-1517.31*2
AAA2-Jul-152.12*150*7
AAA3-Jul-156.8
AAA4-Jul-152.3211
BBB1-Jul-150.1255*100
BBB2-Jul-153.22*1*151

picking out the highest code

LOCDATEVALHI_CODE
AAA1-Jul-1517.32
AAA2-Jul-152.1150
AAA3-Jul-156.81
AAA4-Jul-152.3211
BBB1-Jul-150.1255
BBB2-Jul-153.2151

final dataset

LOCDATEVAL
AAA1-Jul-1517.3
AAA2-Jul-152.1
AAA3-Jul-156.8
AAA4-Jul-152.3
BBB2-Jul-153.2

Accepted Solutions
Solution
‎07-23-2015 07:11 AM
Respected Advisor
Posts: 3,900

Re: help with picking out numeric codes from a text field

Given your "final" dataset you're actually not really interested in the highest value but you want simply to delete all records where you find something higher than 220. If so then below code should work:

data have;

  infile datalines truncover dsd dlm=' ';

  input LOC $ DATE:anydtdte. VAL CAT_CODES:$20.;

  format date date9.;

  datalines;

AAA 1-Jul-15 17.3 1*2

AAA 2-Jul-15 2.1 2*150*7

AAA 3-Jul-15 6.8

AAA 4-Jul-15 2.3 211

BBB 1-Jul-15 0.1 255*100

BBB 2-Jul-15 3.2 2*1*151

;

run;

data want(drop=_Smiley Happy;

  set have;

  do _i=1 by 1;

    _code_val=input(scan(CAT_CODES,_i,'*'), best32.);

    if _code_val>=220 then delete;

    if missing(_code_val) then leave;

  end;

run;

View solution in original post


All Replies
Super User
Posts: 10,538

Re: help with picking out numeric codes from a text field

I think you may need to go into some more detail about HOW you are using that "highest" code value.

It appears that LOC makes no difference, correct?

Your attempted code of:

IF HICODE GE 200 THEN DELETE ;

doesn't seem to quite match the results you indicate are desired because the AAA 4-Jul-15 record has a code of 211 but is kept.

Frequent Contributor
Posts: 91

Re: help with picking out numeric codes from a text field

Ballardw-

You're absolutely right... I tried to make the example simple, and didn't get it right (now edited to be 220).  The LOC and DATE variables were stand-ins for a group of identification variables.

The general idea is that the lower the code, the better the data. The dataset I'm working with is pre-summarized, and all codes that apply during the summarization period are shown in the CAT_CODES variable.

Thanks!

WendyT

Super Contributor
Posts: 275

Re: help with picking out numeric codes from a text field

data want;

   set have;

   if not missing(cat_code)then do;

        num=count(cat_code,'*');

        do i=1 to num;

        hi_code=max(hi_code,scan(cat_code,i,'*'));

        end;

   end;

   if index(cat_code,'*')=0 then hi_code=cat_code;

   if hi_code>200 then delete;

drop i num cat_code hi_code;

run;

Contributor
Posts: 71

Re: help with picking out numeric codes from a text field

Hi Wendy,

There are many ways to skin your cat. I think the excellent answer provided by SLCHEN is going to give you what you need, with one small amendment.

The statement:

do i=1 to num;

should be changed to:

do i=1 to num+1;

Without this minor change, any instances like your first test record, will not necessarily pick out the highest value from the CAT_CODE variable.

e.g CAT_CODE=1*2

The original code will return 1 as the highest value, which is not what you want.

Why is this ? The NUM variable contains the number of <asterisk> characters in CAT_CODE, not the number of codes concatenated together. When NUM=1, you have 2 codes to work through. By using the amended version, you will be able to examine each code in the concatenated CAT_CODE value, irrespective of the order that they appear.

This problem will occur when you have any number of codes greater than 1 in CAT_CODE.

I hope that this helps.

Downunder Dave.

Wellington

PROC Star
Posts: 1,566

Re: help with picking out numeric codes from a text field

Slightly more compact.


data HAVE;

  infile cards pad;

  input @15 VAL @21 CAT_CODES $20.;

cards;

AAA 1-Jul-15  17.3  1*2

AAA 2-Jul-15  2.1   2*150*7

AAA 3-Jul-15  6.8

AAA 4-Jul-15  2.3   211

BBB 1-Jul-15  0.1   255*100

BBB 2-Jul-15  3.2   2*1*151

run;

data WANT ;

  set HAVE ;

  do until(missing(CODE));

    I       = sum(I,1);

    CODE    = input(scan(CAT_CODES, I, '*'),?? 32.) ;

    HICODE  = max (0, CODE, HICODE) ;  

  end ;

  if HICODE lt 220;

  keep VAL;

run ;

Solution
‎07-23-2015 07:11 AM
Respected Advisor
Posts: 3,900

Re: help with picking out numeric codes from a text field

Given your "final" dataset you're actually not really interested in the highest value but you want simply to delete all records where you find something higher than 220. If so then below code should work:

data have;

  infile datalines truncover dsd dlm=' ';

  input LOC $ DATE:anydtdte. VAL CAT_CODES:$20.;

  format date date9.;

  datalines;

AAA 1-Jul-15 17.3 1*2

AAA 2-Jul-15 2.1 2*150*7

AAA 3-Jul-15 6.8

AAA 4-Jul-15 2.3 211

BBB 1-Jul-15 0.1 255*100

BBB 2-Jul-15 3.2 2*1*151

;

run;

data want(drop=_Smiley Happy;

  set have;

  do _i=1 by 1;

    _code_val=input(scan(CAT_CODES,_i,'*'), best32.);

    if _code_val>=220 then delete;

    if missing(_code_val) then leave;

  end;

run;

Contributor
Posts: 65

Re: help with picking out numeric codes from a text field

data want;

set have;

cnt = countc(cat_codes,'*');

do i = 1 to cnt+1;

code = input(scan(cat_codes,i),8.);

if code gt 220 then delete;

end;

keep Loc Date Val;

run;

Frequent Contributor
Posts: 91

Re: help with picking out numeric codes from a text field

Thanks so much to everyone - I learned something new from each one of your answers.  Smiley Happy

I awarded Patrick as 'correct' because his solution ran the fastest on my system.

Wendy

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 351 views
  • 6 likes
  • 7 in conversation