BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
WendyT
Pyrite | Level 9

e dSAS folks-

I’ve been picking at this awhile, but I'm sure there is a better way.

I have a text field that is composed of concatenated numeric codes that indicate the quality of data.  The individual codes are separated by asterisks, and there is no order.  I am trying to pick out the highest code, and keep or eliminate rows based on it (<220).

Any insight you can give me would be much appreciated.

WendyT

DATA FINAL ; SET HAVE ;

HICODE = 1 ;

  NCODES=COUNTW(CAT_CODES,'*')+1 ;

    DO I = 1 TO NCODES ;

     NEWCODE = (SCAN(CAT_CODES, I, '*')) ;

     HICODE  = MAX(NEWCODE, HICODE) ;

    END ;

IF HICODE GE 220 THEN DELETE ;

DROP I NCODES NEWCODE HICODE CAT_CODES ;

RUN ;

I have something like this

LOCDATEVALCAT_CODES
AAA1-Jul-1517.31*2
AAA2-Jul-152.12*150*7
AAA3-Jul-156.8
AAA4-Jul-152.3211
BBB1-Jul-150.1255*100
BBB2-Jul-153.22*1*151

picking out the highest code

LOCDATEVALHI_CODE
AAA1-Jul-1517.32
AAA2-Jul-152.1150
AAA3-Jul-156.81
AAA4-Jul-152.3211
BBB1-Jul-150.1255
BBB2-Jul-153.2151

final dataset

LOCDATEVAL
AAA1-Jul-1517.3
AAA2-Jul-152.1
AAA3-Jul-156.8
AAA4-Jul-152.3
BBB2-Jul-153.2
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Given your "final" dataset you're actually not really interested in the highest value but you want simply to delete all records where you find something higher than 220. If so then below code should work:

data have;

  infile datalines truncover dsd dlm=' ';

  input LOC $ DATE:anydtdte. VAL CAT_CODES:$20.;

  format date date9.;

  datalines;

AAA 1-Jul-15 17.3 1*2

AAA 2-Jul-15 2.1 2*150*7

AAA 3-Jul-15 6.8

AAA 4-Jul-15 2.3 211

BBB 1-Jul-15 0.1 255*100

BBB 2-Jul-15 3.2 2*1*151

;

run;

data want(drop=_:);

  set have;

  do _i=1 by 1;

    _code_val=input(scan(CAT_CODES,_i,'*'), best32.);

    if _code_val>=220 then delete;

    if missing(_code_val) then leave;

  end;

run;

View solution in original post

8 REPLIES 8
ballardw
Super User

I think you may need to go into some more detail about HOW you are using that "highest" code value.

It appears that LOC makes no difference, correct?

Your attempted code of:

IF HICODE GE 200 THEN DELETE ;

doesn't seem to quite match the results you indicate are desired because the AAA 4-Jul-15 record has a code of 211 but is kept.

WendyT
Pyrite | Level 9

Ballardw-

You're absolutely right... I tried to make the example simple, and didn't get it right (now edited to be 220).  The LOC and DATE variables were stand-ins for a group of identification variables.

The general idea is that the lower the code, the better the data. The dataset I'm working with is pre-summarized, and all codes that apply during the summarization period are shown in the CAT_CODES variable.

Thanks!

WendyT

slchen
Lapis Lazuli | Level 10

data want;

   set have;

   if not missing(cat_code)then do;

        num=count(cat_code,'*');

        do i=1 to num;

        hi_code=max(hi_code,scan(cat_code,i,'*'));

        end;

   end;

   if index(cat_code,'*')=0 then hi_code=cat_code;

   if hi_code>200 then delete;

drop i num cat_code hi_code;

run;

DaveShea
Lapis Lazuli | Level 10

Hi Wendy,

There are many ways to skin your cat. I think the excellent answer provided by SLCHEN is going to give you what you need, with one small amendment.

The statement:

do i=1 to num;

should be changed to:

do i=1 to num+1;

Without this minor change, any instances like your first test record, will not necessarily pick out the highest value from the CAT_CODE variable.

e.g CAT_CODE=1*2

The original code will return 1 as the highest value, which is not what you want.

Why is this ? The NUM variable contains the number of <asterisk> characters in CAT_CODE, not the number of codes concatenated together. When NUM=1, you have 2 codes to work through. By using the amended version, you will be able to examine each code in the concatenated CAT_CODE value, irrespective of the order that they appear.

This problem will occur when you have any number of codes greater than 1 in CAT_CODE.

I hope that this helps.

Downunder Dave.

Wellington

ChrisNZ
Tourmaline | Level 20

Slightly more compact.


data HAVE;

  infile cards pad;

  input @15 VAL @21 CAT_CODES $20.;

cards;

AAA 1-Jul-15  17.3  1*2

AAA 2-Jul-15  2.1   2*150*7

AAA 3-Jul-15  6.8

AAA 4-Jul-15  2.3   211

BBB 1-Jul-15  0.1   255*100

BBB 2-Jul-15  3.2   2*1*151

run;

data WANT ;

  set HAVE ;

  do until(missing(CODE));

    I       = sum(I,1);

    CODE    = input(scan(CAT_CODES, I, '*'),?? 32.) ;

    HICODE  = max (0, CODE, HICODE) ;  

  end ;

  if HICODE lt 220;

  keep VAL;

run ;

Patrick
Opal | Level 21

Given your "final" dataset you're actually not really interested in the highest value but you want simply to delete all records where you find something higher than 220. If so then below code should work:

data have;

  infile datalines truncover dsd dlm=' ';

  input LOC $ DATE:anydtdte. VAL CAT_CODES:$20.;

  format date date9.;

  datalines;

AAA 1-Jul-15 17.3 1*2

AAA 2-Jul-15 2.1 2*150*7

AAA 3-Jul-15 6.8

AAA 4-Jul-15 2.3 211

BBB 1-Jul-15 0.1 255*100

BBB 2-Jul-15 3.2 2*1*151

;

run;

data want(drop=_:);

  set have;

  do _i=1 by 1;

    _code_val=input(scan(CAT_CODES,_i,'*'), best32.);

    if _code_val>=220 then delete;

    if missing(_code_val) then leave;

  end;

run;

MadhuKorni
Quartz | Level 8

data want;

set have;

cnt = countc(cat_codes,'*');

do i = 1 to cnt+1;

code = input(scan(cat_codes,i),8.);

if code gt 220 then delete;

end;

keep Loc Date Val;

run;

WendyT
Pyrite | Level 9

Thanks so much to everyone - I learned something new from each one of your answers.  Smiley Happy

I awarded Patrick as 'correct' because his solution ran the fastest on my system.

Wendy

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1021 views
  • 6 likes
  • 7 in conversation