BookmarkSubscribeRSS Feed
bfrye
Calcite | Level 5

This is a problem that I have solved but wonder if someone has an elegant solution?  I suspect there may be a better solution.

I have digits associated with M and B.

DATA Numbers;

  input  MB $ 1 Digit1 $ 3 Digit2 $ 4 Digit3 $ 5  ;

Datalines;

M 733

M 234

M 736

M 627

M 740

B 414

B 802

B 631

B 495

B 588

;

Run;

I am only interested in the number of ‘M’ digits.

I want to know what ‘M’ digit values are greater than ‘B’ or what ‘M’ digit values that do not appear with ‘B’.

I also want to know the count or occurrence.  (At the end, I multiplied by 2 because I wanted to double the count)

After I separated the digits associated with M and B and got counts using Proc Freq, I spent much time looking at ‘holding’ the current record and ‘looking’ forward at the next record to do the math using set point=, etc.  I also looked at Retain and Lag.  Then I looked at Proc Transpose.  Here are my results.

Proc Print Data = Numbers;

Title "Numbers";

Run;

                                            Numbers        15:53 Thursday, November 20, 2014   1

                            Obs    MB    Digit1    Digit2    Digit3

                              1         M        7         3         3

                              2         M        2         3         4

                              3         M        7         3         6

                              4         M        6         2         7

                              5         M        7         4         0

                              6         B        4         1         4

                              7         B        8         0         2

                              8         B        6         3         1

                              9         B        4         9         5

                             10        B        5         8         8

Data M (KEEP= MB Number ) B (KEEP= MB Number );

Set Numbers;

Number =        Digit1;

If MB = "M" Then

Output M; Else Output B;

  Number =        Digit2;

   If MB = "M" Then

Output M; Else Output B;

   Number=        Digit3;

    If MB = "M" Then

Output M; Else Output B;

run;

Data M;

Set M;

If Number > "0";  /*I might have a 1 or 2 digit number in the source data*/

  NM=Number;

Run;

Proc sort data=M;

By   NM MB Number;

Run;

Data B;

Set B;

NB=Number;

Run;

Proc sort data=B;

By  NB MB Number;

Run;

Data SetMB;

Set M B;

By  Number DESCENDING MB ;

Run;

Proc Print Data=SetMB;

Title "Initial Data";

Run;

                                          Initial Data     15:53 Thursday, November 20, 2014   2

                                Obs    MB    Number    NM    NB

                                  1         B       0                       0

                                  2         B       1                       1

                                  3         B       1                       1

                                  4         M       2            2

                                  5         M       2            2

                                  6         B       2                       2

                                  7         M       3            3

                                  8         M       3            3

                                  9         M       3            3

                                 10        M       3            3

                                 11         B       3                       3

                                 12        M       4              4

                                 13        M       4              4

                                 14        B       4                         4

                                 15        B       4                         4

                                 16        B       4                         4

                                 17        B       5                         5

                                 18        B       5                         5

                                 19        M       6             6

                                 20        M       6             6

                                 21        B       6                          6

                                 22        M       7              7

                                 23        M       7              7

                                 24        M       7              7

                                 25        M       7              7

                                 26        B       8                           8

                                 27        B       8                           8

                                 28        B       8                           8

                                 29        B       9                           9

Proc Freq Data=SetMB Noprint;

Tables MB*Number/ Out=Summary (Drop = Percent);

Run;

Proc Sort Data=Summary;

By Number Descending MB ;

Run;

Proc Print Data=Summary;

Title "Summary";

Run;

                                            Summary        15:53 Thursday, November 20, 2014   3

                                  Obs    MB    Number    COUNT

                                    1         B       0         1

                                    2         B       1         2

                                    3         M       2         2

                                    4         B       2         1

                                    5         M       3         4

                                    6         B       3         1

                                    7         M       4         2

                                    8         B       4         3

                                    9         B       5         2

                                   10         M      6         2

                                   11         B       6         1

                                   12         M      7         4

                                   13         B      8         3

                                   14         B      9         1

Proc Transpose Data=Summary out=wide prefix=NBR;

By Number;

ID MB;

Var Count;

Run;

Data Wide;

Set Wide;

If  NBRM > NBRB;

If NBRB < 0 Then NBRB = 0;

Reqd = 2 * (NBRM - NBRB);

Run;

/* From Summary for the Number 2, I have COUNT=2 ‘M’s and COUNT=1 ‘B’ with a difference of 1 but is 2 when multiplied by 2  */

Proc Print Data=Wide;

Var Number Reqd;

Title "Extra Numbers Required";

Run;

                                     Extra Numbers Required                                    4

                                                               15:53 Thursday, November 20, 2014

                                     Obs    Number    Reqd

                                      1            2         2

                                      2            3         6

                                      3            6         2

                                      4            7         8

9 REPLIES 9
Ksharp
Super User

How to compare these digits between ‘M’ and ‘B’  ?

art297
Opal | Level 21

The following could be simplified even further, but it achieves the same result as you showed in your example:

DATA M (keep=number) B (keep=number);

  input  MB $ 1 Digit1 $ 3 Digit2 $ 4 Digit3 $ 5  ;

  array digits(*) dig:;

  do i=1 to dim(digits);

    if digits(i) gt 0 then do;

      number=digits(i);

      if MB='M' then output M;

      else output B;

    end;

  end;

  Datalines;

M 733

M 234

M 736

M 627

M 740

B 414

B 802

B 631

B 495

B 588

;

proc sql;

  create table msum as

    select number,count(number) as mcount

      from M

        group by number

  ;

  create table rsum as

    select number,max(0,count(number)) as rcount

      from B

        group by number

  ;

  create table want as

    select l.number,(mcount-max(0,rcount))*2 as Reqd

      from msum l

        left join rsum r

          on l.number=r.number

            having mcount>rcount

  ;

quit;

rivieralad
Obsidian | Level 7

     Hiya

This is quite easy to do in a single data step by using arrays...see below.

Hope this helps, as it is quite a bit simpler than multiple data steps and procedures.

If you have any questions feel free to send me a message or e-mail.

Cheers

Chris

data want;

* define variables to hold data

* not interested in B numbers if no matching M numbers and don't need M number if 0 so don't need to catch any B number if 0;

* therefore set up variables to hold count of numbers 1 to 9 for B and M;

length

    code

    $1.

    m_numb1 - m_numb9

    b_numb1 - b_numb9

    number

    numb

    reqd

    mnumb

    bnumb

    8.

;

* specify which variables to keep;

keep

    numb

    reqd

    mnumb

    bnumb

;

* label the output variables;

label

    numb = 'Number'

    reqd = 'Required'

    mnumb = 'M Count'

    bnumb = 'B Count'

;

* retain code across source data reads until new value found;

retain

    code

;

* use infile statement so we specify what happens at end of file;

infile datalines eof = gotit;

* define arrays to hold count of numbers for each code;

array m_numb(9) m_numb1 - m_numb9 (9*0);

array b_numb(9) b_numb1 - b_numb9 (9*0);

* read in code and first digit then output code B or code M and number greater than 0;

input  code $1. +1 number 1. @;

if code eq 'B'  and number <> 0 then

    b_numb(number) = b_numb(number) + 1;

else if number <> 0 then

    m_numb(number) = m_numb(number) + 1;

* read in second digit then output code B or code M and number greater than 0;

input number 1. @;

if code eq 'B'  and number <> 0 then

    b_numb(number) = b_numb(number) + 1;

else if number <> 0 then

    m_numb(number) = m_numb(number) + 1;

* read in second digit then output code B or code M and number greater than 0;

input number 1.;

if code eq 'B'  and number <> 0 then

    b_numb(number) = b_numb(number) + 1;

else if number <> 0 then

    m_numb(number) = m_numb(number) + 1;

* make sure we don't process the eof label until required;

return;

gotit:

do i = 1 to 9;

    if b_numb(i) < m_numb(i) then do;

        numb = i;

        reqd = (m_numb(i) - b_numb(i)) * 2;

        mnumb = m_numb(i);

        bnumb = b_numb(i);

        output;

    end;

end;

Datalines;

M 733

M 234

M 736

M 627

M 740

B 414

B 802

B 631

B 495

B 588

;

Run;


NumberRequiredM CountB Count
2221
3641
6221
7840
Haikuo
Onyx | Level 15

The more, the merrier, Here is another solution based on Array:

DATA Numbers;

     input  MB $ 1 Digit1 $ 3 Digit2 $ 4 Digit3 $ 5;

     Datalines;

M 733

M 234

M 736

M 627

M 740

B 414

B 802

B 631

B 495

B 588

;

Run;

data want;

           set numbers end=last;

           array m(0:9) _temporary_;

           array b(0:9) _temporary_;

           array d digit:;

           do over d;

                if mb='M' then

                     m(input(d,1.))+1;

                else b(input(d,1.))+1;

           end;

     if last then

           do _n_=0 to 9;

                reqd=sum(m(_n_),-b(_n_));

                if reqd>0 then

                     do;

                           reqd=reqd*2;

                           number=_n_;

                           output;

                     end;

           end;

     keep number reqd;

run;

bfrye
Calcite | Level 5

Eloquent is the word.

Thanks

MarkWik
Quartz | Level 8

Dear Hai kuo,

What does it take  for an aspiring SAS wannabe programmer to become somebody like you. I have noticed a lot of amazing solutions of yours and many of my friends deem you as the superstar of SAS. Perhaps, are you Mr.Ian Whitlock no.2? or you have overtaken him?. You seem to be gifted. Can you offer any piece of guidance or advice for all those Hai kuo wannabe's to help learn faster such as pointing to materials,docs and beyond?

Dear Art and others, how long has it been since the forum existed, i.e when did it start? It seems easy trace back in SAS L, but here is it possible to search threads by date starting from the very first like 1996 sas-l digest?

Many Thanks indeed,

Mark

art297
Opal | Level 21

Mark: You are off by around 10 years concerning SAS-L. Yes, the current log only goes back to Jan 1996, but the original logs exist on the Marist University listserv. You can find the first log, November 1986, at: http://vm.marist.edu/htbin/wlvgl?L=SAS-L&LOG=LOG8611

The SAS Discussion forum history is also available on-line at:  Browse - Content | Communities on SAS

As you'll see, the first forum post shown comes from March 2006

Haikuo
Onyx | Level 15

Dear Mark,

What a deja vu ! I said something similar to less than 2 years ago, right here on this forum:

Paul dorfman, Ksharp, Data _null_, Art, Tom, FriedEgg, PG, Joe Matise, Net Wooding and many many others on this forum and SAS-L are still way ahead of me in term of SAS skills, but as you can see, I made my progress in a period time that is not too long (<5 years of SAS on hand), and you can do it as well.

Well, now you mentioned Ian, I have to say, if Ian was a SAS professor, I am in 1st grade at best, it still takes me to read 2-3 times before I could comfortably understand his papers, especially those on Macros.

The best way to learn SAS is hanging out here at the forum or SAS-L.

Anyway, I want to thank you for your attention and your kind words, even though I know there is still a long way ahead of me. And thank you for your SAS endeavors.

Haikuo

Ksharp
Super User

OK. If I understood what you mean.


DATA Numbers;
     input  MB $ 1 Digit1 $ 3 Digit2 $ 4 Digit3 $ 5;
     Datalines;
M 733
M 234
M 736
M 627
M 740
B 414
B 802
B 631
B 495
B 588
;
Run;

data _null_;
 if _n_ eq 1 then do;
  declare hash h();
  h.definekey('key');
  h.definedata('key','count');
  h.definedone();
end;
set Numbers end=last;
array x{*} Digit: ;
do i=1 to dim(x);
 key=x{i};
 if MB='M' then do;
   if h.find()=0 then do;count+2;h.replace();end;
    else do;count=2;h.replace();end;
 end;
 else if MB='B' then do;
   if h.find()=0 then do;count+(-2);h.replace();end;
 end;
end;
 if last then h.output(dataset:'want(where=(count gt 0))');
run;
  

Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 969 views
  • 9 likes
  • 6 in conversation