This is a problem that I have solved but wonder if someone has an elegant solution? I suspect there may be a better solution.
I have digits associated with M and B.
DATA Numbers;
input MB $ 1 Digit1 $ 3 Digit2 $ 4 Digit3 $ 5 ;
Datalines;
M 733
M 234
M 736
M 627
M 740
B 414
B 802
B 631
B 495
B 588
;
Run;
I am only interested in the number of ‘M’ digits.
I want to know what ‘M’ digit values are greater than ‘B’ or what ‘M’ digit values that do not appear with ‘B’.
I also want to know the count or occurrence. (At the end, I multiplied by 2 because I wanted to double the count)
After I separated the digits associated with M and B and got counts using Proc Freq, I spent much time looking at ‘holding’ the current record and ‘looking’ forward at the next record to do the math using set point=, etc. I also looked at Retain and Lag. Then I looked at Proc Transpose. Here are my results.
Proc Print Data = Numbers;
Title "Numbers";
Run;
Numbers 15:53 Thursday, November 20, 2014 1
Obs MB Digit1 Digit2 Digit3
1 M 7 3 3
2 M 2 3 4
3 M 7 3 6
4 M 6 2 7
5 M 7 4 0
6 B 4 1 4
7 B 8 0 2
8 B 6 3 1
9 B 4 9 5
10 B 5 8 8
Data M (KEEP= MB Number ) B (KEEP= MB Number );
Set Numbers;
Number = Digit1;
If MB = "M" Then
Output M; Else Output B;
Number = Digit2;
If MB = "M" Then
Output M; Else Output B;
Number= Digit3;
If MB = "M" Then
Output M; Else Output B;
run;
Data M;
Set M;
If Number > "0"; /*I might have a 1 or 2 digit number in the source data*/
NM=Number;
Run;
Proc sort data=M;
By NM MB Number;
Run;
Data B;
Set B;
NB=Number;
Run;
Proc sort data=B;
By NB MB Number;
Run;
Data SetMB;
Set M B;
By Number DESCENDING MB ;
Run;
Proc Print Data=SetMB;
Title "Initial Data";
Run;
Initial Data 15:53 Thursday, November 20, 2014 2
Obs MB Number NM NB
1 B 0 0
2 B 1 1
3 B 1 1
4 M 2 2
5 M 2 2
6 B 2 2
7 M 3 3
8 M 3 3
9 M 3 3
10 M 3 3
11 B 3 3
12 M 4 4
13 M 4 4
14 B 4 4
15 B 4 4
16 B 4 4
17 B 5 5
18 B 5 5
19 M 6 6
20 M 6 6
21 B 6 6
22 M 7 7
23 M 7 7
24 M 7 7
25 M 7 7
26 B 8 8
27 B 8 8
28 B 8 8
29 B 9 9
Proc Freq Data=SetMB Noprint;
Tables MB*Number/ Out=Summary (Drop = Percent);
Run;
Proc Sort Data=Summary;
By Number Descending MB ;
Run;
Proc Print Data=Summary;
Title "Summary";
Run;
Summary 15:53 Thursday, November 20, 2014 3
Obs MB Number COUNT
1 B 0 1
2 B 1 2
3 M 2 2
4 B 2 1
5 M 3 4
6 B 3 1
7 M 4 2
8 B 4 3
9 B 5 2
10 M 6 2
11 B 6 1
12 M 7 4
13 B 8 3
14 B 9 1
Proc Transpose Data=Summary out=wide prefix=NBR;
By Number;
ID MB;
Var Count;
Run;
Data Wide;
Set Wide;
If NBRM > NBRB;
If NBRB < 0 Then NBRB = 0;
Reqd = 2 * (NBRM - NBRB);
Run;
/* From Summary for the Number 2, I have COUNT=2 ‘M’s and COUNT=1 ‘B’ with a difference of 1 but is 2 when multiplied by 2 */
Proc Print Data=Wide;
Var Number Reqd;
Title "Extra Numbers Required";
Run;
Extra Numbers Required 4
15:53 Thursday, November 20, 2014
Obs Number Reqd
1 2 2
2 3 6
3 6 2
4 7 8
How to compare these digits between ‘M’ and ‘B’ ?
The following could be simplified even further, but it achieves the same result as you showed in your example:
DATA M (keep=number) B (keep=number);
input MB $ 1 Digit1 $ 3 Digit2 $ 4 Digit3 $ 5 ;
array digits(*) dig:;
do i=1 to dim(digits);
if digits(i) gt 0 then do;
number=digits(i);
if MB='M' then output M;
else output B;
end;
end;
Datalines;
M 733
M 234
M 736
M 627
M 740
B 414
B 802
B 631
B 495
B 588
;
proc sql;
create table msum as
select number,count(number) as mcount
from M
group by number
;
create table rsum as
select number,max(0,count(number)) as rcount
from B
group by number
;
create table want as
select l.number,(mcount-max(0,rcount))*2 as Reqd
from msum l
left join rsum r
on l.number=r.number
having mcount>rcount
;
quit;
Hiya
This is quite easy to do in a single data step by using arrays...see below.
Hope this helps, as it is quite a bit simpler than multiple data steps and procedures.
If you have any questions feel free to send me a message or e-mail.
Cheers
Chris
data want;
* define variables to hold data
* not interested in B numbers if no matching M numbers and don't need M number if 0 so don't need to catch any B number if 0;
* therefore set up variables to hold count of numbers 1 to 9 for B and M;
length
code
$1.
m_numb1 - m_numb9
b_numb1 - b_numb9
number
numb
reqd
mnumb
bnumb
8.
;
* specify which variables to keep;
keep
numb
reqd
mnumb
bnumb
;
* label the output variables;
label
numb = 'Number'
reqd = 'Required'
mnumb = 'M Count'
bnumb = 'B Count'
;
* retain code across source data reads until new value found;
retain
code
;
* use infile statement so we specify what happens at end of file;
infile datalines eof = gotit;
* define arrays to hold count of numbers for each code;
array m_numb(9) m_numb1 - m_numb9 (9*0);
array b_numb(9) b_numb1 - b_numb9 (9*0);
* read in code and first digit then output code B or code M and number greater than 0;
input code $1. +1 number 1. @;
if code eq 'B' and number <> 0 then
b_numb(number) = b_numb(number) + 1;
else if number <> 0 then
m_numb(number) = m_numb(number) + 1;
* read in second digit then output code B or code M and number greater than 0;
input number 1. @;
if code eq 'B' and number <> 0 then
b_numb(number) = b_numb(number) + 1;
else if number <> 0 then
m_numb(number) = m_numb(number) + 1;
* read in second digit then output code B or code M and number greater than 0;
input number 1.;
if code eq 'B' and number <> 0 then
b_numb(number) = b_numb(number) + 1;
else if number <> 0 then
m_numb(number) = m_numb(number) + 1;
* make sure we don't process the eof label until required;
return;
gotit:
do i = 1 to 9;
if b_numb(i) < m_numb(i) then do;
numb = i;
reqd = (m_numb(i) - b_numb(i)) * 2;
mnumb = m_numb(i);
bnumb = b_numb(i);
output;
end;
end;
Datalines;
M 733
M 234
M 736
M 627
M 740
B 414
B 802
B 631
B 495
B 588
;
Run;
Number | Required | M Count | B Count |
---|---|---|---|
2 | 2 | 2 | 1 |
3 | 6 | 4 | 1 |
6 | 2 | 2 | 1 |
7 | 8 | 4 | 0 |
The more, the merrier, Here is another solution based on Array:
DATA Numbers;
input MB $ 1 Digit1 $ 3 Digit2 $ 4 Digit3 $ 5;
Datalines;
M 733
M 234
M 736
M 627
M 740
B 414
B 802
B 631
B 495
B 588
;
Run;
data want;
set numbers end=last;
array m(0:9) _temporary_;
array b(0:9) _temporary_;
array d digit:;
do over d;
if mb='M' then
m(input(d,1.))+1;
else b(input(d,1.))+1;
end;
if last then
do _n_=0 to 9;
reqd=sum(m(_n_),-b(_n_));
if reqd>0 then
do;
reqd=reqd*2;
number=_n_;
output;
end;
end;
keep number reqd;
run;
Eloquent is the word.
Thanks
Dear Hai kuo,
What does it take for an aspiring SAS wannabe programmer to become somebody like you. I have noticed a lot of amazing solutions of yours and many of my friends deem you as the superstar of SAS. Perhaps, are you Mr.Ian Whitlock no.2? or you have overtaken him?. You seem to be gifted. Can you offer any piece of guidance or advice for all those Hai kuo wannabe's to help learn faster such as pointing to materials,docs and beyond?
Dear Art and others, how long has it been since the forum existed, i.e when did it start? It seems easy trace back in SAS L, but here is it possible to search threads by date starting from the very first like 1996 sas-l digest?
Many Thanks indeed,
Mark
Mark: You are off by around 10 years concerning SAS-L. Yes, the current log only goes back to Jan 1996, but the original logs exist on the Marist University listserv. You can find the first log, November 1986, at: http://vm.marist.edu/htbin/wlvgl?L=SAS-L&LOG=LOG8611
The SAS Discussion forum history is also available on-line at: Browse - Content | Communities on SAS
As you'll see, the first forum post shown comes from March 2006
Dear Mark,
What a deja vu ! I said something similar to less than 2 years ago, right here on this forum:
Paul dorfman, Ksharp, Data _null_, Art, Tom, FriedEgg, PG, Joe Matise, Net Wooding and many many others on this forum and SAS-L are still way ahead of me in term of SAS skills, but as you can see, I made my progress in a period time that is not too long (<5 years of SAS on hand), and you can do it as well.
Well, now you mentioned Ian, I have to say, if Ian was a SAS professor, I am in 1st grade at best, it still takes me to read 2-3 times before I could comfortably understand his papers, especially those on Macros.
The best way to learn SAS is hanging out here at the forum or SAS-L.
Anyway, I want to thank you for your attention and your kind words, even though I know there is still a long way ahead of me. And thank you for your SAS endeavors.
Haikuo
OK. If I understood what you mean.
DATA Numbers; input MB $ 1 Digit1 $ 3 Digit2 $ 4 Digit3 $ 5; Datalines; M 733 M 234 M 736 M 627 M 740 B 414 B 802 B 631 B 495 B 588 ; Run; data _null_; if _n_ eq 1 then do; declare hash h(); h.definekey('key'); h.definedata('key','count'); h.definedone(); end; set Numbers end=last; array x{*} Digit: ; do i=1 to dim(x); key=x{i}; if MB='M' then do; if h.find()=0 then do;count+2;h.replace();end; else do;count=2;h.replace();end; end; else if MB='B' then do; if h.find()=0 then do;count+(-2);h.replace();end; end; end; if last then h.output(dataset:'want(where=(count gt 0))'); run;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.