DATA Step, Macro, Functions and more

Help with conditional count

Accepted Solution Solved
Reply
Contributor
Posts: 40
Accepted Solution

Help with conditional count

I have two data tables Table A and Table B as follows:

data tableA;

input Days;

cards;

1

2

3

4 ;

run;

data tableB;

input DayCount;

cards;

2

1

2

3

4 ;

run;

I want to create a resulting data table (Table C) such a way that data table contains the number of times the value in Table B exceeds each row in Table A.

data tableC;

input Days Count;

cards;

1  5

2  4

3  2

4  1 ;

run;

The first column of Table C is from Table A. The second column contains how many in Table B exceeds the column Day. For example, 5 days have value greater than 1 (in Table B). 4 days have value greater than 2 in Table B and so on.

Can someone help with an appropriate DATA STEP or PROC SQL to obtain Table C? Thank you.


Accepted Solutions
Solution
‎11-19-2011 07:13 AM
Super User
Posts: 10,035

Help with conditional count

How about.

But if you care about speed, then Hash Table is a better choice, That will need some more code.

data tableA;
input Days;
cards;
1
2
3
4 
;
run;

 

data tableB;
input DayCount;
cards;
2
1
2
3
4 
;
run;

data tableC(drop=daycount);
 set tablea;
 count=0;
 do i=1 to _nobs;
  set tableb nobs=_nobs point=i;
  if daycount ge days then count+1;
 end;
 run;

Ksharp

View solution in original post


All Replies
Solution
‎11-19-2011 07:13 AM
Super User
Posts: 10,035

Help with conditional count

How about.

But if you care about speed, then Hash Table is a better choice, That will need some more code.

data tableA;
input Days;
cards;
1
2
3
4 
;
run;

 

data tableB;
input DayCount;
cards;
2
1
2
3
4 
;
run;

data tableC(drop=daycount);
 set tablea;
 count=0;
 do i=1 to _nobs;
  set tableb nobs=_nobs point=i;
  if daycount ge days then count+1;
 end;
 run;

Ksharp

Contributor
Posts: 40

Help with conditional count

Thanks a lot KSharp. It worked well. I went with the DATA step for the time being. I will look into the hash table version of it since speed is a very important factor.

Super User
Posts: 10,035

Help with conditional count

And version of SQL. IF you like SQL more.

data tableA;
input Days;
cards;
1
2
3
4 
;
run;

 

data tableB;
input DayCount;
cards;
2
1
2
3
4 
;
run;
proc sql;
create table s as
select days,(select count(*) from tableb as b where b.daycount ge a.days) as count
 from tablea as a;quit;

Ksharp

PROC Star
Posts: 7,474

Help with conditional count

KSharp has already provided a number of solutions for accomplishing the results shown in your example, but I did want to point out that your description doesn't match your example's desired results.  I.e., you say that you want the number of records from table b whose value exceeds (i.e., gt) those of table a, but then show that you want the number of values that are greater than or equal to the value (i.e., ge).  Using the wrong comparison could easily end up causing wrong decisions to be made when the results are applied in the real world.

Contributor
Posts: 40

Help with conditional count

art297, you are right. It is a typo on my part. Thanks for pointing out.

Super User
Posts: 10,035

Help with conditional count

It is Hash Table version which will be more fast.

data tableA;
input Days;
cards;
1
2
3
4 
;
run;

 

data tableB;
input DayCount;
cards;
2
1
2
3
4 
;
run;


data TableC(drop=daycount);
 if _n_ eq 1 then do;
  if 0 then set tableb;
  declare hash ha(hashexp:20,dataset:'tableb',multidata:'Y');
  declare hiter hi('ha');
   ha.definekey('daycount');
   ha.definedata('daycount');
   ha.definedone();
 end;
set tablea;
count=0;
 do while(hi.next() eq 0);
  if daycount ge days then count+1;
 end;
run;

Ksharp

Valued Guide
Posts: 765

Re: Help with conditional count

hi ... here's another idea

it requires a sort and relies on values in tablea matching values in tableb, but the final data step is really simple ...

proc sort data=tableb;

by daycount;

run;

data fmt / view=fmt;

retain fmtname "howmany";

if last then call symputx('obs',_n_ - 1);

set tableb (rename=(daycount=start)) end=last;

by start;

label = cat(_n_);

if first.start;

run;

proc format cntlin=fmt;

run;

data tablec;

set tablea;

howmany = &obs - input(put(days,howmany.),10.) + 1;

run;

Contributor
Posts: 40

Re: Help with conditional count

Thanks a lot Ksharp and Mike.

Super User
Posts: 10,035

Re: Help with conditional count

I think Mike has a better idea. i.e.

Firstly use proc sort to sort dataset descending. then make a count+1 variable ,keep the unique value of tableB,

and Use HashTable to query the count you want.

It will be faster than my code, when TableB is a large table.

data tableA;
input Days;
cards;
1
2
3
4
9 
;
run;

 

data tableB;
input DayCount;
cards;
2
1
2
3
4 
;
run;

proc sort data=tableb;by descending daycount;run;
data tableb;
 set tableb;
 by daycount notsorted;
 count+1;
 if last.daycount;
run;

data TableC(drop=daycount rc);
 if _n_ eq 1 then do;
  if 0 then set tableb;
  declare hash ha(hashexp:20,dataset:'tableb');
   ha.definekey('daycount');
   ha.definedata('count');
   ha.definedone();
 end;
set tablea;
daycount=days; call missing(count);
rc=ha.find();
run;

Ksharp

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 337 views
  • 6 likes
  • 4 in conversation