Solved: Help with conditional count

vioravis · Posted 11-19-2011 05:24 AM

I have two data tables Table A and Table B as follows:

data tableA;

input Days;

cards;

1

2

3

4 ;

run;

data tableB;

input DayCount;

cards;

2

1

2

3

4 ;

run;

I want to create a resulting data table (Table C) such a way that data table contains the number of times the value in Table B exceeds each row in Table A.

data tableC;

input Days Count;

cards;

1 5

2 4

3 2

4 1 ;

run;

The first column of Table C is from Table A. The second column contains how many in Table B exceeds the column Day. For example, 5 days have value greater than 1 (in Table B). 4 days have value greater than 2 in Table B and so on.

Can someone help with an appropriate DATA STEP or PROC SQL to obtain Table C? Thank you.

Ksharp · Posted 11-19-2011 07:13 AM

How about.

But if you care about speed, then Hash Table is a better choice, That will need some more code.

data tableA;
input Days;
cards;
1
2
3
4 
;
run;

 

data tableB;
input DayCount;
cards;
2
1
2
3
4 
;
run;

data tableC(drop=daycount);
 set tablea;
 count=0;
 do i=1 to _nobs;
  set tableb nobs=_nobs point=i;
  if daycount ge days then count+1;
 end;
 run;

Ksharp

View solution in original post

Ksharp · Posted 11-19-2011 07:13 AM

How about.

But if you care about speed, then Hash Table is a better choice, That will need some more code.

data tableA;
input Days;
cards;
1
2
3
4 
;
run;

 

data tableB;
input DayCount;
cards;
2
1
2
3
4 
;
run;

data tableC(drop=daycount);
 set tablea;
 count=0;
 do i=1 to _nobs;
  set tableb nobs=_nobs point=i;
  if daycount ge days then count+1;
 end;
 run;

Ksharp

vioravis · Posted 11-19-2011 03:20 PM

Thanks a lot KSharp. It worked well. I went with the DATA step for the time being. I will look into the hash table version of it since speed is a very important factor.

Ksharp · Posted 11-19-2011 07:20 AM

And version of SQL. IF you like SQL more.

data tableA;
input Days;
cards;
1
2
3
4 
;
run;

 

data tableB;
input DayCount;
cards;
2
1
2
3
4 
;
run;
proc sql;
create table s as
select days,(select count(*) from tableb as b where b.daycount ge a.days) as count
 from tablea as a;quit;

Ksharp

art297 · Posted 11-19-2011 11:23 AM

KSharp has already provided a number of solutions for accomplishing the results shown in your example, but I did want to point out that your description doesn't match your example's desired results. I.e., you say that you want the number of records from table b whose value exceeds (i.e., gt) those of table a, but then show that you want the number of values that are greater than or equal to the value (i.e., ge). Using the wrong comparison could easily end up causing wrong decisions to be made when the results are applied in the real world.

vioravis · Posted 11-19-2011 03:21 PM

art297, you are right. It is a typo on my part. Thanks for pointing out.

Ksharp · Posted 11-20-2011 08:56 PM

It is Hash Table version which will be more fast.

data tableA;
input Days;
cards;
1
2
3
4 
;
run;

 

data tableB;
input DayCount;
cards;
2
1
2
3
4 
;
run;


data TableC(drop=daycount);
 if _n_ eq 1 then do;
  if 0 then set tableb;
  declare hash ha(hashexp:20,dataset:'tableb',multidata:'Y');
  declare hiter hi('ha');
   ha.definekey('daycount');
   ha.definedata('daycount');
   ha.definedone();
 end;
set tablea;
count=0;
 do while(hi.next() eq 0);
  if daycount ge days then count+1;
 end;
run;

Ksharp

MikeZdeb · Posted 11-20-2011 10:21 PM

hi ... here's another idea

it requires a sort and relies on values in tablea matching values in tableb, but the final data step is really simple ...

proc sort data=tableb;

by daycount;

run;

data fmt / view=fmt;

retain fmtname "howmany";

if last then call symputx('obs',_n_ - 1);

set tableb (rename=(daycount=start)) end=last;

by start;

label = cat(_n_);

if first.start;

run;

proc format cntlin=fmt;

run;

data tablec;

set tablea;

howmany = &obs - input(put(days,howmany.),10.) + 1;

run;

vioravis · Posted 11-21-2011 07:01 AM

Thanks a lot Ksharp and Mike.

Ksharp · Posted 11-22-2011 01:38 AM

I think Mike has a better idea. i.e.

Firstly use proc sort to sort dataset descending. then make a count+1 variable ,keep the unique value of tableB,

and Use HashTable to query the count you want.

It will be faster than my code, when TableB is a large table.

data tableA;
input Days;
cards;
1
2
3
4
9 
;
run;

 

data tableB;
input DayCount;
cards;
2
1
2
3
4 
;
run;

proc sort data=tableb;by descending daycount;run;
data tableb;
 set tableb;
 by daycount notsorted;
 count+1;
 if last.daycount;
run;

data TableC(drop=daycount rc);
 if _n_ eq 1 then do;
  if 0 then set tableb;
  declare hash ha(hashexp:20,dataset:'tableb');
   ha.definekey('daycount');
   ha.definedata('count');
   ha.definedone();
 end;
set tablea;
daycount=days; call missing(count);
rc=ha.find();
run;

Ksharp

Help with conditional count

Help with conditional count

Help with conditional count

Help with conditional count

Help with conditional count

Help with conditional count

Help with conditional count

Help with conditional count

Re: Help with conditional count

Re: Help with conditional count

Re: Help with conditional count

Classroom Training Available!