BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
vioravis
Calcite | Level 5

I have two data tables Table A and Table B as follows:

data tableA;

input Days;

cards;

1

2

3

4 ;

run;

data tableB;

input DayCount;

cards;

2

1

2

3

4 ;

run;

I want to create a resulting data table (Table C) such a way that data table contains the number of times the value in Table B exceeds each row in Table A.

data tableC;

input Days Count;

cards;

1  5

2  4

3  2

4  1 ;

run;

The first column of Table C is from Table A. The second column contains how many in Table B exceeds the column Day. For example, 5 days have value greater than 1 (in Table B). 4 days have value greater than 2 in Table B and so on.

Can someone help with an appropriate DATA STEP or PROC SQL to obtain Table C? Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

How about.

But if you care about speed, then Hash Table is a better choice, That will need some more code.

data tableA;
input Days;
cards;
1
2
3
4 
;
run;

 

data tableB;
input DayCount;
cards;
2
1
2
3
4 
;
run;

data tableC(drop=daycount);
 set tablea;
 count=0;
 do i=1 to _nobs;
  set tableb nobs=_nobs point=i;
  if daycount ge days then count+1;
 end;
 run;

Ksharp

View solution in original post

9 REPLIES 9
Ksharp
Super User

How about.

But if you care about speed, then Hash Table is a better choice, That will need some more code.

data tableA;
input Days;
cards;
1
2
3
4 
;
run;

 

data tableB;
input DayCount;
cards;
2
1
2
3
4 
;
run;

data tableC(drop=daycount);
 set tablea;
 count=0;
 do i=1 to _nobs;
  set tableb nobs=_nobs point=i;
  if daycount ge days then count+1;
 end;
 run;

Ksharp

vioravis
Calcite | Level 5

Thanks a lot KSharp. It worked well. I went with the DATA step for the time being. I will look into the hash table version of it since speed is a very important factor.

Ksharp
Super User

And version of SQL. IF you like SQL more.

data tableA;
input Days;
cards;
1
2
3
4 
;
run;

 

data tableB;
input DayCount;
cards;
2
1
2
3
4 
;
run;
proc sql;
create table s as
select days,(select count(*) from tableb as b where b.daycount ge a.days) as count
 from tablea as a;quit;

Ksharp

art297
Opal | Level 21

KSharp has already provided a number of solutions for accomplishing the results shown in your example, but I did want to point out that your description doesn't match your example's desired results.  I.e., you say that you want the number of records from table b whose value exceeds (i.e., gt) those of table a, but then show that you want the number of values that are greater than or equal to the value (i.e., ge).  Using the wrong comparison could easily end up causing wrong decisions to be made when the results are applied in the real world.

vioravis
Calcite | Level 5

art297, you are right. It is a typo on my part. Thanks for pointing out.

Ksharp
Super User

It is Hash Table version which will be more fast.

data tableA;
input Days;
cards;
1
2
3
4 
;
run;

 

data tableB;
input DayCount;
cards;
2
1
2
3
4 
;
run;


data TableC(drop=daycount);
 if _n_ eq 1 then do;
  if 0 then set tableb;
  declare hash ha(hashexp:20,dataset:'tableb',multidata:'Y');
  declare hiter hi('ha');
   ha.definekey('daycount');
   ha.definedata('daycount');
   ha.definedone();
 end;
set tablea;
count=0;
 do while(hi.next() eq 0);
  if daycount ge days then count+1;
 end;
run;

Ksharp

MikeZdeb
Rhodochrosite | Level 12

hi ... here's another idea

it requires a sort and relies on values in tablea matching values in tableb, but the final data step is really simple ...

proc sort data=tableb;

by daycount;

run;

data fmt / view=fmt;

retain fmtname "howmany";

if last then call symputx('obs',_n_ - 1);

set tableb (rename=(daycount=start)) end=last;

by start;

label = cat(_n_);

if first.start;

run;

proc format cntlin=fmt;

run;

data tablec;

set tablea;

howmany = &obs - input(put(days,howmany.),10.) + 1;

run;

vioravis
Calcite | Level 5

Thanks a lot Ksharp and Mike.

Ksharp
Super User

I think Mike has a better idea. i.e.

Firstly use proc sort to sort dataset descending. then make a count+1 variable ,keep the unique value of tableB,

and Use HashTable to query the count you want.

It will be faster than my code, when TableB is a large table.

data tableA;
input Days;
cards;
1
2
3
4
9 
;
run;

 

data tableB;
input DayCount;
cards;
2
1
2
3
4 
;
run;

proc sort data=tableb;by descending daycount;run;
data tableb;
 set tableb;
 by daycount notsorted;
 count+1;
 if last.daycount;
run;

data TableC(drop=daycount rc);
 if _n_ eq 1 then do;
  if 0 then set tableb;
  declare hash ha(hashexp:20,dataset:'tableb');
   ha.definekey('daycount');
   ha.definedata('count');
   ha.definedone();
 end;
set tablea;
daycount=days; call missing(count);
rc=ha.find();
run;

Ksharp

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1196 views
  • 6 likes
  • 4 in conversation