BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Luciferene
Obsidian | Level 7

Hello community,

 

I'm very new to SAS and I've been struggling to efficiently do this.

 

So following are examples of the datasets:

 

Dataset A:

A 0.86

B 1.25

C 1.01

D 1.35

 

Dataset B (Columns: Category 1 Category 2):

A A

A B

B B

A A

C C

A D

D B

A D

 

Then I want to Dataset A to look like, or create a new Dataset C that looks like (With columns being Classes, rates, #Cat 1 from Dataset A, #Cat 2 from Dataset B):

A 0.86 5 2

B 1.25 1 3

C 1.01 1 1

D 1.35 1 2

 

Currently I'm using One Way Freq to do the count for Cat 1 and Cat 2 for Dataset B. And combining frequency column from those two with Data A and it's very messy.

 

Thank you in advance for help!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

@Luciferene wrote:

I'm only "counting from 2 categories" but dataset b does have many other columns that are not being counted.

data A;
input Cat1 $ n ;
cards;
A 0.86
B 1.25
C 1.01
D 1.35
;

data b;
input  Cat1 $ Cat2 $;
cards;
A A
A B
B B
A A
C C
A D
D B
A D
;

data _null_;
if _n_=1 then do;
if 0 then set A;
  dcl hash H (ordered:'y') ;
   h.definekey  ("Cat1") ;
   h.definedata ("Cat1","n","freq1","freq2") ;
   h.definedone () ;
end;
call missing(freq1,freq2);
do until(lr);
set a end=lr;
rc=h.add();
end;
lr=0;
do until(lr);
set b end=lr;
if h.find(key:Cat1) ne 0 then freq1=1;else freq1=sum(freq1,1);
h.replace();
if h.find(key:Cat2) ne 0 then freq2=1;else freq2=sum(freq2,1);
h.replace();
end;
h.output(dataset:'want');
stop;
run;

View solution in original post

17 REPLIES 17
novinosrin
Tourmaline | Level 20

Is it just 2 categories in dataset b or the number of categories in b could vary?

Luciferene
Obsidian | Level 7

I'm only counting from 2 categories, but dataset b does have many other columns that are not being counted.

novinosrin
Tourmaline | Level 20

@Luciferene wrote:

I'm only "counting from 2 categories" but dataset b does have many other columns that are not being counted.

data A;
input Cat1 $ n ;
cards;
A 0.86
B 1.25
C 1.01
D 1.35
;

data b;
input  Cat1 $ Cat2 $;
cards;
A A
A B
B B
A A
C C
A D
D B
A D
;

data _null_;
if _n_=1 then do;
if 0 then set A;
  dcl hash H (ordered:'y') ;
   h.definekey  ("Cat1") ;
   h.definedata ("Cat1","n","freq1","freq2") ;
   h.definedone () ;
end;
call missing(freq1,freq2);
do until(lr);
set a end=lr;
rc=h.add();
end;
lr=0;
do until(lr);
set b end=lr;
if h.find(key:Cat1) ne 0 then freq1=1;else freq1=sum(freq1,1);
h.replace();
if h.find(key:Cat2) ne 0 then freq2=1;else freq2=sum(freq2,1);
h.replace();
end;
h.output(dataset:'want');
stop;
run;

novinosrin
Tourmaline | Level 20

Hi @Luciferene Btw, welcome to the forum as it appears this is just your 1st thread. If there are many categories, that approach can be tweaked to do some gymnastics using arrays, however me being extremely lazy I haven't done that until needed. 

Luciferene
Obsidian | Level 7

Thanks a lot @novinosrin for the welcome and the help, this is what I was looking for!

 

Few (probably very basic) questions:

How would I go about reordering the rows? For example if I want them in B C D A, etc.

 

If I wanted to operate on something like SUM(n * freq1) / SUM(freq1), how would I do that? It doesn't seem to let me operate on the values like matrix.

 

If you can give me any guidance or point me in the direction to learn that would be great.

novinosrin
Tourmaline | Level 20

Can you please elaborate a bit more. As in, this SUM(n * freq1) / SUM(freq1), do you want to apply this on the output dataset? 

May be it would help if you show your desired using a sample like you did in your question to think through technically

 

And for reordering rows, please let know the logic for the order you want.

 

I am going for lunch(Chicago time) with my mates, and there might be delay in my responses. Kindly bear with me

Luciferene
Obsidian | Level 7

Yes I'm working with the output. So if the output comes out to be (Category, n , freq1, freq2 for columns):

A 0.86 5 2

B 1.25 1 3

C 1.01 1 1

D 1.35 1 2

 

I want SUM(n * freq1) / SUM(freq1) = (0.86 * 5 + 1.25 * 1 + ...) / (0.86 + 1.25 + ...). I know how to do this if it was say matrix, but my attempt at doing it for dataset keeps giving me errors.

 

For the reordering, when the output data orders everything alphabetically where as the categories are set in certain order.

For example if my Categories were initially laid out as:

 

Standard

Preferred

Declined

 

The output spits out in the order of:

 

Declined

Preferred

Standard

 

It's more or less for the summary report.

 

Of course, enjoy your lunch! Thanks for your prompt responses.

Luciferene
Obsidian | Level 7
Just adding on, I would ideally like the result to be added onto the output data.
For example:
A 0.86 5 2
B 1.25 1 3
C 1.01 1 1
D 1.35 1 2
Total, (blank), 8, 8
Weighted AVG, (blank), For freq 1, For freq 2

Where the weighted avg is SUM(n * freq1) / SUM(freq1). My apologise for messy alignment.

Thanks!
novinosrin
Tourmaline | Level 20

Can you clarify whether the following result is what you want that needs to be appended to the prev output which is in have?

 

data have;
input Cat1 $ n freq1 freq2;
cards;
A 0.86 5 2
B 1.25 1 3
C 1.01 1 1
D 1.35 1 2
;


proc sql;
create table want as
select SUM(n * freq1)/SUM(freq1)  as wgt_avg_for_freq1, SUM(n * freq2)/SUM(freq2) as wgt_avg_for_freq2,sum(freq1) as sum_freq1,sum(freq1) as sum_freq2
from have;
quit;
novinosrin
Tourmaline | Level 20

Thank you, but I am afraid that result won't align  correctly when we stack with the output from the previous. 

 

Or  now that you have both the results, can you stack and show the desired one to be precise. It also sounds to me that you might want a report from a prco tabulate or proc report

Luciferene
Obsidian | Level 7

I'm not exactly sure if I understood you correctly, but this is basically the final form that I desire:

A           0.86    5              2
B           1.25    1              3
C           1.01    1              1
D           1.35    1              2
Total               8              8
WA                  0.98875        1.1475

Where the 2nd column (n) is blank. Basically the final two rows are summaries.

 

novinosrin
Tourmaline | Level 20

Aah, you understood well. Thank you & brb

novinosrin
Tourmaline | Level 20
/*This WANT is the output from the previous, 
just recreating from demo purpose*/
data want;
input Cat1 $ n freq1 freq2;
cards;
A 0.86 5 2
B 1.25 1 3
C 1.01 1 1
D 1.35 1 2
;

/*Final want*/
proc sql;
create table final_want as
select * from want
union 
select 'Total' as Cat1,. as n, sum(freq1) as sum_freq1,sum(freq1) as sum_freq2 from want
union 
select 'WA' as Cat1,. as n,SUM(n * freq1)/SUM(freq1)  as freq1,SUM(n * freq2)/SUM(freq2) as freq2 from want;
quit;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 17 replies
  • 7518 views
  • 2 likes
  • 2 in conversation