BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kksss
Calcite | Level 5

 

 ID   var1   

1     x            

1     y

1     z

1     x

2     z

2     x

3     h

3     h

3      a

 

Ideally, I want it to be like either

 

 ID   var1   var1_count        count_of_x

1     x            4               2

1     y            4               2

1     z            4               2

1     x            4               2

2     z            2               1

2     x            2               1

3     h           3                1

3     x           3                1

3     a           3                1

 

or this form will also work

 

ID    var1_count        count_of_x

1              4                   2

 

2              2                   1

3              3                   1

 

1 ACCEPTED SOLUTION

Accepted Solutions
LinusH
Tourmaline | Level 20
@FreelanceReinhard 😃
@kksss:
Select id, count (*) as count, sum(case var1 when 'x' then 1 else 0 end) as count_X
From have
Group by id;

No 1 of my guide when to chose data step or SQL:
Are my task row (data step) or column (SQL) oriented?

I'm categorising this task as column oriented.

Cheers
😆
Data never sleeps

View solution in original post

13 REPLIES 13
FreelanceReinh
Jade | Level 19

Try this:

/* Create test data */
data have;
input ID var1 $;
cards;
1 x 
1 y
1 z
1 x
2 z
2 x
3 h
3 x
3 a
;

/* Long form */
data want;
do until(last.id);
  set have;
  by id;
  var1_count=sum(var1_count, 1); 
  count_of_x=sum(count_of_x, var1='x');
end;
do until(last.id);
  set have;
  by id;
  output;
end;
run;

proc print data=want noobs;
run;

/* Aggregated form */
data want2;
do until(last.id);
  set have;
  by id;
  var1_count=sum(var1_count, 1); 
  count_of_x=sum(count_of_x, var1='x');
end;
drop var1;
run;

proc print data=want2 noobs;
run;
kksss
Calcite | Level 5

your count_of_x doesn't work. it's showing me

 

id   var1    count of x

1      x            1 

1      y             2

2      z              3

2      x             4

FreelanceReinh
Jade | Level 19

Hi @kksss,

 

If you're talking about my suggested solution, I cannot replicate your finding. My data step for the "long form" results in:

              var1_    count_
ID    var1    count     of_x

 1     x        2         1
 1     y        2         1
 2     z        2         1
 2     x        2         1

as it should.

kksss
Calcite | Level 5

I don't know what i did wrong but it's simply just not working like you pasted 

 

the var count is working fine, but the count of x is not. 

it's not scaning through x, but indexing the rows by id. 

 

I have attached a screenshot of my out put. 

 

here is my code regarding to those output

 

data test3;
do until(last.student_id);
set test1;
by student_id;
major_count=sum(major_count, 1);
CSBS_count=sum(CSBS_count, Major_College="Social & Beh Science");
end;
do until(last.student_id);
set test1;
by student_id;
output;
end;
run;


Capture.PNG
kksss
Calcite | Level 5

the proc sql works fine however, so i guess I can live with that.

FreelanceReinh
Jade | Level 19

I've tried to replicate your issue using the below test data:

data test1;
input student_id major_college $50.;
cards;
1 Social & Beh Science
1 Nondegree Seeking
1 Health
2 Social & Beh Science
2 Social & Beh Science
;

The result of your data step is:

student_                            major_    CSBS_
   id       major_college            count    count

    1       Social & Beh Science       3        1
    1       Nondegree Seeking          3        1
    1       Health                     3        1
    2       Social & Beh Science       2        2
    2       Social & Beh Science       2        2

which seems to be correct. Do you also obtain the above result with my test data?

 

If so, we should take a closer look at your dataset TEST1.

LinusH
Tourmaline | Level 20
Always puzzled by requests like "...not using SQL". Shouldn't we use the most appropriate tool?
I think SQL in many situations are easier to code, for others to understand, in case of external RDBMS it will automatically being pushed to the source, and in SAS can be executed with multi threading (wich the data step won't).
Data never sleeps
kksss
Calcite | Level 5

well... then throw me how it can be done with proc sql, and I'll see if I like it...

My reason is because, I have to pass down my code to colleagues who doesn't use sql much... so it's easier for them to understand what is going on. 

FreelanceReinh
Jade | Level 19

@LinusH: I remember the "dark ages" when I would have preferred a non-SQL solution, too, as I would have felt less confident to adapt and maintain such "advanced" code.

LinusH
Tourmaline | Level 20
@FreelanceReinhard 😃
@kksss:
Select id, count (*) as count, sum(case var1 when 'x' then 1 else 0 end) as count_X
From have
Group by id;

No 1 of my guide when to chose data step or SQL:
Are my task row (data step) or column (SQL) oriented?

I'm categorising this task as column oriented.

Cheers
😆
Data never sleeps
FreelanceReinh
Jade | Level 19

For the aggregated output dataset, of course, that's fine (I would have shortened the definition of count_X to sum(var1='x'), though). However, to obtain the result in "long form" with PROC SQL alone, would require additional measures to ensure that the order of the observations within one ID is preserved (if this was important).

LinusH
Tourmaline | Level 20
The long form violates the 2nd normal form. If one is to mix aggregates and details, that should be done in a report - not in a permanent data store.

The aggregation of boolean expression is quite neat and have a shorter syntax, but it's SAS specific, and maybe not that as intuitively understood.
Data never sleeps

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 2541 views
  • 5 likes
  • 3 in conversation