DATA Step, Macro, Functions and more

Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

Accepted Solution Solved
Reply
Contributor
Posts: 25
Accepted Solution

Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

 

 ID   var1   

1     x            

1     y

1     z

1     x

2     z

2     x

3     h

3     h

3      a

 

Ideally, I want it to be like either

 

 ID   var1   var1_count        count_of_x

1     x            4               2

1     y            4               2

1     z            4               2

1     x            4               2

2     z            2               1

2     x            2               1

3     h           3                1

3     x           3                1

3     a           3                1

 

or this form will also work

 

ID    var1_count        count_of_x

1              4                   2

 

2              2                   1

3              3                   1

 


Accepted Solutions
Solution
‎03-14-2016 03:09 PM
Super User
Posts: 5,429

Re: Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

@FreelanceReinhard 😃
@kksss:
Select id, count (*) as count, sum(case var1 when 'x' then 1 else 0 end) as count_X
From have
Group by id;

No 1 of my guide when to chose data step or SQL:
Are my task row (data step) or column (SQL) oriented?

I'm categorising this task as column oriented.

Cheers
😆
Data never sleeps

View solution in original post


All Replies
Trusted Advisor
Posts: 1,117

Re: Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

Try this:

/* Create test data */
data have;
input ID var1 $;
cards;
1 x 
1 y
1 z
1 x
2 z
2 x
3 h
3 x
3 a
;

/* Long form */
data want;
do until(last.id);
  set have;
  by id;
  var1_count=sum(var1_count, 1); 
  count_of_x=sum(count_of_x, var1='x');
end;
do until(last.id);
  set have;
  by id;
  output;
end;
run;

proc print data=want noobs;
run;

/* Aggregated form */
data want2;
do until(last.id);
  set have;
  by id;
  var1_count=sum(var1_count, 1); 
  count_of_x=sum(count_of_x, var1='x');
end;
drop var1;
run;

proc print data=want2 noobs;
run;
Contributor
Posts: 25

Re: Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

Posted in reply to FreelanceReinhard

your count_of_x doesn't work. it's showing me

 

id   var1    count of x

1      x            1 

1      y             2

2      z              3

2      x             4

Trusted Advisor
Posts: 1,117

Re: Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

Hi @kksss,

 

If you're talking about my suggested solution, I cannot replicate your finding. My data step for the "long form" results in:

              var1_    count_
ID    var1    count     of_x

 1     x        2         1
 1     y        2         1
 2     z        2         1
 2     x        2         1

as it should.

Contributor
Posts: 25

Re: Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

I don't know what i did wrong but it's simply just not working like you pasted 

 

the var count is working fine, but the count of x is not. 

it's not scaning through x, but indexing the rows by id. 

 

I have attached a screenshot of my out put. 

 

here is my code regarding to those output

 

data test3;
do until(last.student_id);
set test1;
by student_id;
major_count=sum(major_count, 1);
CSBS_count=sum(CSBS_count, Major_College="Social & Beh Science");
end;
do until(last.student_id);
set test1;
by student_id;
output;
end;
run;


Capture.PNG
Contributor
Posts: 25

Re: Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

the proc sql works fine however, so i guess I can live with that.

Trusted Advisor
Posts: 1,117

Re: Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

[ Edited ]

I've tried to replicate your issue using the below test data:

data test1;
input student_id major_college $50.;
cards;
1 Social & Beh Science
1 Nondegree Seeking
1 Health
2 Social & Beh Science
2 Social & Beh Science
;

The result of your data step is:

student_                            major_    CSBS_
   id       major_college            count    count

    1       Social & Beh Science       3        1
    1       Nondegree Seeking          3        1
    1       Health                     3        1
    2       Social & Beh Science       2        2
    2       Social & Beh Science       2        2

which seems to be correct. Do you also obtain the above result with my test data?

 

If so, we should take a closer look at your dataset TEST1.

Super User
Posts: 5,429

Re: Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

Always puzzled by requests like "...not using SQL". Shouldn't we use the most appropriate tool?
I think SQL in many situations are easier to code, for others to understand, in case of external RDBMS it will automatically being pushed to the source, and in SAS can be executed with multi threading (wich the data step won't).
Data never sleeps
Contributor
Posts: 25

Re: Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

well... then throw me how it can be done with proc sql, and I'll see if I like it...

My reason is because, I have to pass down my code to colleagues who doesn't use sql much... so it's easier for them to understand what is going on. 

Trusted Advisor
Posts: 1,117

Re: Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

@LinusH: I remember the "dark ages" when I would have preferred a non-SQL solution, too, as I would have felt less confident to adapt and maintain such "advanced" code.

Solution
‎03-14-2016 03:09 PM
Super User
Posts: 5,429

Re: Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

@FreelanceReinhard 😃
@kksss:
Select id, count (*) as count, sum(case var1 when 'x' then 1 else 0 end) as count_X
From have
Group by id;

No 1 of my guide when to chose data step or SQL:
Are my task row (data step) or column (SQL) oriented?

I'm categorising this task as column oriented.

Cheers
😆
Data never sleeps
Trusted Advisor
Posts: 1,117

Re: Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

For the aggregated output dataset, of course, that's fine (I would have shortened the definition of count_X to sum(var1='x'), though). However, to obtain the result in "long form" with PROC SQL alone, would require additional measures to ensure that the order of the observations within one ID is preserved (if this was important).

Super User
Posts: 5,429

Re: Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

The long form violates the 2nd normal form. If one is to mix aggregates and details, that should be done in a report - not in a permanent data store.

The aggregation of boolean expression is quite neat and have a shorter syntax, but it's SAS specific, and maybe not that as intuitively understood.
Data never sleeps
Trusted Advisor
Posts: 1,117

Re: Count total IDS and distinct Values and put in two count variable Prefer not Proc SQL

@LinusH: Fully agreed.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 351 views
  • 5 likes
  • 3 in conversation