DATA Step, Macro, Functions and more

Counting

Reply
N/A
Posts: 0

Counting

My table:

ID__ X
4___1
4___2
4___2
5___1
7__ 2
7__ 2
7__ 3
.
.
.
I want it to look like this:

ID__ X__N1__N2__N3
4___1___1
4___2___1___1
4___2___1___1
5___1___2___1
7__ 2___2___2
7__ 2___2___2
7__ 3___2___2___1
.
.
.

In NX I want the number of ID:s where X have been found.
When I stopped above, X=1 had been found for ID:s 4 and 5, so N1=2.
X=2 in ID:s 4 and 7, so N2=2.
X=3 in ID 7, so N3=1.

I insist that the counting should be done in a data step.

I don't think I can use FIRST or LAST.

Can anyone help me?

Susan
Super Contributor
Super Contributor
Posts: 3,174

Re: Counting

Yes, it's possible with a DATA step, I would say - you want to do something of the following "like" process, but honestly, I'm still not 100% clear that I understand some of the derived observation values you've shown:

1) Sort your data.
2) Using PROC SQL or some other technique, derive a max value for X to a macro variable to use later with an ARRAY and getting a var range for N1-N&max
3) Using a DATA step, do the following:
3a) Read in your sorted file to create another file.
3b) Setup an array with a numeric variable set, N1-N&max.
3c) With a BY and using IF FIRST.ID and LAST.ID, increment by one the array using X as the array subscript.
3d) Assign a var OBSNUM = _N_ -- must use later to merge new vars back into your original file.
4) Merge your file from #3 above back onto your original file using OBSNUM as your BY variable - you will need to assign a var OBSNUM in your original file first, of course.


Scott Barry
SBBWorks, Inc.
N/A
Posts: 0

Re: Counting

In N(X) I want the number of ID:s(let's say individuals) where X have been found.

That must be easy to understand.

If there are a total of 10 ID:s, and ID=2 have e.g 754, or whatever, instances of X=3 and ID=5, have e.g 28, or whatever, instances of X=3, and the remaining 8 individuals have no occurrence of X=3, then N(3) should be 2, because in two different ID:s there were at least one occurrence of X=3.

I don't understand how FIRST.ID and LAST.ID could be of any help. Could you please explain.

Susan
Super Contributor
Super Contributor
Posts: 3,174

Re: Counting

Agreed - no need for the BY and FIRST/LAST processing, rather consider some technique for counting/summarizing (FREQ or SUMMARY maybe, or a DATA step, where there you would use FIRST. and LAST. to count occurences of N (not involving ID). So work through the challenge, one step at a time, writing a piece of code to address one portion, and then assembling the pieces, like a puzzle.

Scott Barry
SBBWorks, Inc.
PROC Star
Posts: 1,570

Re: Counting

This should do:
[pre]
data _null_;
retain NN:;
array VAL(3,7) _temporary_; * will store whether any crossing has been found;
array NN(3) ; * will sum the crossings for a X;
input ID X; * read data;
VAL(X,ID)=1; * ID/X crossing found;
NN(X)=0 ; * initialise NB of crossings found so far;
do IDNUM=1 to 7; * count NB of crossings found so far;
NN(X)+ val(X,IDNUM) ;
end;
put ID X NN1-NN7;
cards;
4 1
4 2
4 2
5 1
7 2
7 2
7 3
run;
[/pre]
It outputs:
[pre]
4 1 1 . . . . . .
4 2 1 1 . . . . .
4 2 1 1 . . . . .
5 1 2 1 . . . . .
7 2 2 2 . . . . .
7 2 2 2 . . . . .
7 3 2 2 1 . . . .
Ask a Question
Discussion stats
  • 4 replies
  • 149 views
  • 0 likes
  • 3 in conversation