Solved: Re: Number observations longitudinal dataset by id

MFraga · Posted 11-15-2018 11:50 AM

Hello,

I would like to use "proc freq" for descriptive analysis. My data is organized longitudinally so I would like to have the number of observations for "var1" by "id". Here an example of my data:

data have;
input

id time1 event1 weight var1;
datalines;
1 0 0 0.89595 0
1 1 0 0.89595 0
1 2 0 0.89595 0
1 3 0 0.89595 0
1 4 0 0.89595 0
1 5 0 0.89595 0
1 6 0 0.89595 0
1 7 0 0.89595 0
1 8 0 0.89595 0
1 9 0 0.89595 0
1 10 0 0.89595 0
1 11 0 0.89595 0
1 12 0 0.89595 0
1 13 0 0.89595 0
2 0 0 1.124591 1
2 1 1 1.124591 1
2 2 . 1.124591 1
3 0 0 1.15698 1
3 1 0 1.15698 1
3 2 1 1.15698 1
3 3 . 1.15698 1
4 0 1 0.987654 2
4 1 . 0.987654 2
4 2 . 0.987654 2
4 3 . 0.987654 2
4 4 . 0.987654 2
5 0 0 1.13999 2
6 0 0 1.0503 0
6 1 0 1.0503 0
6 2 0 1.0503 0
6 3 0 1.0503 0
6 4 0 1.0503 0
6 5 1 1.0503 0
6 6 . 1.0503 0
6 7 . 1.0503 0
6 8 . 1.0503 0

;

run;

The table that I want would give me the number of observations for variable "var1". It would be 2 observations for the level "0", and 2 observations for the level and 2 observations for the level "2" of the variable "var1". Any idea ?

Reeza · Posted 11-15-2018 02:00 PM

proc sql;
create table want as
select var1, count(distinct ID) as N_ID
from have
group by var1;
quit;

Use SQL to do this then. See above.

View solution in original post

Reeza · Posted 11-15-2018 11:52 AM

Did you try anything so far?

MFraga · Posted 11-15-2018 12:09 PM

Yes, I have tried a data step and I have the good result. This is what I have coded that gives me the good answer:

data want;

set have;

by id;

if first.id then number_obs=1;

else number_obs+1;

if last.id then output;

run;

proc freq data=want;

tables number_obs;

run;

The product is a table with the number of observations that each "id" has (how many times data has been collected for this person) and in the last line of the accumulated frequencies I do know how many observations I really have for each "id". The inconvenient is that I have to use a data step that creates a wide table. I was wondering if there is something more straightforward with "proc freq" (something to use with the "have" table).

Reeza · Posted 11-15-2018 12:19 PM

The data step does not create a wide table. It only counts the number of observations by ID. And this doesn't use VAR1 which you indicated in your original question so now I'm really confused as to what you're trying to do here, especially if this is the result you want.

MFraga · Posted 11-15-2018 12:21 PM

Sorry not to be clear. I will redo the question.

ballardw · Posted 11-15-2018 12:21 PM

@MFraga wrote:

Yes, I have tried a data step and I have the good result. This is what I have coded that gives me the good answer:

data want;

set have;

by id;

if first.id then number_obs=1;

else number_obs+1;

if last.id then output;

run;

proc freq data=want;

tables number_obs;

run;

The product is a table with the number of observations that each "id" has (how many times data has been collected for this person) and in the last line of the accumulated frequencies I do know how many observations I really have for each "id". The inconvenient is that I have to use a data step that creates a wide table. I was wondering if there is something more straightforward with "proc freq" (something to use with the "have" table).

It does not help when place a requirement such as "level 0" but do not mention which variable you are using per the example data.

proc freq data=have;
tables id*event1* var1
       id* event1 id*var1 
       event1*var1
      /list;
run;

provides a number of possible counts, there are others. The * operator creates a cross tabulation of the variables. The LIST option says to show the results on a single line

MFraga · Posted 11-15-2018 01:42 PM

Let me start again. I want to know how many observations I have for my categorical variable "var1". But I just want to count "id" once so I can know how many individuas are in my dataset for each category of "var1". What I produce with a data step is good and I have my answer doing this:

data want;

set have;

by id;

if first.id then number_obs=1;

else number_obs+1;

if last.id then output;

run;

proc freq data=want;

tables var1*number_obs;

run;

However, I was wondering if there is a way to produce an exit without using a data step first. An example of what I want to do:

proc freq data=have;

tables var1;

where first.id;

run;

I just need to know the number of individuals (how many different "IDs") I have in my dataset for each level of my categorical variable "var1". Is it clear now ?

Thanks for the help!

Reeza · Posted 11-15-2018 02:00 PM

proc sql;
create table want as
select var1, count(distinct ID) as N_ID
from have
group by var1;
quit;

Use SQL to do this then. See above.

MFraga · Posted 11-15-2018 02:09 PM

Thanks again!

MFraga · Posted 11-15-2018 01:52 PM

Thanks for the help. It is not the soluction that I am looking for. I think I did not make a clear question at the beginning.

ballardw · Posted 11-16-2018 11:27 AM

@MFraga wrote:
Thanks for the help. It is not the soluction that I am looking for. I think I did not make a clear question at the beginning.

One thing that helps is to take example data such as you provided and show the expected result as another data step program to show expected result if you want a data set as a result. Or show an example of a report table as built from the example data.

Even just clarifying whether you want report tables, read by people, or data sets is important.

Catch up on SAS Innovate 2026