My data is sorted by both year and personal id and there are plural observations per each year as well.
While some predictors are time-invariant, others are time-varying (but categorical).
Some are dead, so there are variables indicating dates of death, but it is only shown in the last year.
I should manage this data so that I can exclude some IDs by criteria (such as excluding people who didn't show any change in time-varying variables and I also have to count the number of the event if it happens.)
My data looks like this.
year id sex inc visit1 visit2 death
1999 1 1 3 1 . .
2000 1 1 3 1 . .
2000 1 1 2 1 . .
2001 1 1 2 . . .
2002 1 1 2 . . .
1999 2 2 3 1 1 .
1999 2 2 3 1 1 .
1999 2 2 3 1 1 .
2000 2 2 3 1 1 .
2001 2 2 2 1 1 200108
1999 3 2 3 . . .
2000 3 2 3 . . .
2001 3 2 3 . . .
2002 3 2 3 . . .
so I want to exclude id3 because the value in inc is constant and also want to calculate the number of visit1 and visit 2 for id 1 and id2 by year. I wonder how I can manipulate data. Should I stick to person-period format? Otherwise, should I transpose the data so that it becomes person-level data for data handling.
+addition
Thanks for the suggestion. Here is the data I want to get
year id sex inc visit1 visit2 death total_visit1 total_visit2
1999 1 1 3 1 . . 3 0
2000 1 1 3 1 . . 3 0
2000 1 1 2 1 . . 3 0
2001 1 1 2 . . . 3 0
2002 1 1 2 . . . 3 0
1999 2 2 3 1 1 . 5 5
1999 2 2 3 1 1 . 5 5
1999 2 2 3 1 1 . 5 5
2000 2 2 3 1 1 . 5 5
2001 2 2 2 1 1 200108 5 5
I eliminated id3 because there is no variation in inc. I want IDs that showed variation in inc. How can I eliminate those who don't meet the criteria?
Thanks in advance.
data have;
input year id sex inc visit1 visit2 death;
cards;
1999 1 1 3 1 . .
2000 1 1 3 1 . .
2000 1 1 2 1 . .
2001 1 1 2 . . .
2002 1 1 2 . . .
1999 2 2 3 1 1 .
1999 2 2 3 1 1 .
1999 2 2 3 1 1 .
2000 2 2 3 1 1 .
2001 2 2 2 1 1 200108
1999 3 2 3 . . .
2000 3 2 3 . . .
2001 3 2 3 . . .
2002 3 2 3 . . .
;
run;
proc sql;
create table want as
select *,coalesce(sum(visit1),0) as total_visit1 ,
coalesce(sum(visit2),0) as total_visit2
from have
group by id
having count(distinct inc) ne 1
order by id,year;
quit;
Your explanations are hard to follow.
Please show the expected output, and the logic used to achieve the values shown.
There is a good reason for @ChrisNZ to ask for a sample desired output. For instance, it is not clear if you want to generate one record per year (edited in "year") with total visit counts, or preserve multiple records per year, but including a total yearly visit1 and visit2 count.
However, as to your question, there is no apparent reason to reorganize the data layout to accommodate your requests.
data have;
input year id sex inc visit1 visit2 death;
cards;
1999 1 1 3 1 . .
2000 1 1 3 1 . .
2000 1 1 2 1 . .
2001 1 1 2 . . .
2002 1 1 2 . . .
1999 2 2 3 1 1 .
1999 2 2 3 1 1 .
1999 2 2 3 1 1 .
2000 2 2 3 1 1 .
2001 2 2 2 1 1 200108
1999 3 2 3 . . .
2000 3 2 3 . . .
2001 3 2 3 . . .
2002 3 2 3 . . .
;
run;
proc sql;
create table want as
select *,coalesce(sum(visit1),0) as total_visit1 ,
coalesce(sum(visit2),0) as total_visit2
from have
group by id
having count(distinct inc) ne 1
order by id,year;
quit;
Thank you so much. It really helps a lot.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.