I have a longitudinal data set from 2013 to 2016 and I want to know how many people in the 2013 are not in 2014, 2015 and 2016 and how many people are new in 2014, 2015 and 2016. not sure what is the best way to do this. my data looks like this:
id year x1 x2
1 2013 1 2
1 2014 2 1
1 2015 1 0
1 2016 0 1
2 2015 0 1
2 2016 0 1
3 2014 1 0
3 2015 1 0
4 2013 0 1
4 2014 01
4 2016 01
;
Should just be a matter of proc freq:
proc sort data=have out=want; by year id; run; proc freq; tables year / output=num_id_by_year; run;
It really depends, can't tell what output you are looking for, show an example. You could take min/max for each by group - assumes of course there are no gaps that way. Or you could transpose the data to get years across the top, then proc summary that to get overall per year. Really just guessing what you want though.
The best way would give you all patterns that exist in the data. That way, you can even track if a person left and later returned. For example (and assuming your data is sorted by ID):
data want;
set have;
by id;
if first.id then pattern = 'NNNN';
retain pattern;
if year = 2013 then substr(pattern, 1, 1) = 'Y';
else if year = 2014 then substr(pattern, 2, 1) = 'Y';
else if year = 2015 then substr(pattern, 3, 1) = 'Y';
else if year = 2016 then substr(pattern, 4, 1) = 'Y';
if last.id;
run;
That gives you a four-character PATTERN variable for each ID that shows the pattern of participation over the years. To count them, use PROC FREQ:
proc freq data=want;
tables pattern;
run;
Thanks. this works. here is the code I used:
PROC SORT DATA=out2.Client_VL_Euci44_f
OUT=test1;
BY euci44 year;
RUN ;
data test;
set test1;
by euci44;
if first.euci then pattern = '3456';
if year = 2013 then substr(pattern, 1, 1) = '3';
else if year = 2014 then substr(pattern, 2, 1) = '4';
else if year = 2015 then substr(pattern, 3, 1) = '5';
else if year = 2016 then substr(pattern, 4, 1) = '6';
if last.euci44;
run;
proc freq data=test ;
table pattern;
run;
pattern | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
---|---|---|---|---|
6 | 310244 | 67.23 | 310244 | 67.23 |
5 | 71554 | 15.51 | 381798 | 82.73 |
4 | 46492 | 10.07 | 428290 | 92.81 |
3 | 33186 | 7.19 | 461476 | 100.00 |
Close, but not there yet. This code gives you the last year of participation. There are two things missing. First, this statement needs to appear in the DATA step:
retain pattern;
Second, the initial value for PATTERN cannot be "3456". Otherwise how can you tell when "3" means you put it there initially, or it was put there by the appearance of 2013 in the data? This would be an acceptable initial value for PATTERN:
if first.euci44 then pattern = '----';
That way, any digits that appear in the end would be the result of a matching value for YEAR in the data.
Without knowing what you actually want the output to look like it is hard to provide lots of details.
Here are a couple of ideas.
data have; input id year x1 x2; datalines; 1 2013 1 2 1 2014 2 1 1 2015 1 0 1 2016 0 1 2 2015 0 1 2 2016 0 1 3 2014 1 0 3 2015 1 0 4 2013 0 1 ; ; proc sql; create table missing2014 as select distinct id from have where year=2013 except select distinct id from have where year=2014 ; create table new2014 as select distinct id from have where year=2014 except select distinct id from have where year=2013 ; quit; proc format library=work; value present ., 0 = ' ' other= 'X' ; run; proc tabulate data=have; class id year; tables id, year*n=''*f=present.; run;
The Proc SQL could include multiple years for comparison by using: where year in (2014 2015 2016) for example to get ids that appeared in ANY of those years.
The tabulate provides a table indicating the years that each Id appears.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.