G'day,
I've a question -- I'm trying to find observations below a certain cutoff value across any four years for the same ID.
Here's the data below and what I tried. I realized it doesn't work. what i want is to have identify in a list that has ids which are below 90 for any four years (Yr), something like the table below. Any help you can give is much appreciated. Or is there a better way to do it in proc sql? Can this be done in the DATA step? Thank you!
id | pctvax2000 | pctvax2001 | pctvax2002 | pctvax2003 | pctvax2004 | pctvax2005 | pctLT90_2000 | pctLT90_2001 | pctLT90_2002 | pctLT90_2003 | pctLT90_2004 | pctLT90_2005 | any4 |
1234567 | 88 | 90 | 57 | 95 | 77 | 62 | 1 | 0 | 1 | 0 | 1 | 1 | 4 |
1234568 | 82 | 91 | 53 | 78 | 90 | 95 | 1 | 0 | 1 | 1 | 0 | 0 | 5 |
*--2017-18 ;
DATA vax;
INPUT id $ pctvax yr $;
CARDS;
1234567 88 2000
1234567 90 2001
1234565 57 2002
1234567 95 2003
1234567 77 2004
1234567 62 2005
1234568 82 2000
1234568 91 2001
1234568 53 2002
1234568 78 2003
1234568 90 2004
1234568 45 2005
;
RUN;
data vax1;
set vax;
if yr='2000' then do;
pctvax2000=pctvax;
if pctvax2000<90 then pctLT90_2000=1;
end;
if yr='2001' then do;
pctvax2001=pctvax;
if pctvax2001<90 then pctLT90_2001=1;
end;
if yr='2002' then do;
pctvax2002=pctvax;
if pctvax2002<90 then pctLT90_2002=1;
end;
if yr='2003' then do;
pctvax2003=pctvax;
if pctvax2003<90 then pctLT90_2003=1;
end;
if yr='2004' then do;
pctvax2004=pctvax;
if pctvax2004<90 then pctLT90_2004=1;
end;
if yr='2005' then do;
pctvax2005=pctvax;
if pctvax2005<90 then pctLT90_2005=1;
end;
if sum(pctLT90_2000, pctLT90_2001, pctLT90_2002, pctLT90_2003, pctLT90_2004, pctLT90_2005)=4 then any4=1;
else any4=0;
run;
You say any of the 4 years but you're looking for all 4 years. Is the criteria, any of the 4 years is less than 90% or all of the four years less than 90%?
Two different approaches below - SQL to generate a single list of IDs or transpose and data step to generate a table more similar to what you have below.
proc sql;
create listID_lt90 as
select distinct ID
from vax where pctvax <90;
quit;
proc transpose data=vax out=vax_wide prefix=pctvax;
by id;
id year;
var pctvax;
run;
data want;
set vax_wide;
if min(of pctvax:) < 90 then any4 = 1;
else any4=0;
run;
Also, you may be able to modify our code (untested) by changing this criteria:
if sum(pctLT90_2000, pctLT90_2001, pctLT90_2002, pctLT90_2003, pctLT90_2004, pctLT90_2005)=4 then any4=1;
else any4=0;
if sum(pctLT90_2000, pctLT90_2001, pctLT90_2002, pctLT90_2003, pctLT90_2004, pctLT90_2005)>0 then any4=1;
else any4=0;
@jcis7 wrote:
G'day,
I've a question -- I'm trying to find observations below a certain cutoff value across any four years for the same ID.
Here's the data below and what I tried. I realized it doesn't work. what i want is to have identify in a list that has ids which are below 90 for any four years (Yr), something like the table below. Any help you can give is much appreciated. Or is there a better way to do it in proc sql? Can this be done in the DATA step? Thank you!
id pctvax2000 pctvax2001 pctvax2002 pctvax2003 pctvax2004 pctvax2005 pctLT90_2000 pctLT90_2001 pctLT90_2002 pctLT90_2003 pctLT90_2004 pctLT90_2005 any4 1234567 88 90 57 95 77 62 1 0 1 0 1 1 4 1234568 82 91 53 78 90 95 1 0 1 1 0 0 5
*--2017-18 ; DATA vax; INPUT id $ pctvax yr $; CARDS; 1234567 88 2000 1234567 90 2001 1234565 57 2002 1234567 95 2003 1234567 77 2004 1234567 62 2005 1234568 82 2000 1234568 91 2001 1234568 53 2002 1234568 78 2003 1234568 90 2004 1234568 45 2005 ; RUN; data vax1; set vax; if yr='2000' then do; pctvax2000=pctvax; if pctvax2000<90 then pctLT90_2000=1; end; if yr='2001' then do; pctvax2001=pctvax; if pctvax2001<90 then pctLT90_2001=1; end; if yr='2002' then do; pctvax2002=pctvax; if pctvax2002<90 then pctLT90_2002=1; end; if yr='2003' then do; pctvax2003=pctvax; if pctvax2003<90 then pctLT90_2003=1; end; if yr='2004' then do; pctvax2004=pctvax; if pctvax2004<90 then pctLT90_2004=1; end; if yr='2005' then do; pctvax2005=pctvax; if pctvax2005<90 then pctLT90_2005=1; end; if sum(pctLT90_2000, pctLT90_2001, pctLT90_2002, pctLT90_2003, pctLT90_2004, pctLT90_2005)=4 then any4=1; else any4=0; run;
You say any of the 4 years but you're looking for all 4 years. Is the criteria, any of the 4 years is less than 90% or all of the four years less than 90%?
Two different approaches below - SQL to generate a single list of IDs or transpose and data step to generate a table more similar to what you have below.
proc sql;
create listID_lt90 as
select distinct ID
from vax where pctvax <90;
quit;
proc transpose data=vax out=vax_wide prefix=pctvax;
by id;
id year;
var pctvax;
run;
data want;
set vax_wide;
if min(of pctvax:) < 90 then any4 = 1;
else any4=0;
run;
Also, you may be able to modify our code (untested) by changing this criteria:
if sum(pctLT90_2000, pctLT90_2001, pctLT90_2002, pctLT90_2003, pctLT90_2004, pctLT90_2005)=4 then any4=1;
else any4=0;
if sum(pctLT90_2000, pctLT90_2001, pctLT90_2002, pctLT90_2003, pctLT90_2004, pctLT90_2005)>0 then any4=1;
else any4=0;
@jcis7 wrote:
G'day,
I've a question -- I'm trying to find observations below a certain cutoff value across any four years for the same ID.
Here's the data below and what I tried. I realized it doesn't work. what i want is to have identify in a list that has ids which are below 90 for any four years (Yr), something like the table below. Any help you can give is much appreciated. Or is there a better way to do it in proc sql? Can this be done in the DATA step? Thank you!
id pctvax2000 pctvax2001 pctvax2002 pctvax2003 pctvax2004 pctvax2005 pctLT90_2000 pctLT90_2001 pctLT90_2002 pctLT90_2003 pctLT90_2004 pctLT90_2005 any4 1234567 88 90 57 95 77 62 1 0 1 0 1 1 4 1234568 82 91 53 78 90 95 1 0 1 1 0 0 5
*--2017-18 ; DATA vax; INPUT id $ pctvax yr $; CARDS; 1234567 88 2000 1234567 90 2001 1234565 57 2002 1234567 95 2003 1234567 77 2004 1234567 62 2005 1234568 82 2000 1234568 91 2001 1234568 53 2002 1234568 78 2003 1234568 90 2004 1234568 45 2005 ; RUN; data vax1; set vax; if yr='2000' then do; pctvax2000=pctvax; if pctvax2000<90 then pctLT90_2000=1; end; if yr='2001' then do; pctvax2001=pctvax; if pctvax2001<90 then pctLT90_2001=1; end; if yr='2002' then do; pctvax2002=pctvax; if pctvax2002<90 then pctLT90_2002=1; end; if yr='2003' then do; pctvax2003=pctvax; if pctvax2003<90 then pctLT90_2003=1; end; if yr='2004' then do; pctvax2004=pctvax; if pctvax2004<90 then pctLT90_2004=1; end; if yr='2005' then do; pctvax2005=pctvax; if pctvax2005<90 then pctLT90_2005=1; end; if sum(pctLT90_2000, pctLT90_2001, pctLT90_2002, pctLT90_2003, pctLT90_2004, pctLT90_2005)=4 then any4=1; else any4=0; run;
Work with VAX which is in a good format to provide the information you want. VAX1 isn't as good.
proc sql;
create table want as select
id,sum(pctvax<90) as below90
from vax group by id;
quit;
@jcis7 wrote:
Thanks - I'm looking for school year which is in the results of the code you posted. I'm looking for the school year across the top with the school code listed in the column where the current school year is listed in the results of the code you posted.
This is a poor data layout for data analysis, don't do this.
If you want this data layout in a report or to create an EXCEL spreadsheet, use PROC REPORT from the original data layout in data set VAX.
proc transpose data=vax out=vax_wide prefix=pctvax;
by id;
id year;
var pctvax;
run;
data vax_wide1;
set vax_wide;
if pctvax2000 < 90 then pctvax2000LT90=1;
if pctvax2001 < 90 then pctvax2001LT90=1;
if pctvax2002 < 90 then pctvax2002LT90=1;
if pctvax2003 < 90 then pctvax2003LT90=1;
if pctvax2004 < 90 then pctvax2004LT90=1;
if pctvax2005 < 90 then pctvax2005LT90=1;
if sum (pctvax2000LT90, pctvax2001LT90, pctvax2002LT90, pctvax2003LT90, pctvax2004LT90, pctvax2005LT90) =4 then any4=1;
else any4=0;
run;
This seems code above seems work for the simplified dataset I created. I'm running into a problem where in the actual dataset, it counts missing values as <90. I can post it as a separate post. Thank you everyone for your help!
proc transpose data=vax out=vax_wide prefix=pctvax;
by id;
id year;
var pctvax;
run;
data vax_wide1;
set vax_wide;
array pctvax(2000:2005) pctvax2000-pctvax2005;
array lt90vax(2000:2005) lt90vax2000-lt90vax2005;
*set all to 0;
do year=2000 to 2005;
lt90vax(year) = 0;
end;
*set to 1 if applicable;
do year=2000 to 2005;
if not missing(pctvax(year)) and pctvax(year) < 90 then lt90vax(year) = 1;
end;
if sum of lt90vax(*)) =4 then any4=1;
else any4=0;
run;
It's missing the opening bracket...I think you should be able to figure where that should go for the sum function 🙂
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.