I need to output percentages in SAS v9.4 that fall above a certain threshold from two categorical variables in a dataset - state and year. I first produced the following table that has all of the data:
proc freq data=example;
tables statefip * yr/ nopercent nocol norow missing out=test;
run;
Now I would like to calculate a percentage from year to year by state. There will be 10 years worth of data. For example, ratio1 = (Count of statefip variable from 2013)/(Count of statefip variable from 2012). If ratio1 is greater than a threshold (e.g., 80%), I would keep the data. If not, it would be coded as "missing".
Is that possible to do in SAS?
In addition, is there a way to do this in a macro?
Any suggestions is greatly appreciated!
Not sure if this solves your problem, but here's some code that dynamically divides all present years based on your previously stated criteria.
data have;
infile cards missover;
length state $ 100 _2012 _2013 _2014 _2015 8.;
input state _2012 _2013 _2014 _2015;
cards;
Alabama 9653 10264 9557 11152
Alaska 6273 6476 6423 6576
Arizona 24199 21635 22835 20865
;
run;
PROC SQL;
CREATE TABLE ALLYEARS AS
SELECT MONOTONIC() AS ROWID, * FROM DICTIONARY.COLUMNS WHERE MEMNAME = 'HAVE' AND NAME LIKE '%20%';
QUIT;
PROC SQL;
SELECT TRIM(ALLYEARS.NAME) INTO: ALLNAMES SEPARATED BY "," FROM ALLYEARS ;
QUIT;
PROC SQL;
SELECT TRIM(ALLYEARS.NAME)||"/"||TRIM(ALLYEARS2.NAME)||" AS "||TRIM(ALLYEARS.NAME)||"_"||TRIM(ALLYEARS2.NAME)||" "
INTO: ALLDIVS SEPARATED BY "," FROM ALLYEARS
, ALLYEARS AS ALLYEARS2
WHERE ALLYEARS.ROWID > ALLYEARS2.ROWID;
QUIT;
%PUT &ALLDIVS;
PROC SQL;
CREATE TABLE PROPORTION AS
SELECT state, &ALLNAMES, &ALLDIVS
FROM HAVE;
QUIT;
Depending on how your data were structured:
PROC SQL;
CREATE TABLE PROPORTION AS
SELECT STATENAME, SUM(DISTINCT CASE WHEN STATEFLIP2013 = 1 THEN 1 ELSE 0 END)/SUM(DISTINCT CASE WHEN STATEFLIP2012 = 1 THEN 1 ELSE 0 END)
FROM MYTABLE
GROUP BY STATENAME;
QUIT;
Apologies - I should have clarified with an illustrative table.
I currently am using proc tabulate to generate the following:
STATEFIP | 2012 | 2013 | 2014 | 2015 |
Alabama | 9,653 | 10,264 | 9,557 | 11,152 |
Alaska | 6,273 | 6,476 | 6,423 | 6,576 |
Arizona | 24,199 | 21,635 | 22,835 | 20,865 |
I would like to generate the following table:
STATEFIP | 2012 | 2013 | 2014 | 2015 | 2013/2012 | 2014/2013 | 2015/2014 |
Alabama | 9,653 | 10,264 | 9,557 | 11,152 | 106.33% | 93.11% | 116.69% |
Alaska | 6,273 | 6,476 | 6,423 | 6,576 | 103.24% | 99.18% | 102.38% |
Arizona | 24,199 | 21,635 | 22,835 | 20,865 | 89.40% | 105.55% | 91.37% |
Thanks for the specification!
My solution, for your review:
data have;
infile cards missover;
length state $ 100 _2012 _2013 _2014 _2015 8.;
input state _2012 _2013 _2014 _2015;
cards;
Alabama 9653 10264 9557 11152
Alaska 6273 6476 6423 6576
Arizona 24199 21635 22835 20865
;
run;
PROC SQL;
CREATE TABLE PROPORTION AS
SELECT state, _2012, _2013, _2014, _2015
, _2013/_2012 "2013/2012" FORMAT 8.4
, _2014/_2013 "2014/2013" FORMAT 8.4
, _2015/_2014 "2015/2014" FORMAT 8.4
FROM HAVE;
QUIT;
Patrick
The other option, if you want this dynamic would be to use PROC REPORT with a compute column.
PS. I'm moving this thread to reporting forum as it more pertains to reporting structure.
Thank you both for your timely suggestions!
I would like to do this dynamically for future reports. Here is the code that I have thus far for your comments and suggestions:
1) Creating a macro variable across years:
%LET CYR1 = %EVAL(2015); */For later iterations, one only have to update this year
%LET CYR2 = %EVAL(&CYR1-1);
%LET CYR3 = %EVAL(&CYR1-2);
%LET CYR4 = %EVAL(&CYR1-3);
2) I needed to concatenate and then subset two large datasets that will have new variables by the years:
DATA test;
SET dataset1 dataset2;
state=stnamel(statefip);
IF yr IN (&CYR1 &CYR2 &CYR3 &CYR4);
YR1 = yr IN (&CYR1); */This creates counts per year, but I also see that it creates a binary variable where YR1 can be 0 or 1, I need to figure out how to take out the values where YR1=0
YR2 = yr IN (&CYR2);
YR3 = yr IN (&CYR3);
YR4 = yr IN (&CYR4);
RUN;
3) Right now I am thinking about using Proc Tabulate to calculate the proportions one column at a time. The code below is in dev because I need to figure out how to take out values where YR1=0, YR2=0, YR3=0, YR4=0. Can I use PCTN in the proc tabulate below? I don't think I can use PCTNSUM.
PROC TABULATE DATA = test;
CLASS state;
VAR YR3 YR4;
TABLE state, YR3 YR4 YR3*PCTN<YR4>='Ratio 2013/2012'*f=8.2;
Where YR3='1' and YR4='1'; */Here is the logic but probably won't be a line of code
RUN;
I would like to successfully execute 3), then set a conditional statement that if any ratio calculated, (e.g., ratio 2013/2012) is lt 70%, then drop the value or convert it to missing "."
Thank you again.
I'm confused with the direction you're taking here.
Why are you using macro variables? You should be designing this to not need the years in the data at all.
My main challenge has been to create counts of each year for the states, then generating new variables that has the proportion of the counts for states. So far I have not found a way to do this in a data step instead of a proc means/freq/tabulate.
What does your original data look like to start off with? So far you've only shown the aggregate data.
Not sure if this solves your problem, but here's some code that dynamically divides all present years based on your previously stated criteria.
data have;
infile cards missover;
length state $ 100 _2012 _2013 _2014 _2015 8.;
input state _2012 _2013 _2014 _2015;
cards;
Alabama 9653 10264 9557 11152
Alaska 6273 6476 6423 6576
Arizona 24199 21635 22835 20865
;
run;
PROC SQL;
CREATE TABLE ALLYEARS AS
SELECT MONOTONIC() AS ROWID, * FROM DICTIONARY.COLUMNS WHERE MEMNAME = 'HAVE' AND NAME LIKE '%20%';
QUIT;
PROC SQL;
SELECT TRIM(ALLYEARS.NAME) INTO: ALLNAMES SEPARATED BY "," FROM ALLYEARS ;
QUIT;
PROC SQL;
SELECT TRIM(ALLYEARS.NAME)||"/"||TRIM(ALLYEARS2.NAME)||" AS "||TRIM(ALLYEARS.NAME)||"_"||TRIM(ALLYEARS2.NAME)||" "
INTO: ALLDIVS SEPARATED BY "," FROM ALLYEARS
, ALLYEARS AS ALLYEARS2
WHERE ALLYEARS.ROWID > ALLYEARS2.ROWID;
QUIT;
%PUT &ALLDIVS;
PROC SQL;
CREATE TABLE PROPORTION AS
SELECT state, &ALLNAMES, &ALLDIVS
FROM HAVE;
QUIT;
Thank you so much! This works.
The original datasets have two variables, statefip (the full name of the US state that will be converted into abbreviated two letter states) as well as yr as a 8 digit numeric variable, with values including 2012, 2013, 2014, 2015.
Below is the data that can be derived from these original variable via proc freq:
Can proc sql follow an output dataset generated by proc tabulate to calculate the proportions without using cards? Is this not possible because state is a character variable?
Probably not ideal, but I can also export this into Excel, calculate the proportions, and then import the Excel as a SAS dataset.
Cards and the first step is ONLY to generate a sample data set to work with. Replace HAVE with the name of your dataset.
Right, what Reeza said. Regardless of how you process your initial dataset (e.g. proc freq, with/without ODS output saved, or calculating them using proc sql, import via excel, etc.) then you can run this procedure on your resulting dataset.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.