I have a dataset where i'm trying to determine fallout/ outliers from a previous year. Here is an example of the table below:
Member | YR | A1 | A2 | A3 |
123 | 2019 | 1 | 0 | 1 |
123 | 2020 | 1 | 0 | 1 |
124 | 2019 | 1 | 1 | 0 |
124 | 2020 | 0 | 1 | 0 |
125 | 2019 | 0 | 1 | 1 |
125 | 2020 | 1 | 1 | 1 |
126 | 2019 | 0 | 1 | 1 |
126 | 2020 | 0 | 0 | 1 |
The A1-A3 are hierarchy based, A1 being most important and A3 being least. I'm trying to identify members and their corresponding A1 - A3 that didn't match from the previous year. So an expected output would be something like below:
Member | YR | A1 | A2 | A3 |
124 | 2020 | Missing | Found | NoMatch |
126 | 2020 | NoMatch | Missing | Found |
The goal is to find the A1-A3 variable assigned to a member that's missing, matched or has No Match on the previous year. In this case I'm looking at 2019 and matching it to 2020, so I only need 2020 rows/columns back.
I've tried building arrays and transposing the data with incld and excld columns for each year to find the missing or Nomatches and i cant seem to get the desired output. Any help would be appreciated.
Please explain in more detail the criteria that lead to "Missing","Found" or "NoMatch"
Okay, thanks, but you also said
The A1-A3 are hierarchy based, A1 being most important and A3 being least.
I don't see how that works into the logic here.
Where does 'found' come in?
Edit: nevermind, I see.
This way isn't terribly concise. Someone may have a better method, but it seems to match your output.
proc format;
value $matches
"1,1" = "Found"
"0,0" = "NoMatch"
"1,0" = "Missing"
"0,1" = "NoMatch";
run;
data want (drop = a: output_flag i);
merge have (in = a where = (yr = 2019) rename = (a1 = a1_19 a2 = a2_19 a3 = a3_19))
have (in = b where = (yr = 2020) rename = (a1 = a1_20 a2 = a2_20 a3 = a3_20));
by member;
_a1_1920 = put(catx(",", a1_19, a1_20), $matches.);
_a2_1920 = put(catx(",", a2_19, a2_20), $matches.);
_a3_1920 = put(catx(",", a3_19, a3_20), $matches.);
array checkmiss [*] _a:;
do i = 1 to dim(checkmiss);
if checkmiss[i] = "Missing" then output_flag = "1";
end;
if output_flag = "1" then output;
run;
Obs Member YR _a1_1920 _a2_1920 _a3_1920 1 124 2020 Missing Found NoMatch 2 126 2020 NoMatch Missing Found
data have;
infile datalines;
input member year A1 - A3;
datalines;
123 2019 1 0 1
123 2020 1 0 1
124 2019 1 1 0
124 2020 0 1 0
125 2019 0 1 1
125 2020 1 1 1
126 2019 0 1 1
126 2020 0 0 1
;;;;;;
run;
proc format;
value FMnM /*Found-Missing-noMatch*/
1="Missing" /*1,0*/
2="New?" /*0,1*/
3="Found" /*1,1*/
0="NoMatch" /*0,0*/
;
run;
proc sql;
create table want as
select this.member, this.year,
this.A1*2+last.A1 format=FMnM. as A1,
this.A2*2+last.A2 format=FMnM. as A2,
this.A3*2+last.A3 format=FMnM. as A3
from have as this, have as last
where this.year=last.year+1
AND this.Member=last.member;
quit;
@maguiremq this worked quite well. Thank you for this. I modified it slightly to include hierarchy. Thank you very much. Both you and another use provided solutions.
@bknitch wrote:
So for that I'm only concerned where if Missing is higher in the hierarchy. Meaning, if an A3 was populated in 2019 and not populated in 2020 but an A2 was populated in 2020 and not in 2019 I would exclude this/ or not look at this record. Hope that makes sense...
Sorry, no I still don't understand. What does "exclude" mean in the context of your earlier description of how to assign the "Missing" "Found" "NoMatch" to each cell?
But I think we have gone far enough down this path. If the code provided so far meets your need, fine, problem solved. If not, I am asking you to re-write the requirements from scratch, to address all of these issues, so that the explanation is clear and in one description of the logic, so we don't have to scroll up and down and re-read earlier comments to put it all together and understand.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.