Solved: Use array to conditionally set values of merged data to missing

LEINAARE · Posted 03-09-2019 02:37 PM

Hello,

I have two datasets I want to merge. Both are very large (observations in the millions). Dataset1 contains Medicaid user information and a flag to indicate whether or not they are eligible for study inclusion (the flag is named "eligible"). Dataset2 contains Medicaid claims data. Dataset1 contains individuals who are eligible for study inclusion at any time during the 5-year duration of our study. I need to keep all observations in Dataset1, even if they are not eligible for study inclusion for a given observation (i.e. eligible=0). However, I only want the claims data from Dataset2 to merge with Dataset1 if eligible=1. Otherwise, I need the values from Dataset2 to be missing. Also, there is claims data in Dataset2 that is not associated with individuals in Dataset1. I want to keep only claims from Dataset2 that are associated with individuals in Dataset1, and only at times when their eligibility flag is 1. Below is a simplified version of my code to show what I am trying to do conceptually (the actual code is very long).

data merged_claims;
     merge Dataset1 (in=a)
          Dataset2 (in=b);
     by ID Date;
     if a=1;
     ClaimFlag=0;
     MedicalFlag=0;
     if b=1 and eligible=1 then do;
          ClaimFlag=1;
          MedicalFlag=1;
     end;
run;

I would like to include code that would set values of variables in Dataset2 to missing if ClaimFlag=0. Dataset2 has over 50 variables, both character and numeric. I could hard code it using a do statement and specify for each variable a missing value according to whether it is character or numeric, but it seems like an array would be much smarter to use for so many variables. Can anyone offer a suggestion how to use an array to set character and numeric values to missing wherever eligible=0?

Thanks

Reeza · Posted 03-09-2019 02:50 PM

You have to declare two arrays, one for the numeric variables and one for the character variables, and then you can use CALL MISSING. You can also use other variable lists if they apply.

Examples of variable lists can be found here:

https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html

In this example I'm using name range lists, which take all numeric (or all character) variables between a start and an end variable, based on their order in the dataset.

array _num(*) startVar-numeric-endVar;
array _char(*) startVar-character-endVar;

if eligible = 0 then call missing(of _num(*), of _char(*));
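
To show where these statements would sit in the original data step, here is a minimal sketch on toy data. The Dataset2 variable names (ProcCode, PaidAmt, Provider) and all data values are made up for illustration; only ID, Date, eligible, ClaimFlag and MedicalFlag come from the question. It assumes ProcCode and Provider are the first and last Dataset2-only variables in the merged dataset, and that both inputs are already sorted by ID and Date.

```sas
/* Toy stand-in for Dataset1: one row per person and date, with the eligibility flag */
data Dataset1;
    input ID Date :yymmdd10. eligible;
    format Date yymmdd10.;
    datalines;
1 2019-01-01 1
1 2019-02-01 0
2 2019-01-01 1
;

/* Toy stand-in for Dataset2: claim variables are hypothetical names */
data Dataset2;
    input ID Date :yymmdd10. ProcCode :$5. PaidAmt Provider :$8.;
    format Date yymmdd10.;
    datalines;
1 2019-01-01 99213 80 ClinicA
1 2019-02-01 99214 120 ClinicB
3 2019-01-01 99215 150 ClinicC
;

data merged_claims;
    merge Dataset1 (in=a)
          Dataset2 (in=b);
    by ID Date;
    if a=1;   /* keep only rows that exist in Dataset1 */

    /* Declare the arrays after MERGE so the listed variables already exist.
       The name range lists pick up every numeric / character variable
       positioned from ProcCode through Provider. */
    array _num(*)  ProcCode-numeric-Provider;
    array _char(*) ProcCode-character-Provider;

    ClaimFlag=0;
    MedicalFlag=0;
    if b=1 and eligible=1 then do;
        ClaimFlag=1;
        MedicalFlag=1;
    end;

    /* Blank out the claim variables on rows that are not eligible */
    if eligible = 0 then call missing(of _num(*), of _char(*));
run;
```

Because the range is positional, it is worth checking the variable order with PROC CONTENTS (VARNUM option) before relying on it. Since CALL MISSING accepts numeric and character arguments together, something like call missing(of ProcCode--Provider); may also work without declaring arrays at all.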

LEINAARE · Posted 03-09-2019 03:05 PM

Hi @Reeza,

Thank you so much for your quick response. This is exactly what I was looking for.

Thanks,

Ted
