BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LEINAARE
Obsidian | Level 7

Hello,

 

I have two datasets I want to merge.  Both are very large (observations in the millions).  Dataset1 contains Medicaid user information and a flag to indicate whether or not they are eligible for study inclusion (name of flag is "eligible").  Dataset2 contains Medicaid claims data.  Dataset1 contains individuals who are eligible for study inclusion at any time during the 5-year duration of our study.  I need to keep all observations in Dataset1, even if they are not eligible for study inclusion for a given observation (i.e. eligible=0).  However, I only want the claims data from Dataset2 to merge with Dataset1 if the eligible = 1.  Otherwise, I need the values in Dataset2 to be missing.  Also, there is claims data in Dataset2 that is not associated with individuals in Dataset1.  I want to only keep claims from Dataset2 that are associated with individuals in Dataset1, and only at times when their eligibility flag=1.  Below is simplified version of my code to provide an example of what I am trying to do conceptually (the actual code is very long).

 

data merged_claims;
     merge Dataset1 (in=a)
          Dataset2 (in=b);
     by ID Date;
     if a=1;
     ClaimFlag=0;
     MedicalFlag=0;
     if b=1 and eligible=1 then do;
          ClaimFlag=1;
          MedicalFlag=1;
     end;
run;

I would like to include code that would set values of variables in Dataset2 to missing if ClaimFlag=0.  Dataset2 has over 50 variables, both character and numeric.  I could hard code it using a do statement and specify for each variable a missing value according to whether it is character or numeric, but it seems like an array would be much smarter to use for so many variables.  Can anyone offer a suggestion how to use an array to set character and numeric values to missing wherever eligible=0?

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

You have to declare two arrays, one for the numerics and one for character and then you can use call missing. You can also use other variables lists if they apply. 

Examples of variable lists can be found here:

https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html

 

In this example I'm using the start and end of a list of variables which will take all numeric variables between a start and end variable based on their order in the dataset. 

 

array _num(*) startVar-numeric-endVar;
array _char(*) startVar-character-endVar;

if eligible = 0 then call missing(of _num(*), of _char(*));

@LEINAARE wrote:

Hello,

 

I have two datasets I want to merge.  Both are very large (observations in the millions).  Dataset1 contains Medicaid user information and a flag to indicate whether or not they are eligible for study inclusion (name of flag is "eligible").  Dataset2 contains Medicaid claims data.  Dataset1 contains individuals who are eligible for study inclusion at any time during the 5-year duration of our study.  I need to keep all observations in Dataset1, even if they are not eligible for study inclusion for a given observation (i.e. eligible=0).  However, I only want the claims data from Dataset2 to merge with Dataset1 if the eligible = 1.  Otherwise, I need the values in Dataset2 to be missing.  Also, there is claims data in Dataset2 that is not associated with individuals in Dataset1.  I want to only keep claims from Dataset2 that are associated with individuals in Dataset1, and only at times when their eligibility flag=1.  Below is simplified version of my code to provide an example of what I am trying to do conceptually (the actual code is very long).

 

data merged_claims;
     merge Dataset1 (in=a)
          Dataset2 (in=b);
     by ID Date;
     if a=1;
     ClaimFlag=0;
     MedicalFlag=0;
     if b=1 and eligible=1 then do;
          ClaimFlag=1;
          MedicalFlag=1;
     end;
run;

I would like to include code that would set values of variables in Dataset2 to missing if ClaimFlag=0.  Dataset2 has over 50 variables, both character and numeric.  I could hard code it using a do statement and specify for each variable a missing value according to whether it is character or numeric, but it seems like an array would be much smarter to use for so many variables.  Can anyone offer a suggestion how to use an array to set character and numeric values to missing wherever eligible=0?

 

Thanks


 

View solution in original post

2 REPLIES 2
Reeza
Super User

You have to declare two arrays, one for the numerics and one for character and then you can use call missing. You can also use other variables lists if they apply. 

Examples of variable lists can be found here:

https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html

 

In this example I'm using the start and end of a list of variables which will take all numeric variables between a start and end variable based on their order in the dataset. 

 

array _num(*) startVar-numeric-endVar;
array _char(*) startVar-character-endVar;

if eligible = 0 then call missing(of _num(*), of _char(*));

@LEINAARE wrote:

Hello,

 

I have two datasets I want to merge.  Both are very large (observations in the millions).  Dataset1 contains Medicaid user information and a flag to indicate whether or not they are eligible for study inclusion (name of flag is "eligible").  Dataset2 contains Medicaid claims data.  Dataset1 contains individuals who are eligible for study inclusion at any time during the 5-year duration of our study.  I need to keep all observations in Dataset1, even if they are not eligible for study inclusion for a given observation (i.e. eligible=0).  However, I only want the claims data from Dataset2 to merge with Dataset1 if the eligible = 1.  Otherwise, I need the values in Dataset2 to be missing.  Also, there is claims data in Dataset2 that is not associated with individuals in Dataset1.  I want to only keep claims from Dataset2 that are associated with individuals in Dataset1, and only at times when their eligibility flag=1.  Below is simplified version of my code to provide an example of what I am trying to do conceptually (the actual code is very long).

 

data merged_claims;
     merge Dataset1 (in=a)
          Dataset2 (in=b);
     by ID Date;
     if a=1;
     ClaimFlag=0;
     MedicalFlag=0;
     if b=1 and eligible=1 then do;
          ClaimFlag=1;
          MedicalFlag=1;
     end;
run;

I would like to include code that would set values of variables in Dataset2 to missing if ClaimFlag=0.  Dataset2 has over 50 variables, both character and numeric.  I could hard code it using a do statement and specify for each variable a missing value according to whether it is character or numeric, but it seems like an array would be much smarter to use for so many variables.  Can anyone offer a suggestion how to use an array to set character and numeric values to missing wherever eligible=0?

 

Thanks


 

LEINAARE
Obsidian | Level 7

Hi @Reeza,

 

Thank you so much for your quick response.  This is exactly what I was looking for.

 

Thanks,

 

Ted

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 788 views
  • 0 likes
  • 2 in conversation