BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sean_OConnor
Obsidian | Level 7

Folks,

 

Given my lack of knowledge of utilising macros in SAS I'd appreciate some advice or at least guidance to some reading material. 

 

At present I've information on  over 1,000,000 individuals which each individual having over 400 variables. Within my dataset there is a variable called tot_aggregate. Furthermore, I've been given infromation as to 9 variables (x1...x400) which make up tot_aggregate. When I sum these to create my alt_tot_aggregate and compare them to the tot_aggregate variable I notice in circa 100,000 cases they do not match.

 

In other words there is some other variable which I am not accounting for in my estimation of alt_tot_aggregate. Thus, what I would like to do is the following. 

 

Create a macro which searches variable to variable to find a matching value and creates a new variable called missing_var_name which inserts the name of the missing variable within.

 

A quick illustration may provide better insight.

 

id x1 x2 x3.......x400  tot_aggregate  alt_tot_aggregate (x1, x45, x354) missing_var_name

1                                       600                  523                                                     x56

2                                       1500                1500                                                     .

3                                       230                  230                                                       .

4                                        563                  200                                                      x236

5                                         .

6            

 

 

I'm not even sure if such a thing is possible but would welcome any advice or guidance.

I'm aware that I'm making the assumption that the missing values are only due to one variable,

however, from eyeballing a few examples I'm confident that it's only one which is missing.                           .

1 ACCEPTED SOLUTION

Accepted Solutions
sbxkoenk
SAS Super FREQ

Hello,

 

in your code

tot_minus_alt_tot = 62.5;

 

would become:

tot_minus_alt_tot = ( tot_aggregate - alt_tot_aggregate ) ; 

 

That way tot_minus_alt_tot could be completely different for each individual. No manual intervention from your side required.

 

Cheers,

Koen

View solution in original post

6 REPLIES 6
sbxkoenk
SAS Super FREQ

Hello Sean_OConnor,

 

This little program may help you.

The VNAME function is what you want.

 

data abc(drop=i);
 LENGTH missing_var_name $ 32;
 set sashelp.class (rename=(age   =x1
                            height=x2
                            weight=x3));
 array xx(3) x1-x3;
 tot_minus_alt_tot = 62.5;
 do i = 1 to dim(xx);
  if xx(i) = tot_minus_alt_tot 
    then missing_var_name = vname(xx(i));
 end;
run;
/* end of program */
 

 

It would be possible to write a program that, per observation and for a given total, enumerates all possible sums of variables that lead to this total.

 

Cheers,

Koen

Brussels

 

Sean_OConnor
Obsidian | Level 7

Hi Koen,

 

Just a few quick queries on this piece of code. 

 

Is it the case that when I'm running this, that I'll have to change the 62.5 everytime I want to find a different value?

 

tot_minus_alt_tot = 62.5;

Given I've 100,000 indiviudals it is the case that tot_minus_alt_tot could be completly different for each individual.

 

Sean 

sbxkoenk
SAS Super FREQ

Hello,

 

in your code

tot_minus_alt_tot = 62.5;

 

would become:

tot_minus_alt_tot = ( tot_aggregate - alt_tot_aggregate ) ; 

 

That way tot_minus_alt_tot could be completely different for each individual. No manual intervention from your side required.

 

Cheers,

Koen

Astounding
PROC Star

Sean,

 

The good news:  you don't need to learn anything about macros to make this happen.

 

The bad news:  you do have to learn about arrays, and at an intermediate (not beginner) level.

 

Here's an example of the type of code that might appear in a DATA step:

 

length missing_var_name $ 32;

array x {400} x1-x400;

if tot_aggregate ne alt_tot_aggregate then do k=2 to 44, 46 to 353, 355 to 400;

   if tot_aggregate = alt_tot_aggregate + x{k} then do;

      missing_var_name = vname(x{k});

   end;

end;

 

Note that this may be an iterative process if you are missing 2 variables instead of one.  First locate one of the missing variables and add it in to ALT_TOT_AGGREGATE.  Then repeat the entire process to locate the other missing variable.       

 

Sean_OConnor
Obsidian | Level 7

A further query relating to the following excellent code. 

 

data abc(drop=i);
 LENGTH missing_var_name $ 32;
 set sashelp.class (rename=(age   =x1
                            height=x2
                            weight=x3));
 array xx(3) x1-x3;
 tot_minus_alt_tot = (total_income - alt_tot_income);
 do i = 1 to dim(xx);
  if xx(i) = tot_minus_alt_tot 
    then missing_var_name = vname(xx(i));
 end;
run;
/* end of program */

Initially, I made the assumption that my missing variable is due to only one observation. However, in some cases the same value may be in place in numerous different variables. 

 

Is it possible to amend the code above to create a second missing_var_name  third missing_var_name and so on, each time a match is found?

 

Furthermore, it could be the case that a number of variables aggregated together could create the missing observations, but I imagine it would be extremely difficult to extend the code to aggregate different combinations of variables to look for a match?

sbxkoenk
SAS Super FREQ

Hello,

 

Your 1st question:

Is it possible to amend the code above to create a second missing_var_name  third missing_var_name and so on, each time a match is found?

 

My answer:

Yes.

See below program (untested, but it should bring you the answer).

  • The LENGTH statement shows another length for missing_var_name ($ 1500 instead of $ 32).
  • missing_var_name is initialized on missing ('') for every observation.
  • Each time a match is detected, an additional variable name is added to missing_var_name and the separator with the previous variable name is an equality sign (=). No override anymore as in the previous program.
data abc(drop=i);
 LENGTH missing_var_name $ 1500;
 set sashelp.class (rename=(age   =x1
                            height=x2
                            weight=x3));
 array xx(3) x1-x3;
 missing_var_name='';
 tot_minus_alt_tot = (total_income - alt_tot_income);
 do i = 1 to dim(xx);
  if xx(i) = tot_minus_alt_tot 
    then missing_var_name = strip(missing_var_name)!!'='!!strip(vname(xx(i)));
 end;
run;
/* end of program */

 

Your 2nd question:

Furthermore, it could be the case that a number of variables aggregated together could create the missing observations, but I imagine it would be extremely difficult to extend the code to aggregate different combinations of variables to look for a match?

 

My answer:

If you have SAS/OR (SAS for Operations Research (mathematical programming)) I can extend the code so that you get all possible solutions / sums that result in a given total, the total being tot_minus_alt_tot = (total_income - alt_tot_income). It's not really 'extending the code', rather it is writing a completely different program in our algebraïc modelling language (PROC OPTMODEL) and then solving with our CLP solver (constraint logic programming (CLP) solver in the OPTMODEL procedure).

 

Good luck,

Koen

Brussels

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 3334 views
  • 1 like
  • 3 in conversation