BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sean_OConnor
Fluorite | Level 6

Folks,

 

Given my lack of knowledge of utilising macros in SAS I'd appreciate some advice or at least guidance to some reading material. 

 

At present I've information on  over 1,000,000 individuals which each individual having over 400 variables. Within my dataset there is a variable called tot_aggregate. Furthermore, I've been given infromation as to 9 variables (x1...x400) which make up tot_aggregate. When I sum these to create my alt_tot_aggregate and compare them to the tot_aggregate variable I notice in circa 100,000 cases they do not match.

 

In other words there is some other variable which I am not accounting for in my estimation of alt_tot_aggregate. Thus, what I would like to do is the following. 

 

Create a macro which searches variable to variable to find a matching value and creates a new variable called missing_var_name which inserts the name of the missing variable within.

 

A quick illustration may provide better insight.

 

id x1 x2 x3.......x400  tot_aggregate  alt_tot_aggregate (x1, x45, x354) missing_var_name

1                                       600                  523                                                     x56

2                                       1500                1500                                                     .

3                                       230                  230                                                       .

4                                        563                  200                                                      x236

5                                         .

6            

 

 

I'm not even sure if such a thing is possible but would welcome any advice or guidance.

I'm aware that I'm making the assumption that the missing values are only due to one variable,

however, from eyeballing a few examples I'm confident that it's only one which is missing.                           .

1 ACCEPTED SOLUTION

Accepted Solutions
sbxkoenk
SAS Super FREQ

Hello,

 

in your code

tot_minus_alt_tot = 62.5;

 

would become:

tot_minus_alt_tot = ( tot_aggregate - alt_tot_aggregate ) ; 

 

That way tot_minus_alt_tot could be completely different for each individual. No manual intervention from your side required.

 

Cheers,

Koen

View solution in original post

6 REPLIES 6
sbxkoenk
SAS Super FREQ

Hello Sean_OConnor,

 

This little program may help you.

The VNAME function is what you want.

 

data abc(drop=i);
 LENGTH missing_var_name $ 32;
 set sashelp.class (rename=(age   =x1
                            height=x2
                            weight=x3));
 array xx(3) x1-x3;
 tot_minus_alt_tot = 62.5;
 do i = 1 to dim(xx);
  if xx(i) = tot_minus_alt_tot 
    then missing_var_name = vname(xx(i));
 end;
run;
/* end of program */
 

 

It would be possible to write a program that, per observation and for a given total, enumerates all possible sums of variables that lead to this total.

 

Cheers,

Koen

Brussels

 

Sean_OConnor
Fluorite | Level 6

Hi Koen,

 

Just a few quick queries on this piece of code. 

 

Is it the case that when I'm running this, that I'll have to change the 62.5 everytime I want to find a different value?

 

tot_minus_alt_tot = 62.5;

Given I've 100,000 indiviudals it is the case that tot_minus_alt_tot could be completly different for each individual.

 

Sean 

sbxkoenk
SAS Super FREQ

Hello,

 

in your code

tot_minus_alt_tot = 62.5;

 

would become:

tot_minus_alt_tot = ( tot_aggregate - alt_tot_aggregate ) ; 

 

That way tot_minus_alt_tot could be completely different for each individual. No manual intervention from your side required.

 

Cheers,

Koen

Astounding
PROC Star

Sean,

 

The good news:  you don't need to learn anything about macros to make this happen.

 

The bad news:  you do have to learn about arrays, and at an intermediate (not beginner) level.

 

Here's an example of the type of code that might appear in a DATA step:

 

length missing_var_name $ 32;

array x {400} x1-x400;

if tot_aggregate ne alt_tot_aggregate then do k=2 to 44, 46 to 353, 355 to 400;

   if tot_aggregate = alt_tot_aggregate + x{k} then do;

      missing_var_name = vname(x{k});

   end;

end;

 

Note that this may be an iterative process if you are missing 2 variables instead of one.  First locate one of the missing variables and add it in to ALT_TOT_AGGREGATE.  Then repeat the entire process to locate the other missing variable.       

 

Sean_OConnor
Fluorite | Level 6

A further query relating to the following excellent code. 

 

data abc(drop=i);
 LENGTH missing_var_name $ 32;
 set sashelp.class (rename=(age   =x1
                            height=x2
                            weight=x3));
 array xx(3) x1-x3;
 tot_minus_alt_tot = (total_income - alt_tot_income);
 do i = 1 to dim(xx);
  if xx(i) = tot_minus_alt_tot 
    then missing_var_name = vname(xx(i));
 end;
run;
/* end of program */

Initially, I made the assumption that my missing variable is due to only one observation. However, in some cases the same value may be in place in numerous different variables. 

 

Is it possible to amend the code above to create a second missing_var_name  third missing_var_name and so on, each time a match is found?

 

Furthermore, it could be the case that a number of variables aggregated together could create the missing observations, but I imagine it would be extremely difficult to extend the code to aggregate different combinations of variables to look for a match?

sbxkoenk
SAS Super FREQ

Hello,

 

Your 1st question:

Is it possible to amend the code above to create a second missing_var_name  third missing_var_name and so on, each time a match is found?

 

My answer:

Yes.

See below program (untested, but it should bring you the answer).

  • The LENGTH statement shows another length for missing_var_name ($ 1500 instead of $ 32).
  • missing_var_name is initialized on missing ('') for every observation.
  • Each time a match is detected, an additional variable name is added to missing_var_name and the separator with the previous variable name is an equality sign (=). No override anymore as in the previous program.
data abc(drop=i);
 LENGTH missing_var_name $ 1500;
 set sashelp.class (rename=(age   =x1
                            height=x2
                            weight=x3));
 array xx(3) x1-x3;
 missing_var_name='';
 tot_minus_alt_tot = (total_income - alt_tot_income);
 do i = 1 to dim(xx);
  if xx(i) = tot_minus_alt_tot 
    then missing_var_name = strip(missing_var_name)!!'='!!strip(vname(xx(i)));
 end;
run;
/* end of program */

 

Your 2nd question:

Furthermore, it could be the case that a number of variables aggregated together could create the missing observations, but I imagine it would be extremely difficult to extend the code to aggregate different combinations of variables to look for a match?

 

My answer:

If you have SAS/OR (SAS for Operations Research (mathematical programming)) I can extend the code so that you get all possible solutions / sums that result in a given total, the total being tot_minus_alt_tot = (total_income - alt_tot_income). It's not really 'extending the code', rather it is writing a completely different program in our algebraïc modelling language (PROC OPTMODEL) and then solving with our CLP solver (constraint logic programming (CLP) solver in the OPTMODEL procedure).

 

Good luck,

Koen

Brussels

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2052 views
  • 1 like
  • 3 in conversation