RobPratt
SAS Super FREQ

Attached is vmatch2.sas, which contains the necessary modifications to run OPTNET instead of ASSIGN.

Giamma14
Fluorite | Level 6

Thanks again,

I will be able to try once the installation is finished. With PROC ASSIGN I got errors when the distance for a case-control pair was set to missing (because that control is not allowed for that particular case). Does PROC OPTNET allow missing values?

Thanks,

Gianmario

RobPratt
SAS Super FREQ

Both PROC ASSIGN and PROC OPTNET allow missing values to represent ineligibility.  What errors did you get?

Giamma14
Fluorite | Level 6

Hi Rob,

here it is:

WARNING: The datum in observation 1 for variable control3693 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 1 for variable control4681 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 1 for variable control7565 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 1 for variable control10922 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 2 for variable control4749 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 2 for variable control6059 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 2 for variable control11536 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 3 for variable control10249 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 3 for variable control12075 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 3 for variable control12332 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 4 for variable control6950 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 4 for variable control11389 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 5 for variable control1105 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 5 for variable control1825 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 5 for variable control2726 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 5 for variable control3345 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 5 for variable control3692 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 5 for variable control3955 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 5 for variable control6351 has a nonzero fractional part after scaling by 1000.
WARNING: The datum in observation 5 for variable control6367 has a nonzero fractional part after scaling by 1000.
WARNING: More than 20 scaling errors or warnings have been found. Additional reporting of these is suppressed.
ERROR: The problem is infeasible. The output data set shows the assignments made before the infeasibility was found.
NOTE: There were 3069 observations read from the data set WORK.RESULT_0.
NOTE: The data set WORK.RESULT_1 has 3069 observations and 12577 variables.
NOTE: Compressing data set WORK.RESULT_1 increased size by 0.03 percent.
      Compressed is 3082 pages; un-compressed would require 3081 pages.
NOTE: PROCEDURE ASSIGN used (Total process time):
      real time           3:52.04
      cpu time            3:51.97

The results look fine, but I'm still not 100% convinced...

I can post the data as well (everything is coded with a numeric ID, so there is no real information in it; it's a matrix of about 3,000 by 12,000).

Thanks again,

Gianmario
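
The scaling warnings in the log above are raised for cost values that still have a fractional part after being multiplied by 1000, that is, values with more than three decimal places. One possible workaround, sketched here as an assumption rather than a tested fix, is to round the distances to three decimals in the long data set before transposing. The name `complete_r` is hypothetical; `complete` and `dist` are the data set and variable used in the PROC TRANSPOSE step later in the thread.

```sas
/* Round each distance to 3 decimals so that dist*1000 is a whole number. */
/* Missing distances (ineligible pairs) are left missing.                 */
data complete_r;
   set complete;
   if not missing(dist) then dist = round(dist, 0.001);
run;
```

Rounding changes the costs slightly, so the optimal assignment may differ from one computed on the unrounded distances. It does not address the infeasibility ERROR, which is a separate message.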

RobPratt
SAS Super FREQ

Yes, please post the data.

Giamma14
Fluorite | Level 6

Thanks a lot, really appreciated!

If I may ask, would you mind testing the macro with optnet and maximum 4 controls per case? I'm chasing my IT guys but this will not be possible before two weeks (after the deadline for what I'm doing...)

RobPratt
SAS Super FREQ

Probably too big.  Try zipping it first.

Giamma14
Fluorite | Level 6

Nothing: it says it is queuing for a virus scan and then it disappears. I'm trying with a small dataset, not zipped.

Giamma14
Fluorite | Level 6

Here it is, before being transposed so it's small.

This is the piece of code to get the data in the correct structure:

proc transpose data = complete out = tr (drop = _name_);
    by key;
    var dist;
    id control;
run;

Again many thanks!

RobPratt
SAS Super FREQ

OK, I have the data now.  What is the %vmatch call you used?

Giamma14
Fluorite | Level 6

I cannot use the %vmatch you provided because it calls PROC OPTNET and I'm still on 9.3 M1.

I've used PROC OPTMODEL, but it runs out of memory:

proc optmodel;
    /* define sets for cases and controls */
    set <str> CASES;
    set <str> CONTROLS init (union{i in 1..12574} {"control"||i});

    /* define cost parameter */
    num cost{CASES, CONTROLS};

    /* read in data including missing values */
    read data tr into CASES = [key] {j in CONTROLS} < cost[key, j] = col(j) >;

    /* create a set of pairs which are possible, i.e. cost is not missing */
    set <string, string> POSSIBLE = {i in CASES, j in CONTROLS: cost[i,j] >= 0};

    /* we only need variables for possible pairs */
    var Assign{CASES,CONTROLS} >= 0;

    num maxCost = max{<i,j> in POSSIBLE} cost[i,j];

    min z = sum {<i,j> in POSSIBLE} cost[i,j]*Assign[i,j] /*- sum {<i,j> in POSSIBLE} (maxCost+1)*Assign[i,j]*/;

    *** Constraint that each case can be assigned to at most 4 controls ***;
    con case_assign{i in CASES}: sum{<(i),j> in POSSIBLE} Assign[i,j] <= 4;

    *** Constraint that each control can be used only once ***;
    con control_assign{j in CONTROLS}: sum{<i,(j)> in POSSIBLE} Assign[i,j] <= 1;

    /* solve the model */
    solve with lp;

    /* print the result */
    *print Assign;

    create data assign from [i j] = {i in CASES, j in CONTROLS: Assign[i,j] >= 0.5} cost;
quit;
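
The model above declares `Assign` over every case-control cell (about 3,069 x 12,574), even though only the eligible pairs are used in the objective and constraints. A sparser formulation might reduce memory use; the following is only a sketch under stated assumptions, not a tested replacement. It assumes the long data set `complete` (with `key`, `control`, and `dist`) from earlier in the thread, that `key` and `control` are character, and that distances are nonnegative. The data set name `pairs` is hypothetical. It also re-enables the `maxCost+1` reward term that is commented out above, because with nonnegative costs and only `<=` constraints, assigning nothing would otherwise be optimal.

```sas
/* Keep only eligible pairs (non-missing distance) from the long data set. */
data pairs;
   set complete;
   where dist is not missing;
run;

proc optmodel;
   /* Eligible case-control pairs; key and control are assumed to be character. */
   set <str,str> POSSIBLE;
   num cost{POSSIBLE};
   read data pairs into POSSIBLE=[key control] cost=dist;

   /* Cases and controls that appear in at least one eligible pair. */
   set <str> CASES    = setof{<i,j> in POSSIBLE} i;
   set <str> CONTROLS = setof{<i,j> in POSSIBLE} j;

   /* One variable per eligible pair instead of one per cell of the full matrix. */
   var Assign{POSSIBLE} >= 0 <= 1;

   /* Reward of maxCost+1 per match, taken from the commented-out term above. */
   num maxCost = max{<i,j> in POSSIBLE} cost[i,j];
   min z = sum{<i,j> in POSSIBLE} (cost[i,j] - (maxCost + 1)) * Assign[i,j];

   /* At most 4 controls per case; each control used at most once. */
   con case_assign{i in CASES}: sum{<(i),j> in POSSIBLE} Assign[i,j] <= 4;
   con control_assign{j in CONTROLS}: sum{<i,(j)> in POSSIBLE} Assign[i,j] <= 1;

   solve with lp;

   /* Write the selected pairs and their costs. */
   create data assign from [i j]={<i,j> in POSSIBLE: Assign[i,j].sol >= 0.5} cost;
quit;
```

Reading the long data set directly also avoids the transposed table `tr` and its column-order dependence. Whether this fits in memory on a given installation would still need to be checked.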

I've tried the %vmatch macro downloaded from Mayo, but it ran for 24 hours before I stopped it:

%vmatch(dist = tr, idca = key, a = 1, b = 4, lilm = 12574, n = 3069, firstco = control3453, lastco = control1999, print = n, out = result);

Finally, I'm trying a simple PROC ASSIGN that, with some modifications (deleting the controls already picked, ...), I'll repeat 4 times to get 4 controls, but I got the error posted above:

proc assign data = tr out = result_1;
    cost &start -- &end;
    id key;
run;

RobPratt
SAS Super FREQ

OK, I think the correct %vmatch should instead be:


%vmatch(dist=tr, idca=key, a=1,b=4, lilm=12574, n=3069, firstco=control1, lastco=control12574, print=n, out=result);


When I run this using PROC ASSIGN, I get the scaling warnings and an objective value of 349.192.


When I run this using PROC OPTNET, I get an objective value of 308.57120668:


NOTE: ------------------------------------------------------------------------------------------------
NOTE: Running OPTNET version 13.1.
NOTE: ------------------------------------------------------------------------------------------------
NOTE: The number of columns in the input matrix is 963.
NOTE: The number of rows in the input matrix is 12276.
NOTE: Data input used 0.70 (cpu: 0.70) seconds.
NOTE: ------------------------------------------------------------------------------------------------
NOTE: Processing the linear assignment problem.
NOTE: The linear assignment problem is infeasible (253 columns are unassigned).
NOTE: The minimum cost partial linear assignment is 308.57120668.
NOTE: Processing the linear assignment problem used 1.06 (cpu: 1.06) seconds.
NOTE: ------------------------------------------------------------------------------------------------
NOTE: The output data set contains a partial linear assignment.
NOTE: Data output used 0.23 (cpu: 0.24) seconds.
NOTE: ------------------------------------------------------------------------------------------------
NOTE: The data set WORK.__OUTT has 710 observations and 3 variables.
NOTE: PROCEDURE OPTNET used (Total process time):
      real time           2.05 seconds
      cpu time            2.05 seconds

Both solutions are attached.

Giamma14
Fluorite | Level 6

Thanks a million for running them.

I still have some doubt about the following:

1) When running the %vmatch macro as suggested, it uses only the controls between the variables firstco=control1 and lastco=control12574, roughly 1200 controls. That's why only 710 cases in the final results match at least one control, whereas when I run PROC ASSIGN with control3453 -- control1999 I get more than 2000 matches, and even from looking at the data it is clear that many more than 700 cases have at least one control.

Can you please try running the macro using PROC OPTNET with firstco = control3453, lastco = control1999? When I tried it with PROC ASSIGN, it crashed.

2) When you run PROC ASSIGN, do you get the error message as well as the warnings?

ERROR: The problem is infeasible. The output data set shows the assignments made before the infeasibility was found.

RobPratt
SAS Super FREQ

1. But control1 to control12574 is 12574 controls, and that covers all the columns except key. When I run with firstco=control3453 and lastco=control1999, I get an error message:

ERROR: Starting variable after ending variable in data set.

Even if more than 700 cases have at least one control, you might not be able to match all of them, since each control can be used at most once.  For a tiny example, consider a 2 x 1 matrix of non-missing values (two cases and one control).  Although each case is eligible for the control, at most one can be matched.
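
The 2 x 1 example can be written out in long format to make the bound concrete. The sketch below is illustrative only: the data set `tiny` and its values are made up, and the variable names `key`, `control`, and `dist` are assumptions modeled on the `complete` data set used earlier in the thread. It counts the cases and controls that appear in at least one non-missing pair.

```sas
/* Tiny illustrative example: two cases, one control, both pairs eligible. */
data tiny;
   length key $8 control $12;
   input key $ control $ dist;
   datalines;
case1 control1 0.20
case2 control1 0.35
;
run;

/* Count cases and controls that occur in at least one non-missing pair. */
/* Each control can be used at most once, so the number of matched cases */
/* cannot exceed the smaller of these two counts.                        */
proc sql;
   select count(distinct key)     as eligible_cases,
          count(distinct control) as usable_controls
   from tiny
   where dist is not missing;
quit;
```

Here both cases are eligible but only one control is available, so at most one case can be matched. The smaller count is only an upper bound; the actual number of matches can be lower when several cases compete for the same few controls.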

2. Yes, when I call %vmatch using PROC ASSIGN, I do get the ERROR message about infeasibility.

Giamma14
Fluorite | Level 6

1) After PROC TRANSPOSE, the variables are no longer in numeric order, at least in my version of SAS. I understand now: it was my mistake, and it should have been firstco = control3693, lastco = control12570.

Just to be sure, it would be worth trying the following:

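proc contents data = tr out = var (keep = name varnum) noprint; run;

proc sort data = var; by varnum; run;

data _null_;
    set var end = fine;
    /* varnum 1 is key, so varnum 2 is the first control column */
    if varnum = 2 then call symput ('start', cats(name));
    /* the last variable in the data set is the last control column */
    if fine then call symput ('end', cats(name));
run;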

And then using firstco = &start, lastco = &end

2) So you don't get an error when using proc optnet? That's good news!

Thanks again

