The 5 and 7 denominators were chosen arbitrarily to correspond to the allowed mismatch range for both variables. You could use
euclid((c.age-s.age)/3, (c.edu-s.edu)/2) as distance
instead, which scales both mismatches to the range -1 to +1, or use any other scaling that reflects the relative importance of matching age and edu.
Following the same logic, you could add dobw as a matching variable in the distance calculation with
euclid((c.age-s.age)/3, (c.edu-s.edu)/2, (c.dobw-s.dobw)/14) as distance
with the extra join condition
c.dobw between s.dobw-14 and s.dobw+14 and
The format given to distances is only used for display purposes; it has no impact on calculations.
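As a quick check of how the scaled distance behaves, here is a minimal sketch that evaluates the three-variable expression for one control/sample pair. The values and the c_/s_ variable names are made up for illustration; only the EUCLID expression itself comes from the post above.

```sas
/* Hypothetical values for one control (c_) and one sample (s_) */
data _null_;
    c_age  = 34;  s_age  = 32;   /* mismatch of  2, allowed range +/-3  */
    c_edu  = 12;  s_edu  = 13;   /* mismatch of -1, allowed range +/-2  */
    c_dobw = 100; s_dobw = 93;   /* mismatch of  7, allowed range +/-14 */

    /* Each difference is divided by its allowed range, so each term lies in [-1, +1] */
    distance = euclid((c_age - s_age)/3, (c_edu - s_edu)/2, (c_dobw - s_dobw)/14);

    /* The 6.3 format only affects how the value is printed, not the stored value */
    put distance= 6.3;
run;
```

Changing a denominator changes how much that variable contributes to the distance relative to the others.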
Dear PGStats:
Referring to Proc Optnet, I am trying to use it with my original dataset, which is stored under a libref (not the Work library), but it does not work. The log says 'Work.Links.Data' not found. Please guide me.
Thanks
The way to refer to a dataset is MYLIBREF.MYDATASET. If you do not specify MYLIBREF, then the temporary library WORK is used. So if your LINKS dataset is not in the WORK library, you must prefix its name with your libref.
As I mentioned earlier, my Links dataset is in my permanent library.
When I use Proc optnet data_links=links, the program apparently looks for it in the Work library, where it does not exist.
How can I use my libref which is "pe." here? Where should I write it?
Thanks
Proc optnet data_links=pe.links
Thanks.
Thanks.
Going back to the creation of the Links dataset: controlNode becomes a character variable and sampleNode becomes a numeric variable, so 'proc optnet' does not work because the types differ. How can I make controlNode (a 4-digit label) match sampleNode (length 8, informat 10.2, format 10.)? I added 'format controlNode 4.0', and the log said 'converted from character to numeric', but in fact the variable was not converted.
Here is the original code (without the format for controlNode, which does not work):
---
data pe.links;
set pe.matches;
where not missing(controlId);
controlNode = controlId;
do i = 1 to min(99, &controlsPerSample);
sampleNode = sampleId + i/100;
output;
end;
format sampleNode 10.2;
keep sampleNode controlNode distance;
run;
/* Perform best linear assignment algorithm */
proc optnet data_links=pe.links graph_direction=directed;
data_links_var
from=sampleNode to=controlNode weight=distance;
linear_assignment out=pe.bestMatches;
run;
---
Thanks
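One possible way around the type mismatch, sketched here on a toy dataset, is to create controlNode with the INPUT function rather than with a FORMAT statement, since a format changes only how a variable is displayed and not its type. The sketch assumes that controlId is the character variable and sampleId is numeric; the WORK datasets, the sample values, and the value of controlsPerSample are all hypothetical.

```sas
/* Toy input: controlId is character, sampleId is numeric (assumed) */
data matches;
    input sampleId controlId $ distance;
    datalines;
101 2001 0.5
101 2002 0.8
102 2001 0.3
;
run;

%let controlsPerSample = 2;   /* hypothetical value */

data links;
    set matches;
    where not missing(controlId);
    /* INPUT reads the character value into a new numeric variable */
    controlNode = input(controlId, best12.);
    do i = 1 to min(99, &controlsPerSample);
        sampleNode = sampleId + i/100;
        output;
    end;
    format sampleNode 10.2 controlNode 10.;
    keep sampleNode controlNode distance;
run;
```

With both node variables numeric, the same from=, to= and weight= assignments can be passed to proc optnet as in the original code.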
Hi PG,
For my matching strategy, the final step is causing a problem. Please see the log below and guide me on how to fix it.
proc sql;
217 create table pe.allMatches as
218 select a.group, a.type, a.ctrlId, b.*
219 from pe.stackedMatches as a inner join pe.sample as b on a.id=b.id
220 union all corresponding
221 select a.group, a.type, a.ctrlId, b.*
222 from pe.stackedMatches as a inner join pe.control as b on a.id=b.id
223 order by group, type, ctrlId;
NOTE: Data file PE.SAMPLE.DATA is in a format that is native to another host, or the file encoding
does not match the session encoding. Cross Environment Data Access will be used, which might
require additional CPU resources and might reduce performance.
NOTE: Data file PE.CONTROL.DATA is in a format that is native to another host, or the file encoding
does not match the session encoding. Cross Environment Data Access will be used, which might
require additional CPU resources and might reduce performance.
ERROR: Expression using equals (=) has components that are of different data types.
ERROR: Expression using equals (=) has components that are of different data types.
224 quit
Looks like your Ids are of type character in one of your datasets. You could use a.Id = input(b.Id, best.) as your join condition if b.Id is character.
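Applied to the query from the log, the suggestion might look like the sketch below. It assumes that id is character in both pe.sample and pe.control (the log reports the error once for each join) and numeric in pe.stackedMatches; if only one of the two is character, only that join needs the conversion.

```sas
proc sql;
create table pe.allMatches as
select a.group, a.type, a.ctrlId, b.*
from pe.stackedMatches as a inner join pe.sample as b
    on a.id = input(b.id, best.)    /* convert character id to numeric for the comparison */
union all corresponding
select a.group, a.type, a.ctrlId, b.*
from pe.stackedMatches as a inner join pe.control as b
    on a.id = input(b.id, best.)    /* same conversion for the control dataset */
order by group, type, ctrlId;
quit;
```

Note that b.* still carries the original character id into the output table; the conversion applies only to the join condition.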
Thanks PG; it did work.