01-29-2018 06:18 PM - edited 01-29-2018 08:06 PM
I am currently working on a creating master file for further SAS Analysis. For the most part, all 4 files are formatted to produce the master file I desire but I cannot seem to bring the files together without losing data.
For starters file 1 is the base and the other 3 files will add to it. The relevant portion of dataset 1 looks like so:
ID From To
1 0 0.5
1 0.5 1
2 0 1
3 0.3 0.7
In dataset 2 is as such:
IDD Off Count
1 0.2 10
1 0.7 5
2 0.5 8
3 0.3 20
Where IDD is the same as ID and Off in between From and To
3/4 are similar in concept and look like so:
ID FROM TO Count2
1 0 1 1.65
2 0 1 2.0
3 0.3 0.7 1.56
ID FROM TO Count3
1 0 1 18547
2 0 1 23506
3 0.3 0.7 6582
Where Count2 and Count are not related. I have tried the following code trying to merge 1 and 2 but the resultant (5) results in huge quantities of sections being lost (3000) observations to (700) observations.
proc sql; create table my.5 as select * from my.1 as 1, my.2 as 2 where 1.ID = 2.IDD and 1.From<=2.Off and 1.To >= 2.Off; quit;
Overall I wish the master file to look like:
ID From To Count Count2 Count3
1 0 0.5 10 1.65 18547
1 0.5 1 5 1.65 18457
2 0 1 8 2.0 23506
3 0.3 0.7 20 1.56 6582
01-29-2018 07:14 PM
1) 1 , 2, 3 etc. are not valid SAS data set names OR sql alias
2) Select * doesn't rename any of the variables so you will have issues with multiple From, to and count2 clashing.
I would suggest adding things one at a time until you become more familiar with nesting queries and getting the variables renamed as needed.
See if this works for combining the first two sets:
proc sql; create table first as select a.id,a.from, a.to, b.count from data1 as a left join data2 as b on a.id=b.id and a.from le b.off le a.to ; run;
If that works then use the resulting First as the A set in the next block of code.
You will have to explain the rules involved with deciding that both (0,0.5) and (0.5,1) (in the first set) matches to (0,1) but (0,1) in the first set does not match (0,1) in the third or fourth set.
01-29-2018 08:09 PM
Hello ballardw, thank you very much for your reply and interest in this matter.
1) Yes although I call them 1,2,3 and 4 those are not the original file names, the ones I use work no problem. But to ease confusion let us refer to them as sets: A, B, C, D and E(final).
2) In D I made a mistake I meant to indicate is as Count3
I am working on merging the files one at a time and not all at once.
I will try the code.
Frist in A,C and D ID/IDD is an identifier of a unique portion of a road. That portion is then defined by a continuous length identifier FROM and TO.
In A there are two of the same IDs because another variable(not shown) is different the requiring to split it into 2 sections.
In B only the IDD (where IDD=ID) and at some Off there is a point of interest. Thus the first line in B will only fall into the gap between 0 and 0.5 and not the second and vice versa for the second observation.
Apart from the naming mistake, C and D are the same sections but the Count3 is the only observation and thus there are no splits.