Hi,
My data contains information on 6 different categories. Each category has L, F, Start, Reg variables. The final variable close_diff is the variable I need to compare to get the required dataset.
For ID#1, the close_diff = -90 is equal to start_4=-90; therefore the L_before is the corresponding L_4=2, F_before=C is corresponding F_4, and Reg_before is corresponding R_4= d,e,r. The after variables are calculated using the next set. That is, L_after = L_5 which is 3, F_after = F_5, which is C and Reg_after = Reg_5 which is c,e,f,t.
For ID#2, the close_diff = 2 and is a positive number, so therefore the before and after variables are swapped. The close_diff = 2 is equal to Start = -5; now the L_after = L_5 = 3, F_after=F_5=C, and Reg_After = c,e,f,t. Now the L_before is from the information from before; i.e. L_before = L_4 =2 , F_before = F_4 = O, and Reg_Begore = f,d,d
When the close_diff = 0; it is considered a negative number and the same rules apply asID#1
There might be instances where there is no information and is left blank as shown in Want data
Data Have:
SubjectID | L_1 | F_1 | Start_1 | Reg_1 | L_2 | F_2 | Start_2 | Reg_2 | L_3 | F_3 | Start_3 | Reg_3 | L_4 | F_4 | Start_4 | Reg_4 | L_5 | F_5 | Start_5 | Reg_5 | L_6 | Fa_6 | Start_6 | Reg_6 | Close_diff |
1 | -1 | 0 | 1 | C | -150 | a,b,d,e | 2 | C | -90 | d,e,r | 3 | C | 131 | c,e,f,t | 4 | C | 169 | s,r | -90 | ||||||
2 | -1 | 0 | 1 | O | -116 | s,g,y,u,u | 2 | O | -36 | f,d,d | 3 | C | 2 | f,e,w | 4 | 2 | |||||||||
3 | -1 | 0 | 1 | C:O | -840 | r,d,h,j,r | 2 | C | -197 | q,d,f,d | 3 | 4 | -197 | ||||||||||||
4 | 1 | 0 | 1 | O | -1249 | t,y,o,l | 2 | O | -965 | w,d,g | 3 | C:O | -425 | d,g,e,t | 4 | C | -126 | f | -126 | ||||||
5 | 1 | 0 | 1 | O | -1245 | q,d,f,d | 2 | O | -1133 | f | 3 | O | -938 | f,e,s,r | 4 | O | -446 | e,g,u | -446 | ||||||
6 | 1 | 0 | 1 | O | -832 | w,d,g | 2 | C | 78 | f | 3 | C | 440 | d,f,g,h | 4 | 78 | |||||||||
7 | 1 | 0 | 1 | O | -518 | f | 2 | O | -263 | a,b,d,e | 3 | C:O | -77 | f,a | 4 | C | 166 | a,b,d,e | -77 | ||||||
8 | 1 | 0 | 1 | O | -3670 | f | 2 | O | -810 | s,g,y,u,u | 3 | O | -717 | f,t,h | 4 | C:O | -292 | s,g,y,u,u | -292 | ||||||
9 | 1 | O | -1269 | a,b | 0 | O | -1181 | a | 1 | O | -700 | a,b,d,e | 2 | O | -362 | f,a | 3 | C | -183 | w,e,s,f | 4 | f,a | -183 | ||
10 | 1 | 0 | 1 | C:O | -657 | s,g,y,u,u | 2 | C | -113 | f,t,h | 3 | C | 0 | s,w,r | 4 | C | 101 | f,t,h | 0 | ||||||
11 | 1 | 0 | 1 | O | -1227 | r,d,h,j,r | 2 | O | -997 | w,e,s,f | 3 | O | -695 | c,e,f,t | 4 | C | -567 | c,e,f,t | -567 | ||||||
12 | 1 | 0 | O | -2525 | a | 1 | C:O | -298 | t,y,o,l | 2 | C:O | 83 | s,w,r | 3 | C:O | 477 | f,e,w | 4 | C:O | 618 | f,e,w | 83 |
Data Want:
SubjectID | L_Before | F_Before | Reg_Before | L_After | F_After | Reg_After |
1 | 2 | C | d,e,r | 3 | C | c,e,f,t |
2 | 2 | O | f,d,d | 3 | C | f,e,w |
3 | 2 | C | q,d,f,d | 3 | ||
4 | 4 | C | f | |||
5 | 4 | O | e,g,u | |||
6 | 1 | O | w,d,g | 2 | C | f |
7 | 3 | C:O | f,a | 4 | C | a,b,d,e |
8 | 4 | C:O | s,g,y,u,u | |||
9 | 3 | C | w,e,s,f | 4 | f,a | |
10 | 3 | C | s,w,r | 4 | C | f,t,h |
11 | 4 | C | c,e,f,t | |||
12 | 1 | C:O | t,y,o,l | 2 | C:O | s,w,r |
Please provide the data in usable format (data step or SQL insert statements).
@ChrisNZ ,
I have attached the dataset. I had a hard time with inputting with datalines. Sorry, i have not used datalines that much for data input. Please let me know if this works
I am not going to go through the effort of converting your CSV file to a SAS dataset, so this program is not tested:
data want;
set have;
array L_ {6};
array F_ {6};
array REG_ $ {6};
array START_ {6};
b=whichn(close_diff,of start_{*});
if b=0 then return;
if close_diff>0 then b=b-1;
if (1<=b<=6) then do;
L_before=L_{b}; F_before=F_{b}; reg_before=reg_{b};
end;
if (0<=b<=5) then do;
L_after=L_{b+1}; F_after=F_{b+1}; reg_after=reg_{b+1};
end;
drop b;
run;
However, I did actually look at your CSV. The 4th close_diff value is -126, which matches start_6. So the "before" variables get the 6th set of values, but there is no 7th set, which presumably would go to the "after" variables.
And you could have a similar set of problems if you had a positive close_diff matching START_1. According to your rules the "after" would get the 1st collection, and the "before" would get what?
The program above avoids such "out of bounds" situations.
Hi @mkeintz :
Added the data step above.
Also, since there is no 7th set, the observations is going to be empty. You will notice the in a few IDs where there is missing values in the 'Want' dataset.
For positive values, the before will have missing data. Hope this makes sense
Here is the data step for inputting data
data have;
infile cards dsd dlm=",";
input ID L_1 F_1 $ Start_1 Reg_1 $ L_2 F_2 $ Start_2 Reg_2 $ L_3 F_3 $ Start_3 Reg_3 $ L_4 F_4 $ Start_4 Reg_4 $ L_5 F_5 $ Start_5 Reg_5 $ L_6 F_6 $ Start_6 Reg_6 $ Close_diff;
format Reg_2 $10. Reg_3 $10.;
cards;
1,-1,,,,0,,,,1,C,-150,"a,b,d,e",2,C,-90,"d,e,r",3,C,131,"c,e,f,t",4,C,169,"s,r",-90
2,-1,,,,0,,,,1,O,-116,"s,g,y,u,u",2,O,-36,"f,d,d",3,C,2,"f,e,w",4,,,,2
3,-1,,,,0,,,,1,C:O,-840,"r,d,h,j,r",2,C,-197,"q,d,f,d",3,,,,4,,,,-197
4,1,,,,0,,,,1,O,-1249,"t,y,o,l",2,O,-965,"w,d,g",3,C:O,-425,"d,g,e,t",4,C,-126,f,-126
5,1,,,,0,,,,1,O,-1245,"q,d,f,d",2,O,-1133,f,3,O,-938,"f,e,s,r",4,O,-446,"e,g,u",-446
6,1,,,,0,,,,1,O,-832,"w,d,g",2,C,78,f,3,C,440,"d,f,g,h",4,,,,78
7,1,,,,0,,,,1,O,-518,f,2,O,-263,"a,b,d,e",3,C:O,-77,"f,a",4,C,166,"a,b,d,e",-77
8,1,,,,0,,,,1,O,-3670,f,2,O,-810,"s,g,y,u,u",3,O,-717,"f,t,h",4,C:O,-292,"s,g,y,u,u",-292
9,1,O,-1269,"a,b",0,O,-1181,a,1,O,-700,"a,b,d,e",2,O,-362,"f,a",3,C,-183,"w,e,s,f",4,,,"f,a",-183
10,1,,,,0,,,,1,C:O,-657,"s,g,y,u,u",2,C,-113,"f,t,h",3,C,0,"s,w,r",4,C,101,"f,t,h",0
11,1,,,,0,,,,1,O,-1227,"r,d,h,j,r",2,O,-997,"w,e,s,f",3,O,-695,"c,e,f,t",4,C,-567,"c,e,f,t",-567
12,1,,,,0,O,-2525,a,1,C:O,-298,"t,y,o,l",2,C:O,83,"s,w,r",3,C:O,477,"f,e,w",4,C:O,618,"f,e,w",83
;
Run;
Please post you code using the appropriate icon.
The result you want is given by altering @mkeintz 's code slightly.
data WANT;
set HAVE;
array L_ {6} ;
array F_ {6} ;
array REG_ {6} $10 ;
array START_ {6} ;
POS=whichn(CLOSE_DIFF, of START_{*});
if POS;
if CLOSE_DIFF > 0 then POS=POS-1;
if 1 <= POS <= 6 then do;
L_BEFORE=L_{POS} ; F_BEFORE=F_{POS} ; REG_BEFORE=reg_{POS} ;
end;
if 0 <= POS <= 5 then do;
L_AFTER=L_{POS+1}; F_AFTER=F_{POS+1}; REG_AFTER=reg_{POS+1};
end;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.