Hi SAS Community!
I was hoping someone could help me with the following issue.
I have 2 data sets, Dataset 1 and Dataset 2. I need to use Dataset 2 to manipulate the data in Dataset 1 to create Dataset 3. Dataset 2 has the Start time of the data and I need all the data that is populated before the Start time for each variable to be changed to blank. Below is the visuals for this process.
Dataset 1 | |||||
Time | T1 | T2 | T3 | T4 | T5 |
1/1/2017 0:00 | 6.818272 | 4.030038 | 8.213272 | 0.060326 | 9.521269 |
1/1/2017 1:00 | 0.100979 | 3.009709 | 2.359012 | 1.749067 | 8.0581 |
1/1/2017 2:00 | 1.20293 | 0.524906 | 0.686842 | 2.561308 | 3.58268 |
1/1/2017 3:00 | 0.178333 | 8.815986 | 7.657224 | 5.581614 | 1.142414 |
1/1/2017 4:00 | 6.654463 | 2.398171 | 2.696395 | 2.181824 | 2.000096 |
1/1/2017 5:00 | 5.428405 | 7.665879 | 7.996566 | 9.112351 | 3.012801 |
1/1/2017 6:00 | 7.710257 | 1.296825 | 7.965649 | 1.278932 | 8.198926 |
1/1/2017 7:00 | 5.733929 | 8.763634 | 7.37632 | 4.684965 | 6.252313 |
1/1/2017 8:00 | 0.173659 | 8.191108 | 7.592309 | 0.764799 | 1.81264 |
1/1/2017 9:00 | 2.782826 | 3.713065 | 8.526002 | 1.06912 | 9.897541 |
1/1/2017 10:00 | 4.46292 | 4.154905 | 4.766856 | 2.918054 | 1.398898 |
1/1/2017 11:00 | 1.949195 | 1.306601 | 4.989425 | 6.146979 | 6.172791 |
1/1/2017 12:00 | 9.865355 | 5.047937 | 2.965091 | 8.109938 | 0.024781 |
1/1/2017 13:00 | 6.361114 | 0.850972 | 3.595049 | 1.18486 | 3.629694 |
1/1/2017 14:00 | 7.494559 | 7.221369 | 9.410831 | 1.467266 | 2.488785 |
1/1/2017 15:00 | 2.129789 | 5.699575 | 0.627782 | 5.550582 | 0.275467 |
1/1/2017 16:00 | 9.046102 | 5.785753 | 0.764385 | 5.695247 | 6.812834 |
1/1/2017 17:00 | 8.056089 | 2.70044 | 0.916423 | 8.568327 | 3.729957 |
1/1/2017 18:00 | 9.318779 | 7.345054 | 5.430452 | 0.23345 | 0.561244 |
1/1/2017 19:00 | 4.827045 | 9.602832 | 1.841213 | 9.62645 | 8.05601 |
1/1/2017 20:00 | 1.623731 | 9.719941 | 4.036305 | 2.812807 | 7.412819 |
1/1/2017 21:00 | 2.199275 | 7.198554 | 0.28249 | 0.405406 | 6.038437 |
1/1/2017 22:00 | 8.08961 | 6.457083 | 8.185543 | 7.05954 | 5.085864 |
1/1/2017 23:00 | 3.040053 | 7.585161 | 8.839786 | 2.42752 | 7.113683 |
Dataset 2 | |
Name | Start |
T1 | 1/1/2017 7:00 |
T2 | 1/1/2017 2:00 |
T3 | 1/1/2017 15:00 |
T4 | 1/1/2017 0:00 |
T5 | 1/1/2017 20:00 |
This is the desired output:
Final Dataset | |||||
Time | T1 | T2 | T3 | T4 | T5 |
1/1/2017 0:00 | 0.060326 | ||||
1/1/2017 1:00 | 1.749067 | ||||
1/1/2017 2:00 | 0.524906 | 2.561308 | |||
1/1/2017 3:00 | 8.815986 | 5.581614 | |||
1/1/2017 4:00 | 2.398171 | 2.181824 | |||
1/1/2017 5:00 | 7.665879 | 9.112351 | |||
1/1/2017 6:00 | 1.296825 | 1.278932 | |||
1/1/2017 7:00 | 5.733929 | 8.763634 | 4.684965 | ||
1/1/2017 8:00 | 0.173659 | 8.191108 | 0.764799 | ||
1/1/2017 9:00 | 2.782826 | 3.713065 | 1.06912 | ||
1/1/2017 10:00 | 4.46292 | 4.154905 | 2.918054 | ||
1/1/2017 11:00 | 1.949195 | 1.306601 | 6.146979 | ||
1/1/2017 12:00 | 9.865355 | 5.047937 | 8.109938 | ||
1/1/2017 13:00 | 6.361114 | 0.850972 | 1.18486 | ||
1/1/2017 14:00 | 7.494559 | 7.221369 | 1.467266 | ||
1/1/2017 15:00 | 2.129789 | 5.699575 | 0.627782 | 5.550582 | |
1/1/2017 16:00 | 9.046102 | 5.785753 | 0.764385 | 5.695247 | |
1/1/2017 17:00 | 8.056089 | 2.70044 | 0.916423 | 8.568327 | |
1/1/2017 18:00 | 9.318779 | 7.345054 | 5.430452 | 0.23345 | |
1/1/2017 19:00 | 4.827045 | 9.602832 | 1.841213 | 9.62645 | |
1/1/2017 20:00 | 1.623731 | 9.719941 | 4.036305 | 2.812807 | 7.412819 |
1/1/2017 21:00 | 2.199275 | 7.198554 | 0.28249 | 0.405406 | 6.038437 |
1/1/2017 22:00 | 8.08961 | 6.457083 | 8.185543 | 7.05954 | 5.085864 |
1/1/2017 23:00 | 3.040053 | 7.585161 | 8.839786 | 2.42752 | 7.113683 |
Hopefully someone out there can help on this!
Thank you!
Here is one way. A nice problem to use the array lookup
data dataset1;
input Time:anydtdtm. T1-T5;
format Time datetime20.;
infile datalines dlm=',';
datalines;
1/1/2017 0:00,6.818272,4.030038,8.213272,0.060326,9.521269
1/1/2017 1:00,0.100979,3.009709,2.359012,1.749067,8.0581
1/1/2017 2:00,1.20293,0.524906,0.686842,2.561308,3.58268
1/1/2017 3:00,0.178333,8.815986,7.657224,5.581614,1.142414
1/1/2017 4:00,6.654463,2.398171,2.696395,2.181824,2.000096
1/1/2017 5:00,5.428405,7.665879,7.996566,9.112351,3.012801
1/1/2017 6:00,7.710257,1.296825,7.965649,1.278932,8.198926
1/1/2017 7:00,5.733929,8.763634,7.37632,4.684965,6.252313
1/1/2017 8:00,0.173659,8.191108,7.592309,0.764799,1.81264
1/1/2017 9:00,2.782826,3.713065,8.526002,1.06912,9.897541
1/1/2017 10:00,4.46292,4.154905,4.766856,2.918054,1.398898
1/1/2017 11:00,1.949195,1.306601,4.989425,6.146979,6.172791
1/1/2017 12:00,9.865355,5.047937,2.965091,8.109938,0.024781
1/1/2017 13:00,6.361114,0.850972,3.595049,1.18486,3.629694
1/1/2017 14:00,7.494559,7.221369,9.410831,1.467266,2.488785
1/1/2017 15:00,2.129789,5.699575,0.627782,5.550582,0.275467
1/1/2017 16:00,9.046102,5.785753,0.764385,5.695247,6.812834
1/1/2017 17:00,8.056089,2.70044,0.916423,8.568327,3.729957
1/1/2017 18:00,9.318779,7.345054,5.430452,0.23345,0.561244
1/1/2017 19:00,4.827045,9.602832,1.841213,9.62645,8.05601
1/1/2017 20:00,1.623731,9.719941,4.036305,2.812807,7.412819
1/1/2017 21:00,2.199275,7.198554,0.28249,0.405406,6.038437
1/1/2017 22:00,8.08961,6.457083,8.185543,7.05954,5.085864
1/1/2017 23:00,3.040053,7.585161,8.839786,2.42752,7.113683
;
data dataset2;
input Name $ Time:anydtdtm.;
infile datalines dlm=',';
format Time datetime20.;
datalines;
T1,1/1/2017 7:00
T2,1/1/2017 2:00
T3,1/1/2017 15:00
T4,1/1/2017 0:00
T5,1/1/2017 20:00
;
data want(drop= i j name);
array lookup{5} _temporary_;
do i=1 by 1 until(eof1);
set dataset2 end=eof1;
lookup[i]=Time;
end;
array T{5};
do until(eof2);
set dataset1 end=eof2;
do j=1 to dim(lookup);
if Time lt lookup[j] then T[j]=.;
end;
output;
end;
run;
Here is one way. A nice problem to use the array lookup
data dataset1;
input Time:anydtdtm. T1-T5;
format Time datetime20.;
infile datalines dlm=',';
datalines;
1/1/2017 0:00,6.818272,4.030038,8.213272,0.060326,9.521269
1/1/2017 1:00,0.100979,3.009709,2.359012,1.749067,8.0581
1/1/2017 2:00,1.20293,0.524906,0.686842,2.561308,3.58268
1/1/2017 3:00,0.178333,8.815986,7.657224,5.581614,1.142414
1/1/2017 4:00,6.654463,2.398171,2.696395,2.181824,2.000096
1/1/2017 5:00,5.428405,7.665879,7.996566,9.112351,3.012801
1/1/2017 6:00,7.710257,1.296825,7.965649,1.278932,8.198926
1/1/2017 7:00,5.733929,8.763634,7.37632,4.684965,6.252313
1/1/2017 8:00,0.173659,8.191108,7.592309,0.764799,1.81264
1/1/2017 9:00,2.782826,3.713065,8.526002,1.06912,9.897541
1/1/2017 10:00,4.46292,4.154905,4.766856,2.918054,1.398898
1/1/2017 11:00,1.949195,1.306601,4.989425,6.146979,6.172791
1/1/2017 12:00,9.865355,5.047937,2.965091,8.109938,0.024781
1/1/2017 13:00,6.361114,0.850972,3.595049,1.18486,3.629694
1/1/2017 14:00,7.494559,7.221369,9.410831,1.467266,2.488785
1/1/2017 15:00,2.129789,5.699575,0.627782,5.550582,0.275467
1/1/2017 16:00,9.046102,5.785753,0.764385,5.695247,6.812834
1/1/2017 17:00,8.056089,2.70044,0.916423,8.568327,3.729957
1/1/2017 18:00,9.318779,7.345054,5.430452,0.23345,0.561244
1/1/2017 19:00,4.827045,9.602832,1.841213,9.62645,8.05601
1/1/2017 20:00,1.623731,9.719941,4.036305,2.812807,7.412819
1/1/2017 21:00,2.199275,7.198554,0.28249,0.405406,6.038437
1/1/2017 22:00,8.08961,6.457083,8.185543,7.05954,5.085864
1/1/2017 23:00,3.040053,7.585161,8.839786,2.42752,7.113683
;
data dataset2;
input Name $ Time:anydtdtm.;
infile datalines dlm=',';
format Time datetime20.;
datalines;
T1,1/1/2017 7:00
T2,1/1/2017 2:00
T3,1/1/2017 15:00
T4,1/1/2017 0:00
T5,1/1/2017 20:00
;
data want(drop= i j name);
array lookup{5} _temporary_;
do i=1 by 1 until(eof1);
set dataset2 end=eof1;
lookup[i]=Time;
end;
array T{5};
do until(eof2);
set dataset1 end=eof2;
do j=1 to dim(lookup);
if Time lt lookup[j] then T[j]=.;
end;
output;
end;
run;
Thank you for all your help! Quick question. The dataset I showed only had 5 observations. My actual dataset has thousands each with unique names that is not as simple as T1, T2, and so on. What can I change the array to so it handles this?
@Singham20 can you post some data that represents this? Not with thousands of unique variables though.
So here is how the variable look. I am just showing 5, however there are 24,785 variables.
Time | _48164_1 | _W468s_1 | _di4ek452_5 | _47816_1 | _zidle_45 |
1/1/2017 0:00 | 6.818272 | 4.030038 | 8.213272 | 0.060326 | 9.521269 |
1/1/2017 1:00 | 0.100979 | 3.009709 | 2.359012 | 1.749067 | 8.0581 |
1/1/2017 2:00 | 1.20293 | 0.524906 | 0.686842 | 2.561308 | 3.58268 |
1/1/2017 3:00 | 0.178333 | 8.815986 | 7.657224 | 5.581614 | 1.142414 |
1/1/2017 4:00 | 6.654463 | 2.398171 | 2.696395 | 2.181824 | 2.000096 |
1/1/2017 5:00 | 5.428405 | 7.665879 | 7.996566 | 9.112351 | 3.012801 |
1/1/2017 6:00 | 7.710257 | 1.296825 | 7.965649 | 1.278932 | 8.198926 |
1/1/2017 7:00 | 5.733929 | 8.763634 | 7.37632 | 4.684965 | 6.252313 |
1/1/2017 8:00 | 0.173659 | 8.191108 | 7.592309 | 0.764799 | 1.81264 |
1/1/2017 9:00 | 2.782826 | 3.713065 | 8.526002 | 1.06912 | 9.897541 |
1/1/2017 10:00 | 4.46292 | 4.154905 | 4.766856 | 2.918054 | 1.398898 |
1/1/2017 11:00 | 1.949195 | 1.306601 | 4.989425 | 6.146979 | 6.172791 |
1/1/2017 12:00 | 9.865355 | 5.047937 | 2.965091 | 8.109938 | 0.024781 |
1/1/2017 13:00 | 6.361114 | 0.850972 | 3.595049 | 1.18486 | 3.629694 |
1/1/2017 14:00 | 7.494559 | 7.221369 | 9.410831 | 1.467266 | 2.488785 |
1/1/2017 15:00 | 2.129789 | 5.699575 | 0.627782 | 5.550582 | 0.275467 |
1/1/2017 16:00 | 9.046102 | 5.785753 | 0.764385 | 5.695247 | 6.812834 |
1/1/2017 17:00 | 8.056089 | 2.70044 | 0.916423 | 8.568327 | 3.729957 |
1/1/2017 18:00 | 9.318779 | 7.345054 | 5.430452 | 0.23345 | 0.561244 |
1/1/2017 19:00 | 4.827045 | 9.602832 | 1.841213 | 9.62645 | 8.05601 |
1/1/2017 20:00 | 1.623731 | 9.719941 | 4.036305 | 2.812807 | 7.412819 |
1/1/2017 21:00 | 2.199275 | 7.198554 | 0.28249 | 0.405406 | 6.038437 |
1/1/2017 22:00 | 8.08961 | 6.457083 | 8.185543 | 7.05954 | 5.085864 |
1/1/2017 23:00 | 3.040053 | 7.585161 | 8.839786 | 2.42752 | 7.113683 |
1/1/2017 23:00 | 3.040053 | 7.585161 | 8.839786 | 2.42752 | 7.113683 |
If your data is representative and all of the variables of interest have prefix _, then simply do
data want(drop= i j name);
array lookup{5} _temporary_;
do i=1 by 1 until(eof1);
set dataset2 end=eof1;
lookup[i]=Time;
end;
do until(eof2);
set dataset1 end=eof2;
array T{*} _:;
do j=1 to dim(lookup);
if Time lt lookup[j] then T[j]=.;
end;
output;
end;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.