Hello,
My Dataset which is a .csv file (Comma Seperated):
Header: ID,"Folder Path","File Name","Log Time",Action,"Login Name",IPAddress,"Transfer Size",Duration
Row1:1344624,"/Distribution/Projects/WUXI-EDO/Input","C12345_xxx_wxp_img_20221013_114200_349.cum",,"file_download",mft2sftp,"172.24.147.36",46438, Row2:13445945,"/Distribution/Projects/WUXI-EDO/Input","C12345_xxx_wxp_img_20221013_114200_349.cum",,"file_upload",hud8,"204.134.192.105",46438,
Row3:13442235,"/Distribution/Projects/WUXI-EDO/Input","C12345_xxx_npn_img_20221014_113344_600.cum",,"file_upload",kp123,"204.134.192.105",46438,
Basically, I want to flag any File Name upload which does not have a corresponding download. In the above example, The expected output is ROW 3 since it does not contain a "file_download" row
My code:
data alldata_status_1;
set alldata;
do;
if File_Name^= ' ' ;
if action in ('file_upload','file_download')
then download_flag = 'yes';
else download_flag = 'no';
end;
run;
I am using Download_flag since It will help me subset the datasets.
Thank you for your help.
@Kp1234 wrote:
Thank you for the solution! Perhaps i was not clear with the solution I was looking for.
In the final output, I would like row 3 to have download_flag= 'No' since for that file_name action upload does not have action download.
Basically,
If file_name matches, check Action variable has both Upload and Download for that file_name. If true, Download_flag = "Yes" else "NO"
So the only way that makes any sense is if what you mean is that you want to check if the same file has records with both upload and download actions. Which means it is not something you can do by checking just one observation.
proc sql ;
create table check as
select file_name
, sum( action = 'file_upload') as uploads
, sum( action = 'file_download') as downloads
from have
group by file_name
;
quit;
Now with that level of information you can check if a file had both upload and download records.
data want;
set check;
if uploads and downloads then want='YES';
else want='NO';
run;
Check your CSV file once imported. I strongly suspect your "Action" column isn't populated because it coincides with double comma delimiters (,,) in your data rows.
Does your file really have that extra stuff at the start of each line? Why is ROW2 in the middle of ROW1?
Header: ID,"Folder Path","File Name","Log Time",Action,"Login Name",IPAddress,"Transfer Size",Duration Row1:1344624,"/Distribution/Projects/WUXI-EDO/Input","C12345_xxx_wxp_img_20221013_114200_349.cum",,"file_download",mft2sftp,"172.24.147.36",46438, Row2:13445945,"/Distribution/Projects/WUXI-EDO/Input","C12345_xxx_wxp_img_20221013_114200_349.cum",,"file_upload",hud8,"204.134.192.105",46438, Row3:13442235,"/Distribution/Projects/WUXI-EDO/Input","C12345_xxx_npn_img_20221014_113344_600.cum",,"file_upload",kp123,"204.134.192.105",46438,
Or is all of that an artifact of typing/pasting the lines into the comment window instead of the pop-up window that would preserve the formatting?
ID,"Folder Path","File Name","Log Time",Action,"Login Name",IPAddress,"Transfer Size",Duration 1344624,"/Distribution/Projects/WUXI-EDO/Input","C12345_xxx_wxp_img_20221013_114200_349.cum",,"file_download",mft2sftp,"172.24.147.36",46438, 13445945,"/Distribution/Projects/WUXI-EDO/Input","C12345_xxx_wxp_img_20221013_114200_349.cum",,"file_upload",hud8,"204.134.192.105",46438, 13442235,"/Distribution/Projects/WUXI-EDO/Input","C12345_xxx_npn_img_20221014_113344_600.cum",,"file_upload",kp123,"204.134.192.105",46438,
If the file looks like the latter then your code should work fine although you should remove the unnecessary DO and matching END as they add nothing at all to the logic, just confusion for the programmer.
proc import dbms=csv file=sample out=want replace ;
run;
data want2;
set want;
if File_Name^= ' ' ;
if action in ('file_upload','file_download') then download_flag = 'yes';
else download_flag = 'no';
run;
proc print;
run;
WARNING: If you somehow have accidentally set the VALIDVARNAME option to ANY before running the PROC IMPORT step then the third columns variable name will be "File Name"n instead of File_Name.
Or if you wrote your own data step to read the file then use the variable names you created.
Thank you for the solution! Perhaps i was not clear with the solution I was looking for.
In the final output, I would like row 3 to have download_flag= 'No' since for that file_name action upload does not have action download.
Basically,
If file_name matches, check Action variable has both Upload and Download for that file_name. If true, Download_flag = "Yes" else "NO"
@Kp1234 wrote:
Thank you for the solution! Perhaps i was not clear with the solution I was looking for.
In the final output, I would like row 3 to have download_flag= 'No' since for that file_name action upload does not have action download.
Basically,
If file_name matches, check Action variable has both Upload and Download for that file_name. If true, Download_flag = "Yes" else "NO"
So the only way that makes any sense is if what you mean is that you want to check if the same file has records with both upload and download actions. Which means it is not something you can do by checking just one observation.
proc sql ;
create table check as
select file_name
, sum( action = 'file_upload') as uploads
, sum( action = 'file_download') as downloads
from have
group by file_name
;
quit;
Now with that level of information you can check if a file had both upload and download records.
data want;
set check;
if uploads and downloads then want='YES';
else want='NO';
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.