Thank you @ballardw and @Kurt_Bremser for your kind responses. I am working with a large dataset (millions of rows) for my PhD research. I cannot print the results, because my computer freezes when I try. The data really needs a supercomputer, but I do not have access to one. So I store every dataset in a library, using names of the form lib.dataset_name. Here is the code that I have run so far without errors.

    /* Import the CSV file named Block_Header */
    /*
    proc import datafile="D:/SAS_Wallet_May2023/Block_Header.csv"
        out=lib.Block_Header dbms=csv replace;
        getnames=yes;
    run;
    */

    /* Step 2: Sort the wallets dataset by Block_No into a new sorted dataset */
    proc sort data=lib.wallets out=lib.wallets_sorted;
        by Block_No;
    run;

    /* Step 3: Merge wallets data with Block_Header data */
    data lib.merged_data1;
        merge lib.wallets_sorted(in=wallets) lib.Block_Header(in=block_header);
        by Block_No;
        if wallets or block_header;
        volume = abs(Sent_Received_Value);
        hour = int(TM / 3600) + 1;   /* Convert TM to hours with truncation, and create hour variable */
    run;

    /* Step 4: Sort by wallet, day and hour */
    proc sort data=lib.merged_data1;
        by Wallet DT hour;
    run;

    /* Step 5: Sum Sent_Received_Value and volume by wallet, day and hour */
    proc means data=lib.merged_data1 noprint;
        by Wallet DT hour;
        var Sent_Received_Value volume;
        output out=lib.summary_data1 sum(Sent_Received_Value volume)=OF total_volume;
    run;

    /* Step 6: Compute POF as OF divided by total_volume */
    data lib.final_data1;
        set lib.summary_data1;
        POF = OF / total_volume;
    run;

After this I want to use a clustering algorithm for wallet classification. I want to include the address IDs in the classification, but doing so gives me an error. Is there any way to include the address IDs, given that they are character strings of 32 or more characters? I want to find out whether these addresses belong to the same wallet or to different wallets. For example, if one address (say A1) sends a transaction to another address (say A2), do both belong to the same wallet or to different wallets?
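On the address question, here is a minimal sketch of one way to check whether an address appears under one wallet or several. It assumes the address column is named Address_ID (a hypothetical name; no address variable appears in the code above) and that it is still present in lib.merged_data1. lib.address_wallets is likewise a hypothetical output name. The result is written to a dataset, so nothing is printed.

```sas
/* Sketch only: Address_ID and lib.address_wallets are hypothetical names */
proc sql;
    create table lib.address_wallets as
    select Address_ID,
           count(distinct Wallet) as n_wallets   /* wallets seen per address */
    from lib.merged_data1
    group by Address_ID;
quit;
```

Addresses with n_wallets greater than 1 appear under more than one wallet. Because the address is a character variable, it cannot go in the VAR statement of PROC FASTCLUS, which accepts only numeric variables; a numeric count such as n_wallets can.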
After this I want to further classify wallets into small and large wallets, and into low-risk and high-risk wallets (based on daily transaction volume). Here is the code I have written.

    /* Step 7: Clustering based on selected variables */
    proc standard data=lib.final_data1 out=lib.final_data_std mean=0 std=1;
        var Sent_Received_Value volume POF;
    run;

    proc fastclus data=lib.final_data_std out=lib.clustering_results maxclusters=3 maxiter=100;
        by Wallet;
        var Sent_Received_Value volume POF;
    run;

    /* Step 8: Classify wallets based on criteria for whale, small, low risk, high risk,
       medium risk, highly active, and passive wallets */
    data lib.classified_wallets;
        set lib.clustering_results;
        /* Whale (large) wallets */
        if total_volume > 1000000 then wallet_category = "Whale";
        /* Small wallets */
        else if total_volume <= 10000 then wallet_category = "Small";
        /* Low risk wallets */
        if POF < 0.2 then risk_category = "Low";
        /* High risk wallets */
        else if POF >= 0.8 then risk_category = "High";
        /* Medium risk wallets */
        else risk_category = "Medium";
    run;

Is the above code correct? I want to use a backward recursive clustering algorithm on the address IDs, if that works. But my code gives an error saying that the address ID variable does not exist, because it is not in the final dataset I obtained in step 6. Could you please guide me? Thank you.
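One thing that may matter in step 8: SAS takes the length of a character variable from its first assignment, so risk_category would get length 3 from "Low", and the other labels would be stored as "Hig" and "Med". Below is a minimal sketch that declares the lengths first. It reads the unstandardized lib.final_data1 from step 6 so that the POF cutoffs apply to the original scale; that choice, the output name lib.classified_sketch and the length of 8 are all assumptions rather than part of the code above.

```sas
/* Sketch only: lib.classified_sketch is a hypothetical output name */
data lib.classified_sketch;
    set lib.final_data1;   /* unstandardized data from step 6 (assumption) */

    /* Declare lengths before the first assignment to avoid truncation */
    length wallet_category $ 8 risk_category $ 8;

    /* Size class; volumes between the two cutoffs stay blank, as in step 8 */
    if total_volume > 1000000 then wallet_category = "Whale";
    else if total_volume <= 10000 then wallet_category = "Small";

    /* Risk class; a missing POF stays blank instead of falling into "Low" */
    if missing(POF) then risk_category = " ";
    else if POF < 0.2 then risk_category = "Low";
    else if POF >= 0.8 then risk_category = "High";
    else risk_category = "Medium";
run;
```

The explicit check for a missing POF is there because SAS treats a missing numeric value as smaller than any number, so without it a missing POF would satisfy POF < 0.2.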