Androniki
Calcite | Level 5

Hi, 

How can I assign ID values that are character variables along with numeric variables in a VAR statement? I get the error "ID does not exist" even when I include

ID= put(id, $CHAR); in my code before using ID in VAR with the numeric variables.

I want to include it here:

proc means data = lib.data noprint;

by bank_code day hour;

var sent_amount received_amount volume id /type=string;

run;

Could anyone please help me in resolving the error "id does not exist" in my SAS program? 

Thank you. 

9 REPLIES 9
ballardw
Super User

What do you mean by "assign ID variables"? I think you have specific usage in mind and I'm not going to try to play mind-reader as to what it should be.

 

 

Best is to show entire data or proc steps and when you get notes you have questions about include the LOG text with the code and ALL notes.

 

If ID is a numeric variable then that statement with the Put will generate a warning and not do anything because $char cannot be used with numeric variables.

158  data junk;
159     id=25;
160     ID= put(id, $CHAR.);
                    ------
                    484
WARNING: Variable id has already been defined as numeric.
NOTE 484-185: Format CHAR was not found or could not be loaded.

161  run;

If it is character then really not needed at all unless you are attempting to shorten a value but you didn't specify a length.

 

From what you have shown I can't tell if that single line of data step code is involved with making the data set used in proc means.

From the proc means code, with ID on the VAR statement the attempt to make a character value makes no sense, as VAR accepts only numeric variables in Proc Means. In fact I expect your Proc Means code to throw multiple errors. "type=string" is not a valid option for the VAR statement. Check https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/proc/p12n9rrav4byvzn1b9wj26dcoq9l.htm: the only option available for the VAR statement is WEIGHT=variable.

 

There is also the issue of using the option NOPRINT to suppress printed output but not requesting an output data set. So where are you going to see any results???

 

For what it is worth, when discussing TYPE in the context of Proc Means or Summary it refers to groups of Class variables and the automatic variable _type_ that indicates which combination the observation in the output set used.
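
To make the _TYPE_ point concrete, here is a small sketch. It uses the sashelp.class sample data rather than your data, and type_demo and mean_height are made-up names for illustration only.

```sas
/* Sketch only: sashelp.class is SAS sample data, not the asker's data. */
proc means data=sashelp.class noprint;
   class sex age;                          /* two CLASS variables */
   var height;
   output out=type_demo mean=mean_height;  /* no NWAY, so all combinations are kept */
run;

/* In type_demo the automatic variable _TYPE_ marks the grouping used: */
/*   _TYPE_=0  overall (no CLASS variable)                             */
/*   _TYPE_=1  by age only                                             */
/*   _TYPE_=2  by sex only                                             */
/*   _TYPE_=3  by sex and age together                                 */
```

Adding the NWAY option to the PROC MEANS statement keeps only the highest _TYPE_ value, which is usually what you want when the output feeds another step.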

Kurt_Bremser
Super User
ID= put(id, $CHAR);

does not make sense.

If ID is character, its contents won't change, and if it is numeric (which an id should not be in the first place), you cannot change its type to character like this.

Also, the format name must be terminated with a dot.
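
For the case where a variable really is numeric and a character copy is needed, a minimal sketch looks like this. The names junk, id and id_char are hypothetical; the point is only that the result has to go into a new variable, because the type of an existing variable cannot change.

```sas
/* Sketch only: junk, id and id_char are illustrative names. */
data junk;
   id = 25;                           /* numeric variable */
   length id_char $12;                /* new character variable */
   id_char = strip(put(id, 12.));     /* numeric format, name ends with a dot */
run;
```

If ID is already character, as it turns out to be later in this thread, none of this is needed.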

 

Please post the complete log (code and messages) of your code by copy/pasting it into a window opened with this button:

[Screenshot: the forum toolbar button that opens the code window]

Androniki
Calcite | Level 5
Actually ID belong to cryptocurrency addresses that i want to connect to trace back to each wallet they make transaction (sent and received). That is why Addresses are not numerical. They are long strings while the rest of my variables for studying transactions are numerical. I want to use clustering algorithm after obtaining the summary of transactions. So I thought to include addresses in the merged data (leading to summary) for using later in clustering wallets. Is it possible to do it? Thank you.
ballardw
Super User

@Androniki wrote:
Actually ID belong to cryptocurrency addresses that i want to connect to trace back to each wallet they make transaction (sent and received). That is why Addresses are not numerical. They are long strings while the rest of my variables for studying transactions are numerical. I want to use clustering algorithm after obtaining the summary of transactions. So I thought to include addresses in the merged data (leading to summary) for using later in clustering wallets. Is it possible to do it? Thank you.

Not actually clear, BUT if the intent is to get a separate summary in Proc Means or Summary grouped by one or more other variables, the choices are the BY and CLASS statements. BY requires the data to be sorted by those variables prior to Proc Means, or, if the values are at least grouped by that variable, use the NOTSORTED option on the BY statement. CLASS lists variables to group the output by and does not require sorting, BUT has internal limits on how many variables/levels can be used. I seldom run into that issue, but if you have LOTS of records then you may need to sort the data and use BY instead.

 

Since you are already using a BY statement you could add the variable ID to that statement. Or add a CLASS statement with ID as the variable.

proc means data = lib.data ;
   by bank_code day hour;
   class id;
   var sent_amount received_amount volume;
run;

However with the above you do not want NOPRINT as you will get no output.

If you want an output data set then you need to add an Output statement. Then Noprint would make a little sense. However you must request statistics for an output data set. Something like

proc means data = lib.data noprint;
   by bank_code day hour;
   class id;
   var sent_amount received_amount volume;
   output out=want min= mean= max= std= /autoname;
run;

Note the statistics requested are only a few of those possible and you haven't described which you want. The Autoname option appends the name of the statistic to the name of the variable in the output data.
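
One further option, offered as an assumption about what you may want rather than something established in this thread: Proc Means also has an ID statement, which copies a variable (character is allowed) into the output data set without analysing it. In the sketch below sashelp.class stands in for your data, name plays the role of your character id variable, and want_id is a made-up data set name.

```sas
/* Sketch only: sashelp.class stands in for the real data;         */
/* NAME plays the role of a character id/address variable.         */
proc means data=sashelp.class noprint nway;
   class sex;
   id name;                            /* carried into the output, not analysed */
   var height weight;
   output out=want_id mean= / autoname;
run;
```

By default the ID statement keeps only one value per output row (the maximum within the group). If a group can contain several different ids and you need every one of them, the variable belongs on the BY or CLASS statement as shown above.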

 

 

Androniki
Calcite | Level 5

Thank you @ballardw and @Kurt_Bremser for your kind responses. I am working with a large data set, millions of records, for my PhD research. I cannot print the results because my computer stops when I do. My data really requires a supercomputer, but I do not have access to one. So I am using lib.variable_name for creating library.

Here is the code that I have run till now without error.

/* Import Csv file by name of Block_Header */
/*proc import datafile= "D:/SAS_Wallet_May2023/Block_Header.csv" out= lib.Block_Header DBMS = csv Replace; getnames= yes; run; */

/* Step 2: Sort the wallets dataset by Block_No and create a temporary sorted dataset */
proc sort data=lib.wallets out=lib.wallets_sorted;
   by Block_No;
run;

/* Step 3: Merge wallets data with Block_header data */
data lib.merged_data1;
   merge lib.wallets_sorted(in=wallets) lib.Block_Header(in=block_header);
   by Block_No;
   if wallets or block_header;
   volume = abs(Sent_Received_Value);
   hour = int(TM / 3600)+1; /* Convert TM to hours with truncation, and create hour variable */
run;

/* Step 4: Sort by Wallets and days */
proc sort data=lib.merged_data1;
   by Wallet DT hour;
run;

/* Step 5: Summary of Sent_Received_Value by Wallets and day */
proc means data=lib.merged_data1 noprint;
   by Wallet DT hour;
   var Sent_Received_Value volume;
   output out=lib.summary_data1 sum(Sent_Received_Value volume)=OF total_volume;
run;

/* Step 6: Summary of absolute values of Sent_Received_Value */
data lib.final_data1;
   set lib.summary_data1;
   POF = OF / total_volume;
run;

After this I want to use clustering algorithm for wallet classification. I want to include addresses id for classification but it gives me error. Is there any method to include addresses id as they are 32 or long character strings? I am looking whether these addresses belong to same wallets or different wallets or if one address (say A1) send transaction to another address (say A2), does both belong to same wallet or different wallets? After this I want to further classify wallets into small & large wallets, low risk and high risk wallets (based on daily transaction volume).  Here is the code I have written. 

/* Step 7: Clustering based on selected variables */
proc standard data=lib.final_data1 out=lib.final_data_std mean=0 std=1;
   var Sent_Received_Value volume POF;
run;

proc fastclus data=lib.final_data_std out=lib.clustering_results maxclusters=3 maxiter=100;
   by Wallet;
   var Sent_Received_Value volume POF;
run;

 

/* Step 8: Classify wallets based on criteria for whale, small, low risk, high risk, medium risk, highly active, and passive wallets */
data lib.classified_wallets;
   set lib.clustering_results;

   /* Whale (large) wallets */
   if total_volume > 1000000 then wallet_category = "Whale";
   /* Small wallets */
   else if total_volume <= 10000 then wallet_category = "Small";

   /* Low risk wallets */
   if POF < 0.2 then risk_category = "Low";
   /* High risk wallets */
   else if POF >= 0.8 then risk_category = "High";
   /* Medium risk wallets */
   else risk_category = "Medium";

 

Is the above code correct? 

I want to use a backward recursive clustering algorithm on the address IDs, if that works. But my code gives an error that the address ID does not exist, since it is not in the final data set I obtained in step 6. Could you please guide me?

Thank you. 

Androniki
Calcite | Level 5
Another problem I am experiencing: when I run step 7, it keeps running, but then I receive a message that my window is full and needs clearing before it can show more results. How should I deal with this?
ballardw
Super User

@Androniki wrote:
Another problem I am experiencing: when I run step 7, it keeps running, but then I receive a message that my window is full and needs clearing before it can show more results. How should I deal with this?

That is because you are creating a truly massive amount of output written to the results window. Since you likely want the data set more than the tabular output written to results you can add the NOPRINT option to your Proc Fastclus code in that step.

ballardw
Super User

@Androniki wrote:
Another problem I am experiencing: when I run step 7, it keeps running, but then I receive a message that my window is full and needs clearing before it can show more results. How should I deal with this?

That means you are generating a lot of output in the results. If you are not going to wade through all the text and just want the data set output you can use the NOPRINT option to suppress all the results. Or use the SHORT option to reduce the output or SUMMARY just to show the final cluster summary output.

 

SAS in this case is not actually "running" per se but is attempting to generate a couple hundred pages of html tables and applying all the style options to the tables.
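
As a sketch, this is the Proc Fastclus call from your step 7 with NOPRINT added. The data set names, the BY variable and the VAR list are copied as posted; I have not checked that those variables exist in lib.final_data_std.

```sas
/* Sketch only: everything except NOPRINT is copied from the posted step 7. */
proc fastclus data=lib.final_data_std out=lib.clustering_results
              maxclusters=3 maxiter=100
              noprint;                 /* suppress all displayed output */
   by Wallet;
   var Sent_Received_Value volume POF; /* variable list as posted, unverified */
run;
```

Replacing NOPRINT with SHORT or SUMMARY keeps some displayed output if you still want to look at part of it. Note that with BY Wallet the procedure runs one separate cluster analysis per wallet, which is why the output grows so large.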

ballardw
Super User

@Androniki wrote:

 

After this I want to use clustering algorithm for wallet classification. I want to include addresses id for classification but it gives me error. Is there any method to include addresses id as they are 32 or long character strings? I am looking whether these addresses belong to same wallets or different wallets or if one address (say A1) send transaction to another address (say A2), does both belong to same wallet or different wallets? After this I want to further classify wallets into small & large wallets, low risk and high risk wallets (based on daily transaction volume).  Here is the code I have written. 

 


If you want help with an error copy the code and all messages from the step throwing the error from the log. Open a text box on the forum and paste all the text. Then we can discuss the specifics. At this point I can't tell where you are attempting to use this 32 character or longer variable or even its name.

 

If SAS tells you a variable doesn't exist then I will tend to believe SAS. Run proc contents on the data set used for input to a procedure that reports that. That will tell you what SAS sees in the set.
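
For example, to check the data set that feeds your step 7 (the data set name is taken from your posted code; VARNUM just lists the variables in creation order instead of alphabetically):

```sas
/* Lists every variable, its type and its length in the data set. */
proc contents data=lib.final_data1 varnum;
run;
```

Going by steps 5 and 6 as posted, I would expect the list to show Wallet, DT, hour, _TYPE_, _FREQ_, OF, total_volume and POF, but that is only a reading of the code, not something I have run. Any variable missing from that list will be reported as not existing by every later step.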

 

Personally, I find your use of "wallet", "address" and "id" confusing, and I'm not sure which variable(s) identify any of those. Please remember we have not spent hours or days dealing with your data and have none of the internal knowledge of the data. We only have exactly what you share.
