BookmarkSubscribeRSS Feed
me55
Quartz | Level 8

okay, so i have a field with names in it called name.  i am trying to parallel a process.  i am relatively new to sas as well so here is what i need to do...

 

i need to get the ascii values of the first 30 letters (ONLY because it can be more) of the name, sum them and update a field a_name with a concatenation of that and the fields record number.  so, like this...

 

id     name     a_name

101  abcd      101-394

102  efgh       102-410

103  ijkl          103-426 ...

 

you get the idea.  i am piecing it together but help would be appreciated. 

14 REPLIES 14
ballardw
Super User

This works for your example data:

data have;
infile datalines truncover;
informat id $5. name $30.;
input id name;
datalines;
101  abcd     
102  efgh     
103  ijkl     
;
run;

data want;
   set have;
   do i= 1 to length(name);
      count= sum(count,rank(substr(name,i,1)));
   end;
   length a_name $ 12;
   a_name=catx('-',id,count);
   drop i count;
run;

HOWEVER there may be a few things to consider. Does your "name" variable include trailing blanks as part of the value?

 

Also the lengths I assigned to id and a_name a guesses. You'll have to provide better based on your actual values.

me55
Quartz | Level 8

this is an ~80,000 line table.  i believe i have id set at 6 and then name can be up to 50. 

me55
Quartz | Level 8

yeah, my first attempt that did not work.  i am sure it is the datalines part.  so this needs to run for every value in a table that has 80k values in it.  i cannot write out that many datalines.  also, the name are business names so there could be values in them however i have removed all trailing spaces via trim or something...

ballardw
Super User

@me55 wrote:

yeah, my first attempt that did not work.  i am sure it is the datalines part.  so this needs to run for every value in a table that has 80k values in it.  i cannot write out that many datalines.  also, the name are business names so there could be values in them however i have removed all trailing spaces via trim or something...


YOU do not need the datalines. I did that to provide a workable SAS data set. You should only need the Data Want step if you already have a SAS data set. If you don't have a SAS data set then import or use a data step to read the text file.

 

The length I set for a_name as I said was a guess. If your ID was 10 characters and you have up to 50 characters in name then you need to make a_name at least 10(length of ID) +1(for the dash) + something long enough to hold the sum of 50 ASCII codes. Assuming (big dange but..) an average ASCII value of about 100 then 50 characters should have a sum near 50*100 = 5000 and fit into 4 characters. BUT I would probably push that upe to 6 or so for a total length of 10+1+6 = 17. Or ignore that and let CATX set the length to possibly some much larger value.

me55
Quartz | Level 8

i need it to cut off at 30.  i removed the data lines...i assumed that was not needed but i am getting another error.  so the table i am trying to update in is table_a so...

 

data table_a;
infile table_a;
informat id $6. name $30.
input id name;
run;

data table_a;
    set table_a;
    do i=1 to length(name);    

      count=sum(count,rank(substr(name,i,1)));
    end;
    length name $ 12;
    a_name=catx('-',id,count);
    drop iaap count;
run;

 

basically i want to update a field in the same table.  so i put want=have in your example and it seems to be having an issue there...

Tom
Super User Tom
Super User

Your first data step does not make much sense.  If do not have a dataset yet then ask a new question about how to create a dataset.

 

So let's just assume that you already have a dataset named TABLE_A with two variables named ID and NAME and want to create a new variable named A_NAME.  If you already have a variable named A_NAME then remove the LENGTH statement.  

 

Also let's make the new dataset have a different name than the input dataset so that when it fails we haven't lost the input dataset.

 

data new_table_a;
    set table_a;
    do i=1 to min(length(name),30);    
      count=sum(count,rank(char(name,i)));
    end;
    length a_name $12;
    a_name=catx('-',id,count);
    drop i count;
run;

 

ballardw
Super User

This structure:

 

Data existingdataset;

    set existingdataset;

 

Does not "update" it completely overwrites the existing data set and hence is a dangerous construct.

 

Start with sending output to a new data set. If you only want to test something you can restrict the number of observations processed by using the OBS option.

 

Do you a FILENAME statement referencing you external file as table_a? That is what your infile would expect.

 

You say "so i put want=have in your example and it seems to be having an issue there..." What exactly is the issue? I expect at least

a warning along the lines of

   WARNING: Length of character variable name has already been set.

         Use the LENGTH statement as the very first statement in the DATA STEP to declare the

         length of a character variable.

because you have: Length name $ 12; instead of Length A_name.

 

Note that

do i= 1 to length(name);

should change to

do i= 1 to  min(30,length(name));

to limit to the first 30 characters.

me55
Quartz | Level 8

having a bit of a day here...okay so i need this data back in the original table.  this is the last step before i build a final report and it is going to use most of the data in the table...

 

also, keep in mind i am very new to sas.  while i have deep knowledge of sql, the data step stuff, i am learning as i am going here.  personally i would rather do it in proc sql but i do need to learn sas and data steps so...i am still trying to fully understand these data steps but i do need the data back in the original table.  the only other way is i need all of the data moved from the current table to a new table with the new data. 

 

Tom
Super User Tom
Super User

Database system have their own view of how to process things. Normally because they are geared around gathering and storing data.  Usually making a new table (dataset) or database (library) is complex, time comsuming and left to specialist.  So in that type of system it might make sense to think in terms on updating variables (fields, columns) in an existing dataset (table, object).  

 

SAS is more designed for manipulating and reporting on data.  So you take your inputs manipulate them to calculate the values you need and produce your report.  You can save datasets (tables) easily and there is not much advantage to be gained from have pre-built datasets that you update in place.

 

In SAS if you have a current program that makes a report from table XXX it is trivial to update the program to report from table YYY instead (as long as it has the data you need).  You can even use macro variables so that the actual code doesn't change just the value of the macro variable that contains the dataset name that the program will use.

LinusH
Tourmaline | Level 20
What do you mean by sum them, it's a char variable.
And record no, is that an exact no or an aggregation as well.
Perhaps a "have" data set would clarify.
Data never sleeps
me55
Quartz | Level 8

id is an auto-numbered field that simply counts up to the last line in the table and that is record number...and as i said, it needs to find the ASCII VALUE of the characters and sum them. 

Tom
Super User Tom
Super User

The result depends on whether you include the spaces at the end of the names. Looks like you do not.

data have;
  length id 8 name $50 want $40 ;
  input id name want ;
cards;
101  abcd 101-394
102  efgh 102-410
103  ijkl 103-426
104  ThisNameHasMoreThanThirtyCharactersInIt ?
;

data want ;
  set have ;
  do _n_=1 to min(30,length(name));
    sum_ascii=sum(sum_ascii,rank(char(name,_n_)));
  end;
  do _n_=1 to 30;
    sum_ascii2=sum(sum_ascii2,rank(char(name,_n_)));
  end;
  do _n_=1 to length(name);
    sum_ascii3=sum(sum_ascii3,rank(char(name,_n_)));
  end;

  a_name = catx('-',id,put(sum_ascii,32.));
run;
proc print; run;

Capture.PNG

Peter_C
Rhodochrosite | Level 12
although proc sql could mimic the design that might be familiar in database management systems ( ALTER table, adding new column A_NAME, then UPDATE deriving the A_NAME values). This is not the most efficient.
Better would be to test your derivation of A_NAME on small subsets of the data (where creating a new data set is quick and cheap) using the ideas recommended by others.
Once a suitable derivation is discovered, I would recommend two implementations: Create a VIEW rather than table, to be used if the value of A_NAME is seldom required; For the other implementation - get the A_NAME column derived when the original data are loaded.

Hope this helps you find the solution that works best for you.
peterC

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1681 views
  • 0 likes
  • 6 in conversation