BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
viuf
Calcite | Level 5

Hi

I'm new to Data Steps because I have always been using Proc Sql instead. This time I just need to look into it and therefore I hope that someone in here are able to help Smiley Happy

I need to derive words and the first value from a text string / variable in a table. The string contains data that looks something like this:

Large text in variable (TXKLIGK):

Housing          300.000     310.000     320.000

Car                  50.000        51.000      52.000

etc. (100 other posts)

Every line is 80 characters and the start of the words are always identical and so is the end of every value. In this Case I need "Housing" and "300.000".

I have made some code that through substrings and so on can "cut out" the desired values, so that is ok. My problem is that I need a way to make a list of words (like Housing) to look up within the text string and then loop throuh these words and make a new column for each word, where the header is the "word" I look for and the value is the number:

New table:

Name          Housing          Car          etc...

Thomas       300.000     50.000          etc...

I have tried to loop through the array "Values" så I find all the values, but at the same time I need the variable "Value" to change name according to the value in the array, so I can identify the value, but it doesn't seem to work if I e.g. writes value&i = substr(sub, position_var, length); or value&Values(i) = substr(sub, position_var, length);

 

data test1

data test1;

      set Main;

      array Values {4} $30.

      (

      "Housing"

      "Car"

      "Power"

      "Tv"

      ) ;  

      do i=1 to 4;

      Values_string =''||"/"||Values(i)||"/"||'';

            position=prxmatch(Values_string, TXLIKG);

                  if position ^= 0 then

                  do;

                  sub = substr(TXLIKG, position, 45);

                  pattern= "([0-9\.\,]+)";

                  patternID=prxparse(pattern);

                  call prxsubstr(patternID, sub, position_var, length);

                        if position_var ^= 0 then

                        do;

                        value = substr(sub, position_var, length);

                        end;

                  end;

      end;

run;

I do hope it makes sense and thank you all in advance for any help.

Regards

Thomas

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

I would take a much simpler approach to initially reading the data.  How about something like:

data want (keep=name val:);

  set dataset;

  format by_line $80.;

  format name $34.;

  do i=0 to int(length(TXLIKG)/80);

    by_line=substr(TXLIKG,i*80+1,80);

    if strip(by_line) ne "" then do;

      name=strip(substr(by_line,1,34));

      val1=input(substr(by_line,35,11),12.);

      val2=input(substr(by_line,46,11),12.);

      val3=input(substr(by_line,57,11),12.);

      val4=input(substr(by_line,68,11),12.);

      output;

    end;

  end;

run;

View solution in original post

9 REPLIES 9
Jagadishkatam
Amethyst | Level 16

Hi Thomas,

I assume you have not tried proc transpose. i tried it and believe that it is producing the same output what you desire. please check the below code and let me know if it is creating the desired output.

data have;

    input name $ val1 val2 val3;

    format val1 val2 val3 8.3;

cards;

Housing  300.000     310.000     320.000

Car      50.000      51.000      52.000

;

proc transpose data=have name=name out=want;

    id name;

run;

Thanks,

Jagadish

Thanks,
Jag
viuf
Calcite | Level 5

Thanks for your answer...

I think that proc transpose will be the one to use when I have alle the variables. My problem right now is that I don't have the variables yet. They are all "trapped" within the same big text field . I can extract one value (housing) with the related (value) , but I can't loop through the list

      array Values {4} $30.

      (

      "Housing"

      "Car"

      "Power"

      "Tv"

      ) ;  

and extract them all because I can't change the name of the variable to store them in. It only saves the last one, the rest are overwritten.... therefore I would like to know if I within a data step can name variables dynamicly e.g. like value&i = substr(sub, position_var, length);

Thanks Smiley Happy

Jagadishkatam
Amethyst | Level 16

Thank you Thomas for your reply. As per your email i got an idea of what your require but did not get it completely.

Could you please send 10 lines of the raw data on which you want to work. This will help me to understand the query and try to provide a better solution.

Thanks for your understanding.

Regards,

Jagadish

Thanks,
Jag
viuf
Calcite | Level 5

Ok I'll try Smiley Happy I have attatched a sas data set in a very simplyfied version with only one row. The code below matches the dataset.

What I need from the field TXLIKG in this one row is to look up the values in the array "Values". The values are:

"Indtægter", "Bolig" and "Kabel TV". I need to have the text returned and also the value of the first number after the text. In this case it should be:

Indtægter =         819.772

Bolig        =        317.914

Kabel TV  =          10.800  

The code already returns the right value but only for the last value in the Array because it overwrites the other. I need to be able to rename the variable "Value" on the fly so I can have an unlimited number of variables in the array to look up.

** string + nrows ;

data test;

      set Dataset;

      array Values {3} $30.

      (

      "Indtægter"

      "Bolig"

      "Kabel TV"

      ) ;  

      do i=1 to 3;

      Values_string =''||"/"||Values(i)||"/"||'';

            position=prxmatch(Values_string, TXLIKG);

                  if position ^= 0 then

                  do;

                  sub = substr(TXLIKG, position, 45);

                  pattern= "([0-9\.\,]+)";

                  patternID=prxparse(pattern);

                  call prxsubstr(patternID, sub, position_var, length);

                        if position_var ^= 0 then

                        do;

                        value = substr(sub, position_var, length);

                        end;

                  end;

                 

      end;

run;

art297
Opal | Level 21

I would take a much simpler approach to initially reading the data.  How about something like:

data want (keep=name val:);

  set dataset;

  format by_line $80.;

  format name $34.;

  do i=0 to int(length(TXLIKG)/80);

    by_line=substr(TXLIKG,i*80+1,80);

    if strip(by_line) ne "" then do;

      name=strip(substr(by_line,1,34));

      val1=input(substr(by_line,35,11),12.);

      val2=input(substr(by_line,46,11),12.);

      val3=input(substr(by_line,57,11),12.);

      val4=input(substr(by_line,68,11),12.);

      output;

    end;

  end;

run;

Astounding
PROC Star

viuf,

Assuming that you can read in the data and break each line up into separate variables, the proposed PROC TRANSPOSE does the job admirable.  You might end up having to sort and run the transpose BY NAME, but the real power here is the ID statement.  It will figure out what variable names you need, and create them for you automatically.  So the ID statement refers to the first variable, the one that takes on values like Housing, Car, etc.  Name (such as Thomas) would be a separate variable that would come from some other portion of your long text string.

Good luck.

viuf
Calcite | Level 5

Hi all

It seems like Arthurs solution actually was very simply and excactly what I was looking for - Thank you!

Now I just need to do the transpose, but unfortunately I can see that my next problem will be a duplicate-issue. In the full dataset I will propably experience duplicates which will ruin the transpose. Is there a way to rename duplicates automaticly while (or just before) running the proc transpose?

Preferably the rename should be something like:

Table to be transposed:

Name               val1

Housing          20000

Interestrate     10000

xxx                   yyy

Car               10000
Interestrate     2000

Result (based on group values):

Housing     Housing-Interestrate      xxx     Car          Car-Interestrate

20000          10000                         yyy     10000     2000

or maybe (based on number):

Housing     Interestrate1      xxx     Car          Interestrate2

20000          10000                         yyy     10000     2000

Thanks again Smiley Happy

art297
Opal | Level 21

: I would post the transpose question as a new thread, including an example of your now current dataset, and the results that you hope to achieve.

viuf
Calcite | Level 5

Your right, it makes sense. I'll do that...

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1848 views
  • 6 likes
  • 4 in conversation