BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ilikesas
Barite | Level 11

Hi,

 

suppose I have a table with a somewhat complex data structure like the following:

 

n line
34 text jdjdjf stock IBM
35 djdff 
36 25$

 

Whenever I have a line with the word "stock" in it, the actual price is always 2 lines below. So what I would like to have is the following structure:

n line price
34 text jdjdjf stock IBM 25$

 

Like this the stock and its price are on the same observations and its possible to make an analysis.

 

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Yes, your understanding is correct. Good.

Ok, still just a  minor change

 

data want;
set have2 ;
retain _flag 0 _line _n ;
if find(line, "stock")>0 then do;
__n=0;
_flag=1;
_n=n;
_line=line;
end;
if _flag then __n+1;
if __n=3 then do;
n=_n;
price=line;
line=_line;
output;
end;
drop _: ;
run;

View solution in original post

15 REPLIES 15
PeterClemmensen
Tourmaline | Level 20

1) Why on earth do you have data structure like this? 🙂

 

2) Where does "A" come from in your line variable instead of IBM?

mkeintz
PROC Star

Use the @ character feature of the INPUT statement:

 

data want;
  infile datalines;
  input @ 'stock'  stock :$5. // price :$5.;
  put (_all_) (=);
datalines;
xxx
stock AAA
aaa
11$
xxx
xxx
stock BBBB
bbb
22$
run;

The slashes in the input statement tells it to advance one  line per slash.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ilikesas
Barite | Level 11

Hi mkeintz,

 

since my data is generated externally, lets assume that I already imported data equivalent to:

data have;
input line $ 1-10;  
datalines;
xxx
stock AAA
aaa
11$
xxx
xxx
stock BBBB
bbb
22$
run;

It is now that I need to do the process described in the question. I tried to guess based on your code by doing the following:

 

data want;
set have;
input @ 'stock' stock :$5. // price :$5.;
put (_all_) (=);

run;

 

But unsurprisingly it didn't work

 

mkeintz
PROC Star

Input is for raw data input, not sas data input.   I'd suggest usings @PeterClemmensen's solultion.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PeterClemmensen
Tourmaline | Level 20
data have;
input n line $50.;
datalines;
34	text jdjdjf stock IBM
35	djdff 
36	25$
37	text jdjdjf stock IBM
38	djdff 
39	25$
40	text jdjdjf stock IBM
41	djdff 
42	25$
;

data want;
   merge have have(firstobs=3 rename=(line=price));
   if find(line, "stock")>0 then output;
run;
ilikesas
Barite | Level 11

Hi draycut,

 

thanks for the code, worked perfectly!

 

If possible to expand the code, in my case I have more variables of interest. For example, the 4th line after the line that contains the word "stock" I have its industry:

 

data have2;
input n line $50.;
datalines;
30  doidn
31  fifeniofe
32  doewiij
33  fofugrugr
34	text jdjdjf stock IBM
35	djdff 
36	25$
37  fg fgg
38	computers
39	text jdjdjf stock AAA
40	djdff 
41	22$
42  fggh
43	food
44	text jdjdjf stock BBB
45	djdff 
46	55$
47  duehuefhu
48	automotive
49  dfeoufeijer
50  ierhde43848
;
run;

Using your code I did the following:

 

data want2;
   merge have2 have2(firstobs=3 rename=(line=price)) have2(firstobs=5 rename=(line=industry));
   if find(line, "stock")>0 then output;
run;

And also got the result that I wanted!

 

I would like to ask for your opinion on the following:

 

1) In my data I have tens to hundreds of thousands of lines (most of them useless) and 5 to 7 variables that I need to extract, so this would mean merging 5 to 7 times the same data table just like above. Is this a good approach, i.e, could there be efficiency or some other problems (like the computer freezing for example)?

 

2) Recently I worked on some data where I also had to restructure it into a more analysis-friendly form using Excel VBA. Looking at my data I would perhaps do the following:

 

Loop through my original data until I find the word "stock". Once I find it, remember the cells row r and paste the cell's content into

a new spreadsheet in cell (1,2) (the first column in the new sheet is the stock, second column price etc).

In the original sheet the contents of the cell with row = r+2 are the stock's price and so they will be pasted into the new sheet's cell (2,2), and so on for the other variables...

And when I reach a new cell with the word "stock" I go to the next line in the new sheet.

It seems, at least to me, that using cells is easier conceptually (because I couldn't figure out how to do the above mentioned data restructuring using SAS), so is it possible to "cellulize" data in SAS? Is it even worth it?

 

Thanks 

 

 

 

 

 

 

 

 

 

 

novinosrin
Tourmaline | Level 20

@ilikesas  Good afternoon, IMO unless your sample pattern resembles exactly your real, I'd think the merge might not suffice. You prolly will need a sound method at some point to make it squeaky clean to handle varying patterns. 

novinosrin
Tourmaline | Level 20

data have2;

input n line $50.;

datalines;

30  doidn

31  fifeniofe

32  doewiij

33  fofugrugr

34    text jdjdjf stock IBM

35    djdff

36    25$

37  fg fgg

38    computers

39    text jdjdjf stock AAA

40    djdff

41    22$

42  fggh

43    food

44    text jdjdjf stock BBB

45    djdff

46    55$

47  duehuefhu

48    automotive

49  dfeoufeijer

50  ierhde43848

;

run;

 

data _null_;

if _N_ = 1 then do;

        if 0 then set have2;

        length price $10;

      declare hash h( );

      h.defineKey('_n','_line');

      h.defineData('_n','_line','price');

      h.defineDone();

        call missing(price);

   end;

do until(last);

      set have2 end=last;

      if find(line, "stock")>0 then do;

                  _n=n;

                  _line=line;

                  h.add();

            end;

      else if index(line,'$')>0 and  h.check()=0 then do;

            price=line;

            h.replace();

      end;

end;

if last then h.output(dataset:'want');

run;

Reeza
Super User

You've been thinking in Excel terms, not SAS terms. 

 

You can use RETAIN here to get what you want as well. I'm assuming you're scraping data from either a webpage or copy/pasted out of a report/PDF. There's often ways to move that data to get it in a better format in the first place but I'll consider that irrelevant for the moment. 

 

If you use RETAIN, you can trigger the write to your SAS data set once you find a new stock symbol. 

 

I don't see how you know which is a new symbol yet, but it seems relatively straightforward to me. 

 

data want; 
set have;
retain stock price <other variables>;

if <stock> then do; 
output;  *output values currently stored;
call missing(stock, price, <other variables>; *reset all values to missing to prevent carry over of values;
stock=value;   *set to new stock ticker;
end;

else if ... then price=value;
else if ... then <other_variables>=value; *since these are all retained, you get a single record for each 'stock' at the end;



run;

 

 

novinosrin
Tourmaline | Level 20

Reeza is right, yeah a simple retain does it like-

 


data want;
set have2 end=last;
retain _line _n;
if find(line, "stock")>0 then do;
_n=n;
_line=line;
end;
else if index(line,'$')>0 then do;
n=_n;
price=line;
line=_line;
output;
end;
drop _:;
run;

ilikesas
Barite | Level 11

Hi novinosrin,

 

for the line in your code: 

 

else if index(line,'$')>0 then do;

 

if I understand correctly the code looks for the symbol "$", but in my case I have many useless lines that contain the symbol "$", so it is really the relative position of the lines from the line that contains the word "stock" that are important

novinosrin
Tourmaline | Level 20

Yes, your understanding is correct. Good.

Ok, still just a  minor change

 

data want;
set have2 ;
retain _flag 0 _line _n ;
if find(line, "stock")>0 then do;
__n=0;
_flag=1;
_n=n;
_line=line;
end;
if _flag then __n+1;
if __n=3 then do;
n=_n;
price=line;
line=_line;
output;
end;
drop _: ;
run;

ilikesas
Barite | Level 11

Thanks for the code.

I managed to include an additional "industry" variable:

 

data want;
set have2 ;
retain _flag 0 _line _n _line2 _n2 ;
if find(line, "stock")>0 then do;
__n=0;
_flag=1;
_n=n;
_line=line;
end;
if _flag then __n+1;
if __n=3 then do;
_n2=n;
_line2=line;
end;
if __n=5 then do;
n=_n;
industry=line;
price=_line2;
line=_line;
output;
end;
drop _: ;
run;

so I guess that if I had more variables to create I would just repeat the process

novinosrin
Tourmaline | Level 20

@ilikesas Sorry mate!, I have been busy with school. Did it work? Are you able to progress and make the necessary changes?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1181 views
  • 8 likes
  • 5 in conversation