DATA Step, Macro, Functions and more

Array with unknown number of variables

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 92
Accepted Solution

Array with unknown number of variables

Hello everyone,

I'm trying to download some customer number data from a company intranet site.  Attempting to adapt some code that Tom gave me (thanks again!) which downloads data from the internet using an input statement.  Code works fine when I specify the number of customer numbers.  Unfortunately, the number of customers is unknown.  There may be 10 or 100. The number varies based on company region and other characteristics.  Note I tried using the DIM(var) syntax to obtain the dimensions of the array, but I get the following error message:

ERROR: The array CUST_NUM has been defined with zero elements.

ERROR: Too many array subscripts specified for array CUST_NUM.

Below is the code I'm using.  Does anyone have any suggestions on how to define the array such it'll be able to handle varying numbers of customers?

Thanks so much for any help or advice.

-Bill

Again when I run this code I get the above error message.

%MACRO CUSTOMER_NUMBER(HTML_TAG,SAS_NAME);

data &SAS_NAME.;

infile in flowover ;

input @&HTML_TAG. @;  /*START*/

array &SAS_NAME. (*) $8.;

do col=1 to dim(&SAS_NAME.);

   input @ '/q?s='  &SAS_NAME.(col) $8. @;

end;

drop col;

run;

%MEND;

%CUSTOMER_NUMBER('>Preferred Customers<',CUST_NUM);


Accepted Solutions
Solution
‎09-20-2013 07:27 AM
Respected Advisor
Posts: 3,799

Re: Array with unknown number of variables

Posted in reply to BillJones

Why not just read the names as observations instead of into and array.  You don't know how many you need.  You also needed larger LRECL.  When doing the "blind scan" you can either live with LOSTCARD or use INFILE statement option EOF instead of END.

%MACRO INSDIER(HTML_TAG,SAS_NAME);


filename in url
"http://finance.yahoo.com/q/it?s=LNKD+Insider+Transactions" lrecl=1000000;

data test;
   infile in length=len truncover;
   input @;
   recno +
1;
   reclen = len;
   do block=
1 by 1;
      input record
$char128. @;
      if missing(record) then leave;
      output;
      end;
   run;



%LET NCOL=50; /*set the number of columns to an arbitrarily large number to ensure all insider info is recorded*/

data &
SAS_NAME.;
   infile in scanover eof=eof;
   input @&
HTML_TAG. @;  /*START*/
   length &sas_name $
25;
   do i =
1 by 1;
      input @
'http://biz.yahoo.com' @'>' string $40. @;
      &SAS_NAME=scan(string,1,'<'); /*SCAN STRING AND KEEP UP TO <*/
      output;
      end;
   eof: stop;
   drop string;
   run;
%MEND;
options mprint=1;
%
INSDIER('>Insider Transactions Reported - Last Two Years<',INSIDER);

View solution in original post


All Replies
Super User
Super User
Posts: 7,070

Re: Array with unknown number of variables

Posted in reply to BillJones

You can only use (*) as the dimension for an array definition when the variables in question already exist.

You could try adding a third macro parameter to specify the number of expected values.

%MACRO CUSTOMER_NUMBER(HTML_TAG,SAS_NAME,NCOL);

...

array &SAS_NAME. (&NCOL) $8.;

If you cannot know in advance how many copies there will be then you probably need to add conditional to the DO loop to prevent reading past the end of the file.

infile in flowover end=eof ;

...

do col=1 to dim(&NCOL) while (not eof);


Frequent Contributor
Posts: 92

Re: Array with unknown number of variables

Tom,

Thanks so much for your thoughts.  I tried to implement your solution by specifying an arbitrarily large number of customers and then using the conditional DO loop to prevent reading past the end of the file.  I can see the data in the log!  However, I get a "LOST CARD" error and SAS doesn't output any records to the data set.  I tried using the MISSOVER command in the infile statement but that set all the data to blank.  Am I correctly using the conditional DO loop as you suggested?  Should it be a loop within a loop?

Here's the revised code.  Note it works fine when I specify NCOL=10.  When I put NCOL=100, I get a "LOST CARD" error and no data records are output.

%LET NCOL=100;

%MACRO CUSTOMER_NUMBER(HTML_TAG,SAS_NAME);

data &SAS_NAME.;

infile in flowover end=eof ;

input @&HTML_TAG. @;  /*START*/

array &SAS_NAME. (&NCOL.) $8.;

do col=1 to dim(&SAS_NAME.) while (not eof);

   input @ '/q?s='  &SAS_NAME.(col) $8. @;

end;

drop col;

run;

%MEND;

%CUSTOMER_NUMBER('>Preferred Customers<',CUST_NUM);

Respected Advisor
Posts: 3,799

Re: Array with unknown number of variables

Posted in reply to BillJones

Can you show example of the site you are reading from.

Frequent Contributor
Posts: 92

Re: Array with unknown number of variables

Posted in reply to data_null__

data_null_;,

Thanks for the response.  Unfortunately, I can't give an example of the website since it's a company intranet site.  However, the code I'm using is almost working.  I just need to figure out how to generalize it, so it'll handle an unknown number of customers. 

Regards,

Bill

Super Contributor
Posts: 297

Re: Array with unknown number of variables

Posted in reply to BillJones

Substitute the data contained within the website with something generic.  It is difficult to help you if you can't provide a basic structure of the data you are reading.

Frequent Contributor
Posts: 92

Re: Array with unknown number of variables

Posted in reply to Scott_Mitchell

My apologies for 1) not having an example and 2) taking a long time to respond (work has been crazy this week).  Here's some code that pulls the names of LinkedIn insiders who've traded the stock in the last 2 years from yahoo finance.  The structure of the table on yahoo is very similar to the table on the intranet page that I'm working with. Also, both programs get the exact same error when I set the NCOL (number of columns) variable to something large, e.g., 50.  (SAS comes back with a "LOST CARD" error and states "SAS went to a new line when INPUT @'CHARACTER_STRING' scanned past the end of a line."  Note the program runs fine when I set the NCOL = 10.  Note I'm setting the number of columns to an arbitrarily large number because I don't know how many insiders traded the stock during the past 2 years.  Is there any easy way to count them before hand and then specify the number of columns correctly?  Or is it advisable to simply pick a large number and then delete any records with missing data?


Thanks so much for any help or advice.

Regards,

Bill



Code:

DM 'CLEAR LOG';

OPTIONS SPOOL;

%MACRO INSDIER(HTML_TAG,SAS_NAME);

filename in url "http://finance.yahoo.com/q/it?s=LNKD+Insider+Transactions" lrecl=32000;

data test;

infile in length=len;

input record $varying32000. len;

run;

%LET NCOL=50; /*set the number of columns to an arbitrarily large number to ensure all insider info is recorded*/

data &SAS_NAME.;

infile in flowover end=eof;

input @&HTML_TAG. @;  /*START*/

array &SAS_NAME. (&NCOL.) $25.;

do col=1 to dim(&SAS_NAME.) while (not eof);

    input @ '8557.html">'  string $25. @; /*GRAB 25 CHARACTERS AFTER TAG*/

  &SAS_NAME.(col)=input(scan(string,1,'<'),??$25.); /*SCAN STRING AND KEEP UP TO <*/

end;

drop col string;

run;

%MEND;

%INSDIER('>Insider Transactions Reported - Last Two Years<',INSIDER);

Solution
‎09-20-2013 07:27 AM
Respected Advisor
Posts: 3,799

Re: Array with unknown number of variables

Posted in reply to BillJones

Why not just read the names as observations instead of into and array.  You don't know how many you need.  You also needed larger LRECL.  When doing the "blind scan" you can either live with LOSTCARD or use INFILE statement option EOF instead of END.

%MACRO INSDIER(HTML_TAG,SAS_NAME);


filename in url
"http://finance.yahoo.com/q/it?s=LNKD+Insider+Transactions" lrecl=1000000;

data test;
   infile in length=len truncover;
   input @;
   recno +
1;
   reclen = len;
   do block=
1 by 1;
      input record
$char128. @;
      if missing(record) then leave;
      output;
      end;
   run;



%LET NCOL=50; /*set the number of columns to an arbitrarily large number to ensure all insider info is recorded*/

data &
SAS_NAME.;
   infile in scanover eof=eof;
   input @&
HTML_TAG. @;  /*START*/
   length &sas_name $
25;
   do i =
1 by 1;
      input @
'http://biz.yahoo.com' @'>' string $40. @;
      &SAS_NAME=scan(string,1,'<'); /*SCAN STRING AND KEEP UP TO <*/
      output;
      end;
   eof: stop;
   drop string;
   run;
%MEND;
options mprint=1;
%
INSDIER('>Insider Transactions Reported - Last Two Years<',INSIDER);
Frequent Contributor
Posts: 92

Re: Array with unknown number of variables

Posted in reply to data_null__

data_null_;,

Thanks so much for your help. Program works perfectly now!

Best,

Bill

Frequent Contributor
Posts: 92

Re: Array with unknown number of variables

Posted in reply to data_null__

data_null_;

Thanks again for the code. Quick question: Currently, I'm using a macro to create separate data sets for each variable of interest and then merging the information.  Is there a way to easily modify the code such that it can capture more than one variable in a data step?  Using the insider information example, suppose I wanted to capture the insider type(director, officer) along with the insider name (html tags in code below).  Is this possible to do in one data step?  If this is a pain, then please don't worry about it.  The code you gave me works really well.  I'm just trying to make the program as efficient as possible.  Thank you!

-Bill

Current code uses macro to create separate data sets for each variable of interest (e.g., INSIDER_NAME, INSIDER_TYPE):

DM 'CLEAR LOG';

OPTIONS SPOOL;

filename in url "http://finance.yahoo.com/q/it?s=LNKD+Insider+Transactions" lrecl=1000000;

data test;

   infile in length=len truncover;

   input @;

   recno + 1;

   reclen = len;

   do block=1 by 1;

      input record $char128. @;

      if missing(record) then leave;

      output;

      end;

   run;

%LET MAIN_HTML_TAG='>Insider Transactions Reported - Last Two Years<';

%MACRO INSIDER_VARS(SAS_NAME,VAR_HTML_TAG,FORMAT);

data &SAS_NAME.;

   infile in scanover eof=eof;

   input @&MAIN_HTML_TAG. @;  /*START*/

   length &sas_name $25;

   do i = 1 by 1;

    INPUT @ &VAR_HTML_TAG. @'>' STRING &FORMAT. @;

    &SAS_NAME=INPUT(SCAN(STRING,1,'<'),??&FORMAT.);

    output;

    end;

   eof: stop;

   drop string;

run;

%MEND;

options mprint=1;

%INSIDER_VARS(INSIDER_NAME,'http://biz.yahoo.com',$40.);

%INSIDER_VARS(INSIDER_TYPE,'<span style="font-size:90%; display:block;"',$40.);

Respected Advisor
Posts: 3,799

Re: Array with unknown number of variables

Posted in reply to BillJones

Yes can read everything at once, just more of the same so to speak.

data insidertrans;
   infile in scanover eof=eof;
   input @'>Insider Transactions Reported - Last Two Years<' @;  /*START*/
  
length name title $25 shares 8 type $10 transaction $40;
  
do i = 1 by 1;
     
input @'http://biz.yahoo.com' @'>' string $40. +(-40) @;
      NAME=scan(string,1,'<'); /*SCAN STRING AND KEEP UP TO <*/
     
input @'<span style="font-size:90%; display:block;"' @'>' string $40. +(-40) @;
      title=scan(string,1,'<');
      input @'class="yfnc_tabledata1"' @'>' string $40. +(-40) @;
      shares=input(scan(string,1,'<'),comma12.);
      input @'class="yfnc_tabledata1"' @'>' string $40. +(-40) @;
      type=scan(string,1,'<');
      input @'class="yfnc_tabledata1"' @'>' string $40. +(-40) @;
      transaction=scan(string,1,'<');
      output;
     
end;
   eof:
stop;
  
*drop string;
  
run;
Frequent Contributor
Posts: 92

Re: Array with unknown number of variables

Posted in reply to data_null__

Thanks so much for the updated code!!!  It eliminates the need to merge the data in a subsequent data step.

Regards,

Bill

Respected Advisor
Posts: 3,799

Re: Array with unknown number of variables

Posted in reply to BillJones

I think there is one more variable I forget to include but you should be able to do that with no problem.

Frequent Contributor
Posts: 92

Re: Array with unknown number of variables

Posted in reply to data_null__

With the code you gave me, picking up the last variable should be no problem whatsoever.  Just a few more questions if you don't mind: What's the purpose of the +(-40) in the input statement?  Is it related to the variable with the greatest length?  In other words, if the variable with the greatest length was $60, then the code would be +(-60)?

Also, do you know of any good references for mastering the input statement?  I have a LOT to learn.

Thanks so much for your help!

Best,

Bill

Respected Advisor
Posts: 3,799

Re: Array with unknown number of variables

Posted in reply to BillJones

I used +(-40) to move backward the amount that was read by STRING $40.  I don't think this is necessary but it thought it might happen that reading in the STRING could move the pointer past the next "target" of the @''class="yfnc_tabledata1" bit.  If you read string $60. then your would move back +(-60), or $10.  +(-10)

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 2723 views
  • 3 likes
  • 4 in conversation