Hello everyone,
I'm trying to download some customer number data from a company intranet site. Attempting to adapt some code that Tom gave me (thanks again!) which downloads data from the internet using an input statement. Code works fine when I specify the number of customer numbers. Unfortunately, the number of customers is unknown. There may be 10 or 100. The number varies based on company region and other characteristics. Note I tried using the DIM(var) syntax to obtain the dimensions of the array, but I get the following error message:
ERROR: The array CUST_NUM has been defined with zero elements.
ERROR: Too many array subscripts specified for array CUST_NUM.
Below is the code I'm using. Does anyone have any suggestions on how to define the array such it'll be able to handle varying numbers of customers?
Thanks so much for any help or advice.
-Bill
Again when I run this code I get the above error message.
%MACRO CUSTOMER_NUMBER(HTML_TAG,SAS_NAME);
data &SAS_NAME.;
infile in flowover ;
input @&HTML_TAG. @; /*START*/
array &SAS_NAME. (*) $8.;
do col=1 to dim(&SAS_NAME.);
input @ '/q?s=' &SAS_NAME.(col) $8. @;
end;
drop col;
run;
%MEND;
%CUSTOMER_NUMBER('>Preferred Customers<',CUST_NUM);
Why not just read the names as observations instead of into and array. You don't know how many you need. You also needed larger LRECL. When doing the "blind scan" you can either live with LOSTCARD or use INFILE statement option EOF instead of END.
You can only use (*) as the dimension for an array definition when the variables in question already exist.
You could try adding a third macro parameter to specify the number of expected values.
%MACRO CUSTOMER_NUMBER(HTML_TAG,SAS_NAME,NCOL);
...
array &SAS_NAME. (&NCOL) $8.;
If you cannot know in advance how many copies there will be then you probably need to add conditional to the DO loop to prevent reading past the end of the file.
infile in flowover end=eof ;
 ...
do col=1 to dim(&NCOL) while (not eof);
Tom,
Thanks so much for your thoughts. I tried to implement your solution by specifying an arbitrarily large number of customers and then using the conditional DO loop to prevent reading past the end of the file. I can see the data in the log! However, I get a "LOST CARD" error and SAS doesn't output any records to the data set. I tried using the MISSOVER command in the infile statement but that set all the data to blank. Am I correctly using the conditional DO loop as you suggested? Should it be a loop within a loop?
Here's the revised code. Note it works fine when I specify NCOL=10. When I put NCOL=100, I get a "LOST CARD" error and no data records are output.
%LET NCOL=100;
%MACRO CUSTOMER_NUMBER(HTML_TAG,SAS_NAME);
data &SAS_NAME.;
infile in flowover end=eof ;
input @&HTML_TAG. @; /*START*/
array &SAS_NAME. (&NCOL.) $8.;
do col=1 to dim(&SAS_NAME.) while (not eof);
input @ '/q?s=' &SAS_NAME.(col) $8. @;
end;
drop col;
run;
%MEND;
%CUSTOMER_NUMBER('>Preferred Customers<',CUST_NUM);
Can you show example of the site you are reading from.
data_null_;,
Thanks for the response. Unfortunately, I can't give an example of the website since it's a company intranet site. However, the code I'm using is almost working. I just need to figure out how to generalize it, so it'll handle an unknown number of customers.
Regards,
Bill
Substitute the data contained within the website with something generic. It is difficult to help you if you can't provide a basic structure of the data you are reading.
My apologies for 1) not having an example and 2) taking a long time to respond (work has been crazy this week). Here's some code that pulls the names of LinkedIn insiders who've traded the stock in the last 2 years from yahoo finance. The structure of the table on yahoo is very similar to the table on the intranet page that I'm working with. Also, both programs get the exact same error when I set the NCOL (number of columns) variable to something large, e.g., 50. (SAS comes back with a "LOST CARD" error and states "SAS went to a new line when INPUT @'CHARACTER_STRING' scanned past the end of a line." Note the program runs fine when I set the NCOL = 10. Note I'm setting the number of columns to an arbitrarily large number because I don't know how many insiders traded the stock during the past 2 years. Is there any easy way to count them before hand and then specify the number of columns correctly? Or is it advisable to simply pick a large number and then delete any records with missing data?
Thanks so much for any help or advice.
Regards,
Bill
Code:
DM 'CLEAR LOG';
OPTIONS SPOOL;
%MACRO INSDIER(HTML_TAG,SAS_NAME);
filename in url "http://finance.yahoo.com/q/it?s=LNKD+Insider+Transactions" lrecl=32000;
data test;
infile in length=len;
input record $varying32000. len;
run;
%LET NCOL=50; /*set the number of columns to an arbitrarily large number to ensure all insider info is recorded*/
data &SAS_NAME.;
infile in flowover end=eof;
input @&HTML_TAG. @; /*START*/
array &SAS_NAME. (&NCOL.) $25.;
do col=1 to dim(&SAS_NAME.) while (not eof);
input @ '8557.html">' string $25. @; /*GRAB 25 CHARACTERS AFTER TAG*/
&SAS_NAME.(col)=input(scan(string,1,'<'),??$25.); /*SCAN STRING AND KEEP UP TO <*/
end;
drop col string;
run;
%MEND;
%INSDIER('>Insider Transactions Reported - Last Two Years<',INSIDER);
Why not just read the names as observations instead of into and array. You don't know how many you need. You also needed larger LRECL. When doing the "blind scan" you can either live with LOSTCARD or use INFILE statement option EOF instead of END.
data_null_;,
Thanks so much for your help. Program works perfectly now!
Best,
Bill
data_null_;
Thanks again for the code. Quick question: Currently, I'm using a macro to create separate data sets for each variable of interest and then merging the information. Is there a way to easily modify the code such that it can capture more than one variable in a data step? Using the insider information example, suppose I wanted to capture the insider type(director, officer) along with the insider name (html tags in code below). Is this possible to do in one data step? If this is a pain, then please don't worry about it. The code you gave me works really well. I'm just trying to make the program as efficient as possible. Thank you!
-Bill
Current code uses macro to create separate data sets for each variable of interest (e.g., INSIDER_NAME, INSIDER_TYPE):
DM 'CLEAR LOG';
OPTIONS SPOOL;
filename in url "http://finance.yahoo.com/q/it?s=LNKD+Insider+Transactions" lrecl=1000000;
data test;
infile in length=len truncover;
input @;
recno + 1;
reclen = len;
do block=1 by 1;
input record $char128. @;
if missing(record) then leave;
output;
end;
run;
%LET MAIN_HTML_TAG='>Insider Transactions Reported - Last Two Years<';
%MACRO INSIDER_VARS(SAS_NAME,VAR_HTML_TAG,FORMAT);
data &SAS_NAME.;
infile in scanover eof=eof;
input @&MAIN_HTML_TAG. @; /*START*/
length &sas_name $25;
do i = 1 by 1;
INPUT @ &VAR_HTML_TAG. @'>' STRING &FORMAT. @;
&SAS_NAME=INPUT(SCAN(STRING,1,'<'),??&FORMAT.);
output;
end;
eof: stop;
drop string;
run;
%MEND;
options mprint=1;
%INSIDER_VARS(INSIDER_NAME,'http://biz.yahoo.com',$40.);
%INSIDER_VARS(INSIDER_TYPE,'<span style="font-size:90%; display:block;"',$40.);
Yes can read everything at once, just more of the same so to speak.
Thanks so much for the updated code!!! It eliminates the need to merge the data in a subsequent data step.
Regards,
Bill
I think there is one more variable I forget to include but you should be able to do that with no problem.
With the code you gave me, picking up the last variable should be no problem whatsoever. Just a few more questions if you don't mind: What's the purpose of the +(-40) in the input statement? Is it related to the variable with the greatest length? In other words, if the variable with the greatest length was $60, then the code would be +(-60)?
Also, do you know of any good references for mastering the input statement? I have a LOT to learn.
Thanks so much for your help!
Best,
Bill
I used +(-40) to move backward the amount that was read by STRING $40. I don't think this is necessary but it thought it might happen that reading in the STRING could move the pointer past the next "target" of the @''class="yfnc_tabledata1" bit. If you read string $60. then your would move back +(-60), or $10. +(-10)
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
