BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DrewF
Calcite | Level 5

Using University edition SAS for a personal project. I'm hoping this is a simple mistake and not something that simply cannot be resolved. To reduce the problem to it's most basic core, what I need to be able to do is get the length of characters between each delimiter in an input file. The reason being is that I would like to generate an individual table for each variable. So there will be one table for the first variable, another for the second etc. The problem arises when I get beyond the first three variables because then the column start position can will be variable. So I don't know what particular column to set the @ condition to in my input statement. I need to use a variable to do this in the format @&numericvariableforlength.

 

The problem I'm having is how to pass the variable value into the macro rather than the variable reference. For instance, as you can see from the code below I get exactly the result I want if I actually specify the value as 19. However, this isn't a constant and might change - atm I'm trying to get this to work with just the first line of data so FIRSTOBS and OBS will also be variable and I'll come across the same issue there. So I need to know what I need to do in order to pass in the value rather than the reference. As I have it written right now the value being stored in sslength is actually slength, not 19. So slength as a string is passed in as the parameter rather than the numeric value 19. How do I pass in 19? I was having a lot of trouble with sslength not being initialised when I was calling it at the end of the code with my other macro calls so I tried embedding it within the macro itself but that didn't help.

 

Line of data I'm working with: 

 

AAA,19970102,2.43,2.45,2.42,2.43,62508

Code: 

 

LIBNAME ASX '/folders/myfolders';
%MACRO populate(data_name=, line_n=, start_pos=);
DATA ASX.&data_name;
INFILE '/folders/myfolders/ASXData/19970102.TXT' DSD FIRSTOBS = &line_n OBS = &line_n;
INPUT @&start_pos &data_name :$;
%put &start_pos;
RUN;

%MEND;
%MACRO getlength(col=, line_n=);
DATA _NULL_;
INFILE '/folders/myfolders/ASXData/19970102.TXT' FIRSTOBS = &line_n OBS = &line_n;
INPUT Raw :$ 1-50;
CALL SCAN(Raw, &col, position, slength, ',');
%LET sslength = slength;
*put sslength=slength;
%populate(data_name=High, line_n=1, start_pos=sslength);
/*This works:
%LET sslength = 19;
%populate(data_name=High, line_n=1, start_pos=&sslength);
*/
RUN;
%MEND;
%populate(data_name=stock, line_n=1, start_pos=1);
%populate(data_name=date, line_n=1, start_pos=5);
%getlength(col=3, line_n=1);

Note: Edited to use text boxes. 

 

SOLUTION: I modified the following sections to get this to work. The solution was using symput as 

 

 

%MACRO getlength(col=, line_n=);

	DATA _NULL_;	
		%GLOBAL var_length;
		INFILE '/folders/myfolders/ASXData/19970102.TXT' FIRSTOBS = &line_n OBS = &line_n;
		INPUT Raw :$ 1-50;
		CALL SCAN(Raw, &col, position, input_length, ',');
		input_length = position + input_length + 1;
		CALL SYMPUT('var_length', input_length);
		%put var_length=&var_length;
	RUN;

%MEND;

....

 

 

 

%populate(data_name=High, line_n=1, start_pos=&var_length);

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User
Since macro statements are resolved while the code is fetched for compilation and execution, they have no access to datastep variable values. Use the call symput() subroutine to save datastep values to macro variables.

View solution in original post

13 REPLIES 13
LinusH
Tourmaline | Level 20
CALL EXECUTE.
Data never sleeps
DrewF
Calcite | Level 5

In what sense would that replace the code I have written and have you tested this solution in the context of my problem to see if it works?

Kurt_Bremser
Super User
Since macro statements are resolved while the code is fetched for compilation and execution, they have no access to datastep variable values. Use the call symput() subroutine to save datastep values to macro variables.
DrewF
Calcite | Level 5
hmmm, so the issue is timing. Thank you, that helps. I have tried symput in solving this same issue but not within a macro, so I think the combination of the two will hopefully work. I will try tomorrow and if I get a solution will post and credit.
DrewF
Calcite | Level 5
I tried that but it had absolutely no effect. It resulted in unresolved variables, which is why I tried moving the problematic macro call within the macro where SCAN is used.
Astounding
PROC Star

It's really difficult to understand why you need any of the complications.  SAS knows how to read a delimited line of data.  Why can't you simply use:

 

DATA ASX.&data_name;
INFILE '/folders/myfolders/ASXData/19970102.TXT' DSD;
INPUT name $  date  var1 var2 var3 var4 var5;
RUN;

 

If you really need to pick out column information as you read the data, the COL= option might help.  Experiment along these lines:

 

DATA ASX.&data_name;
INFILE '/folders/myfolders/ASXData/19970102.TXT' DSD COL=col_position;
INPUT name $ @ ;

name_end = col_position;

input date @ ;

date_end = col_position;

input var1 @ ;

var1_end = col_position;

 ... etc. ...
RUN;

 

*******************************

EDITED:  In this example, COL_POSITION gets created by the COL= option on the INFILE statement.  Note that it may be one or two larger than what you need.  What I called "NAME_END" actually may be the column where DATE begins.  That's why I mentioned the word "experiment".  But once you get the idea it probably provides the information you need.

DrewF
Calcite | Level 5

I'll have a look and see if this solution can determine length at run time.

How do I set the variable col_position though? That's the question I asked.
Reeza
Super User

Weird request, but ok. 

 

So I don't know what particular column to set the @ condition to in my input statement. I need to use a variable to do this in the format @&numericvariableforlength.

 

If there's not a specific column start you can't use this methodology. What you can do is read the data once and output each variable to it's own file, as long as you have a specified delimiter that clearly identfies your variable. Again, not sure how this is practical at all. 


Here's an example with a small hardcoded subset and you can automate it from that point - remember the start for a good macro solution, is first a non-macro solution.

 

data name (keep=name) age (keep=age) sex (keep=sex);
infile cards dlm=',';

input name $ age sex $;
cards;
"Alfred", 24, M
"Sandra", 25, F
"Fred", 32, F
;
run;

 

Tom
Super User Tom
Super User

What is the goal?  Is it to get the max length for each column in a delimited file?

Here is a program that will generate one row per column in your file with the max length for that column.  I included code to capture the first line as column headers.

data want ;
  array hdr (100) $32767  _temporary_;
  array len (100) _temporary_;
  infile csv dsd truncover length=l column=cc end=eof lrecl=1000000;
  length str $32767 ;
  do col=1 by 1 until(cc > l);
    input str @ ;
    if _n_=1 then hdr(col)=str;
    else len(col) = max(len(col),lengthn(str));
    lastcol=max(lastcol,col);
    retain lastcol;
  end;
  if eof then do col=1 to lastcol;
     maxlength=len(col);
     length header $32767 ;
     header=hdr(col);
    output;
  end;
  keep col maxlength header;
run;

So if I make a CSV file out SASHELP.CLASS then the result is:

Obs    col    maxlength    header

 1      1         7        Name
 2      2         1        Sex
 3      3         2        Age
 4      4         4        Height
 5      5         5        Weight
Tom
Super User Tom
Super User

You are confused about how macros works. They just generate text which is then interpreted by SAS as code.

 

So you have a macro, %POPULATE(), that generates a data step.

Then you are trying to call in it in the middle of another data step as if you could nest data steps inside of each other.

 

But if you try to nest two data steps in a SAS program, like this:

DATA _NULL_;
INFILE .... ;
...
DATA ASX.&data_name;
INFILE ... ;
...
run;
run;

SAS will just interpret that as two different data step. The first one will end when it sees the second one starting and the extra RUN statement at the end will do nothing.

DrewF
Calcite | Level 5

Thanks, that makes sense. I can see why embedding in this case won't work. 

 

 

Tom
Super User Tom
Super User

Your whole program does not make any sense.

 

Your approach of trying to say that variable 3 starts at a specific character in the source file does not make any sense for delimited file. The delimiters are what determine where the values are, not hard coded character positions like you see in a fixed format text file.

 

What is the goal of the whole project.

If you want to get a dataset that only has one of the variables from the source file it is probably best to first read the whole file first and then use a KEEP statement to keep the variables you want.  You really don't save anything by trying to only read some of the columns since you have to physically read the whole text file anyway.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 12448 views
  • 2 likes
  • 6 in conversation