11-01-2017 07:53 AM - edited 11-01-2017 06:30 PM
Using University edition SAS for a personal project. I'm hoping this is a simple mistake and not something that simply cannot be resolved. To reduce the problem to it's most basic core, what I need to be able to do is get the length of characters between each delimiter in an input file. The reason being is that I would like to generate an individual table for each variable. So there will be one table for the first variable, another for the second etc. The problem arises when I get beyond the first three variables because then the column start position can will be variable. So I don't know what particular column to set the @ condition to in my input statement. I need to use a variable to do this in the format @&numericvariableforlength.
The problem I'm having is how to pass the variable value into the macro rather than the variable reference. For instance, as you can see from the code below I get exactly the result I want if I actually specify the value as 19. However, this isn't a constant and might change - atm I'm trying to get this to work with just the first line of data so FIRSTOBS and OBS will also be variable and I'll come across the same issue there. So I need to know what I need to do in order to pass in the value rather than the reference. As I have it written right now the value being stored in sslength is actually slength, not 19. So slength as a string is passed in as the parameter rather than the numeric value 19. How do I pass in 19? I was having a lot of trouble with sslength not being initialised when I was calling it at the end of the code with my other macro calls so I tried embedding it within the macro itself but that didn't help.
Line of data I'm working with:
LIBNAME ASX '/folders/myfolders'; %MACRO populate(data_name=, line_n=, start_pos=); DATA ASX.&data_name; INFILE '/folders/myfolders/ASXData/19970102.TXT' DSD FIRSTOBS = &line_n OBS = &line_n; INPUT @&start_pos &data_name :$; %put &start_pos; RUN; %MEND; %MACRO getlength(col=, line_n=); DATA _NULL_; INFILE '/folders/myfolders/ASXData/19970102.TXT' FIRSTOBS = &line_n OBS = &line_n; INPUT Raw :$ 1-50; CALL SCAN(Raw, &col, position, slength, ','); %LET sslength = slength; *put sslength=slength; %populate(data_name=High, line_n=1, start_pos=sslength); /*This works: %LET sslength = 19; %populate(data_name=High, line_n=1, start_pos=&sslength); */ RUN; %MEND; %populate(data_name=stock, line_n=1, start_pos=1); %populate(data_name=date, line_n=1, start_pos=5); %getlength(col=3, line_n=1);
Note: Edited to use text boxes.
SOLUTION: I modified the following sections to get this to work. The solution was using symput as KurtBremser suggested. I'm now able to determine the point at which SAS identifies the start location (ie INPUT @&start_pos &data_name :$ of the variable I want to input into my table. The advantage of this is that I am able to avoid SAS having to run through the entire line of data when I only want one variable. It had to be done at run time because the length of each variable can be different, and therefore the @ cursor will be different for every line.
%MACRO getlength(col=, line_n=); DATA _NULL_; %GLOBAL var_length; INFILE '/folders/myfolders/ASXData/19970102.TXT' FIRSTOBS = &line_n OBS = &line_n; INPUT Raw :$ 1-50; CALL SCAN(Raw, &col, position, input_length, ','); input_length = position + input_length + 1; CALL SYMPUT('var_length', input_length); %put var_length=&var_length; RUN; %MEND;
%populate(data_name=High, line_n=1, start_pos=&var_length);
11-01-2017 08:02 AM
11-01-2017 08:02 AM
11-01-2017 08:06 AM
11-01-2017 08:11 AM
11-01-2017 08:15 AM
11-01-2017 08:58 AM - edited 11-01-2017 09:26 AM
It's really difficult to understand why you need any of the complications. SAS knows how to read a delimited line of data. Why can't you simply use:
INFILE '/folders/myfolders/ASXData/19970102.TXT' DSD;
INPUT name $ date var1 var2 var3 var4 var5;
If you really need to pick out column information as you read the data, the COL= option might help. Experiment along these lines:
INFILE '/folders/myfolders/ASXData/19970102.TXT' DSD COL=col_position;
INPUT name $ @ ;
name_end = col_position;
input date @ ;
date_end = col_position;
input var1 @ ;
var1_end = col_position;
... etc. ...
EDITED: In this example, COL_POSITION gets created by the COL= option on the INFILE statement. Note that it may be one or two larger than what you need. What I called "NAME_END" actually may be the column where DATE begins. That's why I mentioned the word "experiment". But once you get the idea it probably provides the information you need.
11-01-2017 09:12 AM - edited 11-01-2017 09:15 AM
I'll have a look and see if this solution can determine length at run time.How do I set the variable col_position though? That's the question I asked.
11-01-2017 01:20 PM
Weird request, but ok.
So I don't know what particular column to set the @ condition to in my input statement. I need to use a variable to do this in the format @&numericvariableforlength.
If there's not a specific column start you can't use this methodology. What you can do is read the data once and output each variable to it's own file, as long as you have a specified delimiter that clearly identfies your variable. Again, not sure how this is practical at all.
Here's an example with a small hardcoded subset and you can automate it from that point - remember the start for a good macro solution, is first a non-macro solution.
data name (keep=name) age (keep=age) sex (keep=sex); infile cards dlm=','; input name $ age sex $; cards; "Alfred", 24, M "Sandra", 25, F "Fred", 32, F ; run;
11-01-2017 01:53 PM
What is the goal? Is it to get the max length for each column in a delimited file?
Here is a program that will generate one row per column in your file with the max length for that column. I included code to capture the first line as column headers.
data want ; array hdr (100) $32767 _temporary_; array len (100) _temporary_; infile csv dsd truncover length=l column=cc end=eof lrecl=1000000; length str $32767 ; do col=1 by 1 until(cc > l); input str @ ; if _n_=1 then hdr(col)=str; else len(col) = max(len(col),lengthn(str)); lastcol=max(lastcol,col); retain lastcol; end; if eof then do col=1 to lastcol; maxlength=len(col); length header $32767 ; header=hdr(col); output; end; keep col maxlength header; run;
So if I make a CSV file out SASHELP.CLASS then the result is:
Obs col maxlength header 1 1 7 Name 2 2 1 Sex 3 3 2 Age 4 4 4 Height 5 5 5 Weight
11-01-2017 05:35 PM - edited 11-01-2017 05:39 PM
You are confused about how macros works. They just generate text which is then interpreted by SAS as code.
So you have a macro, %POPULATE(), that generates a data step.
Then you are trying to call in it in the middle of another data step as if you could nest data steps inside of each other.
But if you try to nest two data steps in a SAS program, like this:
DATA _NULL_; INFILE .... ; ... DATA ASX.&data_name; INFILE ... ; ... run; run;
SAS will just interpret that as two different data step. The first one will end when it sees the second one starting and the extra RUN statement at the end will do nothing.
11-01-2017 06:50 PM
Your whole program does not make any sense.
Your approach of trying to say that variable 3 starts at a specific character in the source file does not make any sense for delimited file. The delimiters are what determine where the values are, not hard coded character positions like you see in a fixed format text file.
What is the goal of the whole project.
If you want to get a dataset that only has one of the variables from the source file it is probably best to first read the whole file first and then use a KEEP statement to keep the variables you want. You really don't save anything by trying to only read some of the columns since you have to physically read the whole text file anyway.
Need further help from the community? Please ask a new question.