06-24-2016 04:08 PM
I am trying to import this data set into SAS that has a bunch of variables, and one of those variables is called "MACofResponse". For each observation, there ae multiple MACofResponses, with varying numbers for each example (ie. some have 11, others have 9, and so forth). How would you suggest I go about reading in those variables? My teacher recommended using an array, however I'm not sure how to use an array to read in variables, I only know how to use arrays on variables that are already read in.
06-24-2016 04:22 PM
Personally, I'd read it in as one variable - text and then parse it out afterwards in a separate data step.
There are a lot of examples on here on how to do that using COUNTW and SCAN, assuming you have a clear delimiter.
You do ideally need to know how many you have, unless you want to change your data structure so that it's long - ie one row for every item in the field.
06-25-2016 11:52 AM
Here's an example that does not require parsing from a character variable. But whether you use Reeza's suggestion or this suggestion, it really depends on the way the input data looks. Is it delimited? Is it a fixed column format? Are you reading from Excel? The structure of the input data will impact the method you will need to use.
For example, suppose I have this purchase data in a file called c:\temp\purchases.csv:
anna,1001,100,200,300 bob,2002,250,75,300,100,150 carla,3003,100 dave,4004,.
To read this data, you should know an "upper" number of purchases that someone might have. In this instance, I know that the max number of purchases is 5. So this is the program I write:
data purchases; keep name idnum purch1-purch5 numpurch; infile 'c:\temp\purchases.csv' dlm=',' dsd missover; input name $ idnum $ purch1-purch5; array p_arr purch1-purch5; numpurch=0; do i = 1 to dim(p_arr); if p_arr(i) gt 0 then numpurch=numpurch + 1; if p_arr(i) = . then do; p_arr(i) = 0; end; end; run; proc print data=purchases; run;
With the MISSOVER option, when SAS gets to the end of one line of data, if there are not enough values on the input data line to fill all the "PURCH" variables, then the remaining variables will be set to missing (or .).
Then, if I want those missing values to be set to 0, after I read them in, I declare an ARRAY that is 5 members big and use a DO loop to go through the array and do 2 things: 1) increment a counter called NUMPURCH so I know how many purchases out of the 5 that someone actually had and 2) if a PURCH value is missing assign it a value of 0.
And, here's the output:
If I don't have any idea of the upper limit, I can pick a very large limit (like 10 or 25 or 50). In this example, I changed the INPUT statement to try to read 10 values for the PURCH variable and so PURCH1-PURCH10 got created, but of course, most of the values were 0 after the DO loop. But I still have the NUMPURCH variable to tell me how many purchases each person had:
and there weren't that many changes in the previous program to accomplish the change -- except for these 3 statements, the rest of the program and the data for the program was the same:
keep name idnum purch1-purch10 numpurch;
infile 'c:\temp\purchases.csv' dlm=',' dsd missover;
input name $ idnum $ purch1-purch10;
array p_arr purch1-purch10;
* * * rest of code same * * *;
Or, depending on what your data looks like then you might have to read the data as a character string and then parse out the separate variables.