gretaolsson
Calcite | Level 5

Hello,

 

I have a CSV file containing 100 samples of 10 observations each. How do I perform logistic regression on each individual sample from the one CSV file? How do I extract the values I'm interested in, such as estimates and p-values, into a vector?

 

proc logistic data = WORK.IMPORT;
class x y;
model y = x;
run;

 

Here is an extract from the CSV-file:

 

"","num","y","x"
"1",1,1,0
"2",1,1,1
"3",1,1,1
"4",1,1,1
"5",1,1,0
"6",1,1,0
"7",1,1,1
"8",1,1,1
"9",1,1,1
"10",1,1,1
"11",1,1,1
"12",1,1,1
"13",1,1,0
"14",1,0,0
"15",1,1,0
"16",1,1,0
"17",1,1,0
"18",1,1,1
"19",1,1,1
"20",1,1,0
"21",1,0,0
"22",1,1,1
"23",1,0,0
"24",1,1,1
"25",1,1,1
"26",1,1,1
"27",1,1,0
"28",1,1,1
"29",1,0,0
"30",1,1,0
"31",1,1,1
"32",1,1,0
"33",1,1,1
"34",1,1,0
"35",1,0,0
"36",1,1,0
"37",1,1,0
"38",1,0,1
"39",1,1,1
"40",1,1,1
"41",1,1,0

 

 

Grateful for all the help I can get!

13 REPLIES
PeterClemmensen
Tourmaline | Level 20

Which variable indicates which sample an observation belongs to? I'm not sure this is the answer you are looking for, but I think you want to use the BY statement in your PROC LOGISTIC, as described here:

 

https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_logistic_se...

PaigeMiller
Diamond | Level 26

You haven't told us what "an individual sample" is.

 

Also, it appears that both x and y are binary 0/1 variables. In that case a logistic regression isn't appropriate; something like a contingency table might be a better choice.

--
Paige Miller
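
A minimal sketch of the contingency-table alternative suggested above, run once per sample. It assumes the imported data set is WORK.IMPORT (as in the question), that rows are in file order, and that every block of 10 consecutive rows is one sample; the names `want` and `sampleid` are illustrative, not from the original data.

```sas
/* Assumption: WORK.IMPORT holds the CSV rows in file order, 10 rows per sample. */
data want;
  set work.import;
  sampleid = ceil(_n_ / 10);   /* hypothetical grouping variable: 1, 2, 3, ... */
run;

/* One 2x2 table of x by y per sample, with Fisher's exact test. */
proc freq data=want;
  by sampleid;
  tables x*y / fisher;
run;
```

With only 10 observations per sample, an exact test is usually a safer choice than the asymptotic chi-square test.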
RW9
Diamond | Level 26

Not sure on the specifics of logistic, but as a general rule in SAS, when you want to run a DATA step or a procedure over a certain group, you would use a BY statement. So maybe something along the lines of:

proc logistic data=work.import;
  by sampleid;
  class x y;
  model y = x;
run;
gretaolsson
Calcite | Level 5

How does SAS know how many observations there are in each sample?

This did not work for me.

RW9
Diamond | Level 26

This is an outstanding question from @PaigeMiller and @PeterClemmensen which you haven't answered yet. You have not given us the full picture: what is a sample? It's not shown in any of the posts. My code is an example of what you would do if you have a variable for it. If you don't, how do you know which sample an observation belongs to?

gretaolsson
Calcite | Level 5

Okay, I did not really catch any question, sorry.

 

I have a CSV file containing 100 samples.
Each sample consists of 10 observations. (So overall I have 1000 observations in one file.)

 

I now want SAS to read the CSV file and perform logistic regression on the first sample, which is observations "1" to "10" in the CSV file, then on the second sample, which is observations "11" to "20", and so on.
Above I have posted an excerpt showing how the data looks in the CSV file. "1" - "10" is the first sample,
"11" - "20" is the second sample, ..., "91" - "100" is the tenth sample, and so on up to the hundredth sample.

 

If you have any more questions that you need answered, please let me know.

RW9
Diamond | Level 26

Right, so your steps are:

1) read in the data - you should already have this.

2) Assign sample id to blocks of 10 records:

data want;
  set have;
  sampleid=ceil(_n_/10);  /* Assign sampleid as record blocks of 10 */  
run;

3) Run logistic with by group:

proc logistic data=want;
  by sampleid;
  class x y;
  model y = x;
run;
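
Before running the regressions it may be worth checking that step 2 grouped the rows as intended. The sketch below builds a small toy data set in place of the imported CSV (the values are randomly generated and purely illustrative), assigns `sampleid` the same way, and tabulates how many rows land in each sample.

```sas
/* Toy stand-in for the imported CSV: hypothetical data, 30 rows = 3 samples. */
data have;
  call streaminit(1);
  do i = 1 to 30;
    x = rand('bernoulli', 0.5);
    y = rand('bernoulli', 0.7);
    output;
  end;
  drop i;
run;

data want;
  set have;
  sampleid = ceil(_n_ / 10);   /* rows 1-10 -> 1, rows 11-20 -> 2, ... */
run;

/* Each SAMPLEID should appear exactly 10 times. */
proc freq data=want;
  tables sampleid / nocum nopercent;
run;
```

Because `sampleid` is built from the row number, the data set is already in BY-group order and no PROC SORT is needed before the BY statement.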
gretaolsson
Calcite | Level 5

 

Hi,

This was really helpful, thank you!

 

I have some questions; I think I'm missing something obvious.
Now I have the following code: 

 

 

PROC IMPORT OUT = filename
FILE = "/folders/myshortcuts/sf_myfolder/filename.csv"
DBMS = CSV
REPLACE;
RUN;
data want;
  set have;
  sampleid=ceil(_n_/10);  /* Assign sampleid as record blocks of 10 */
run;

proc logistic data=want;
  by sampleid;
  class x y;
  model y = x;
run;

Of course, this does not work and my log looks like this:

 

...

 

NOTE: WORK.FILENAME data set was successfully created.
NOTE: The data set WORK.FILENAME has 100 observations and 4 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.06 seconds
cpu time 0.04 seconds
 
 
95
96 data want;
97 set have;
ERROR: File WORK.HAVE.DATA does not exist.
98 sampleid=ceil(_n_/10); /* Assign sampleid as record blocks of 10 */
99 run;
 
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
 
 
100
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
 
ERROR: Variable X not found.
ERROR: Variable Y not found.
NOTE: The SAS System stopped processing this step because of errors.
101 proc logistic data=want;
102 by sampleid;
103 class x y;
104 model y = x;
105 run;
106
107 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
120
 

I thought I would replace "want" with "filename", but it did not work. What should I do?
gretaolsson
Calcite | Level 5

Hi!

I changed the code and now it works! Thank you very much!

 

PROC IMPORT OUT = filename
          FILE = "/folders/myshortcuts/sf_myfolder/filename.csv"
          DBMS = CSV
          REPLACE;
RUN;

data want;
  set filename;
  sampleid=ceil(_n_/10);  /* Assign sampleid as record blocks of 10 */  
run;

proc logistic data=want;
  by sampleid;
  class x y;
  model y = x;
run;


Is it possible to extract specific values (estimates and p-values) from the result of each regression and put them together into a vector or something similar?

 

RW9
Diamond | Level 26

Firstly, avoid using "filename" as a dataset name, since FILENAME is also the name of a SAS statement. And do please avoid coding in capitals. You can extract certain parts of the data in various ways. In the example you get a dataset which can be filtered and further processed via DATA step syntax. Or you may also get parts of an output by doing:

ods trace on;

--your code--

ods trace off;

 

Then in the log you will see all the component parts which are created by the procedure. You can then rerun your code (removing the ods trace parts, as they are just to find out the names) with this statement placed before the procedure:

ods output <object>=<dataset to store>;
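
As a sketch of the tracing step, assuming the `want` data set and `sampleid` variable built earlier in this thread, the procedure call is simply wrapped in the two ODS TRACE statements:

```sas
ods trace on;                 /* write the name of each output object to the log */

proc logistic data=want;      /* WANT and SAMPLEID as created earlier in the thread */
  by sampleid;
  class x y;
  model y = x;
run;

ods trace off;                /* stop tracing once the names are known */
```

Each output object is then listed in the log with a Name field, and that name is what goes on the left-hand side of the ODS OUTPUT statement.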

gretaolsson
Calcite | Level 5

What do you mean by <dataset to store>?
Should I create a new file where I can save, for example, the estimates? How do I do that?

gretaolsson
Calcite | Level 5

I would like to save all the parameter estimates from each sample to a vector or list, is that possible?

RW9
Diamond | Level 26

What does the ODS trace show in the log? If the output object is called ParameterEstimates, the syntax would be:

ods output ParameterEstimates=estim;

 

This will create a dataset in work called estim, with the parameter estimates.
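
Put together with the earlier steps, a minimal sketch might look like the following. It assumes the `want` data set and `sampleid` variable from earlier in the thread; `results` is an illustrative name. The column names `Variable`, `Estimate` and `ProbChiSq` are what PROC LOGISTIC normally writes to its ParameterEstimates table, but it is worth confirming them with PROC CONTENTS on `estim`.

```sas
ods output ParameterEstimates=estim;   /* must come before the procedure runs */

proc logistic data=want;
  by sampleid;
  class x y;
  model y = x;
run;

/* Keep one row per sample and parameter, with the estimate and its p-value. */
data results;
  set estim;
  keep sampleid Variable Estimate ProbChiSq;
run;

proc print data=results;
run;
```

Because of the BY statement, `estim` stacks the estimates from all samples into one data set, with `sampleid` identifying the regression each row came from. This plays the role of the "vector or list" asked about above.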
