Solved: How to Tell PROC CORR to Run Until the First Zero For Each Participant

Beartato · Posted 05-10-2016 08:34 PM

Howdy folks,

I am running both Pearson and Spearman correlations on a large dataset (approx. 7,000 sets of data), and I am wondering whether there is a way to have PROC CORR run the analysis only up to the first instance of "0" for each participant. All sets of data begin at a non-zero number and theoretically drop to zero. However, the point at which a set of data reaches zero differs between participants, so I can't simply delete or replace all columns following the first-occurring zero. For example, see the following three example sets of data (each cell gives the number of items purchased at that price):

Price             $1   $2   $3   $4   $5   $6   $7   $8   $9  $10
Participant 1     10    8    6    4    2    0    0    0    0    0
Participant 2     25   25   25   20   18   15   10    0    0    0
Participant 3      2    0    0    0    0    0    0    0    0    0

Notice that as price goes up, the number of purchases goes down. What I need PROC CORR to do is read each set of observations and analyze only the observations up to and including the first zero (I highlighted these zeroes in the set above), ignoring any zeroes after it. This needs to work for both the basic Pearson PROC CORR and the SPEARMAN-enabled PROC CORR statement.

Theoretically, I could simply work through the code and delete (or replace) all zeroes after the first-occurring zero, but it would be difficult to do so for 7,000 sets of observations, and I believe that SAS is capable of executing this task.

I had previously used an ARRAY statement to replace all instances of zero with "." to mark them as missing. However, the analysis requires that the first-occurring zero be included, and the ARRAY code I was using would also remove that first zero.

Any advice? Many thanks for your time!

Ksharp · Posted 05-11-2016 01:50 AM

Which two variables are you going to use to calculate the correlation coefficient? Two participants, or two prices?

And why not just set the zeros that follow the first zero to missing?

data have;
infile cards expandtabs truncover;
input Participant	_1-_10;
cards;
1	10	8	6	4	2	0	0	0	0	0
2	25	25	25	20	18	15	10	0	0	0
3	2	0	0	0	0	0	0	0	0	0
;
run;
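data want;
 set have;
 array x{*} _1-_10;
 /* found is not retained, so it resets to missing for every participant */
 do i=1 to dim(x);
  /* everything after the first zero becomes missing */
  if found then x{i}=.;
  /* flag the first zero but leave it in place */
  if x{i}=0 then found=1;
 end;
 drop i found;
run;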

proc print;run;
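
To get from WANT to the per-participant correlations the question asks about, one possible follow-up is sketched below. It assumes the array position (1 to 10) is the price in dollars, and the names LONG and PURCHASES are made up for illustration. Rows that were set to missing are dropped, so each participant keeps the values up to and including the first zero.

/* Sketch only: LONG and PURCHASES are hypothetical names */
data long;
 set want;
 array x{*} _1-_10;
 do price=1 to dim(x);
  purchases=x{price};
  /* skip the values that were blanked out after the first zero */
  if not missing(purchases) then output;
 end;
 keep Participant price purchases;
run;

/* BY processing needs the data sorted by the BY variable */
proc sort data=long;
 by Participant price;
run;

/* PEARSON and SPEARMAN can be requested in the same call */
proc corr data=long pearson spearman;
 by Participant;
 var price;
 with purchases;
run;

A participant who reaches zero very early (such as participant 3 above) contributes only two points, so that coefficient carries little information.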

PGStats · Posted 05-10-2016 10:12 PM

What are you correlating? Price with number of purchases, or number of purchases between participants?

PG

stat_sas · Posted 05-10-2016 10:25 PM

One way to do this is to stack all the variable names and values, then use BY processing in PROC CORR for values greater than 0. Something like this:

data have;
input Price Participant1 Participant2 Participant3;
datalines;
1 10 25 2
2 8 25 0
3 6 25 0
4 4 20 0
5 2 18 0
6 0 15 0
7 0 10 0
8 0 0 0
9 0 0 0
10 0 0 0
;

data want(keep=variable price value);
set have;
array p(*) Participant:;
do i=1 to dim(p);
value=p(i);
variable=vname(p(i));
output;
end;
run;

proc sort data=want;
by variable;
run;

proc corr data=want(where=(value>0));
by variable;
var price;
with value;
run;

Beartato · Posted 05-10-2016 10:46 PM

Thanks for the reply. I see that I should have been clearer in the original post - I need to include the first instance of zero (for each participant) in the proc corr. Would this syntax maintain the first instance of zero?

stat_sas · Posted 05-11-2016 12:47 AM

Please try the following syntax that will maintain the first instance of zero for proc corr:

data want(keep=variable price value);
set have;
array p(*) Participant:;
do i=1 to dim(p);
value=p(i);
variable=vname(p(i));
output;
end;
run;

proc sort data=want;
by variable;
run;

data corr(drop=flag);
do until(last.variable);
set want;
by variable;
if not flag then output;
if value=0 then flag = 1;
end;
run;

proc corr data=corr;
by variable;
var price;
with value;
run;
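
Since the original question asks for both Pearson and Spearman, and there are roughly 7,000 participants, it may be more practical to save the results than to print them. The sketch below reuses the CORR dataset from the step above; the output dataset names (PEARSON_OUT, SPEARMAN_OUT, PEARSON_R) are hypothetical.

/* Sketch only: output dataset names are made up for illustration */
proc corr data=corr pearson spearman noprint
          outp=pearson_out outs=spearman_out;
 by variable;
 var price;
 with value;
run;

/* The OUTP= dataset also holds means, standard deviations and Ns; */
/* keep just the correlation rows                                  */
data pearson_r;
 set pearson_out;
 where _type_='CORR';
run;

NOPRINT suppresses the printed tables, which would otherwise run to one set per BY group.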

Beartato · Posted 05-10-2016 10:48 PM

As an addendum, is there any SAS code that could modify the dataset with which I'm working so as to simply *delete* or mark as missing all values after the first 0, rather than simply teaching the PROC CORR to only read up to the first zero? Having the data sets pruned to this point would help quite a bit with future analyses.

PGStats · Posted 05-10-2016 10:57 PM

Make your data long instead of wide:

data long;
set wide;
array A n1-n10;
do price = 1 to dim(A) until(A{price}=0);
	number = A{price};
	output;
	end;
keep participant price number;
run;

PG

Beartato · Posted 05-11-2016 01:33 PM

@Ksharp wrote:
Which two variables are you going to use to calculate the correlation coefficient? Two participants, or two prices?
And why not just set the zeros that follow the first zero to missing?

This looks like exactly what I'm looking for! Thanks for sending this. Will this syntax work with data libraries that have already been imported into SAS? That is, can I just replace the highlighted text below with my libname.refname?

data want;
 set have;
 array x{*} _1-_10;
 do i=1 to dim(x);
  if found then x{i}=.;
  if x{i}=0 then found=1;
 end;
 drop i found;
run;

proc print;run;

Ksharp · Posted 05-11-2016 09:00 PM

Yes. You can use it as long as your data has been turned into a SAS dataset.

data yourlib.want;
 set yourlib.have;
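
Put together with the earlier step, the full version might look like the sketch below. The library path is a placeholder, and _1-_10 should be replaced with the actual purchase-count variables, listed in order of increasing price.

/* Hypothetical path: point this at the folder that holds your data */
libname yourlib "/path/to/your/sas/data";

data yourlib.want;
 set yourlib.have;
 /* assumption: _1-_10 stand in for your own variable names */
 array x{*} _1-_10;
 do i=1 to dim(x);
  if found then x{i}=.;
  if x{i}=0 then found=1;
 end;
 drop i found;
run;

If the input dataset already contains variables named i or found, pick different names for the loop counter and the flag so they do not collide.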
