How to Tell PROC CORR to Run Until the First Zero For Each Participant

Occasional Contributor
Posts: 6

Howdy folks,

 

I am running both Pearson and Spearman correlations for a large dataset (approx. 7,000 sets of data) and am wondering whether there is a way to make PROC CORR run the analysis only up to the first instance of "0" for each participant. All sets of data begin at a non-zero number and theoretically drop to zero. However, the point at which a set reaches zero differs between participants, so I can't simply delete or replace all columns after a fixed position. For example, see the following three example sets of data (each cell indicates the number of items purchased at that price):

 

             Price
Participant     $1   $2   $3   $4   $5   $6   $7   $8   $9  $10
1               10    8    6    4    2    0    0    0    0    0
2               25   25   25   20   18   15   10    0    0    0
3                2    0    0    0    0    0    0    0    0    0


 
Notice that as price goes up, the number of purchases decreases. What I need PROC CORR to do is read each set of observations and analyze only the observations up to and including the first zero (in the table above, the first 0 in each row), ignoring any zeroes after the first. This needs to work for both the basic Pearson PROC CORR and PROC CORR with the SPEARMAN option.

Theoretically, I could simply work through the code and delete (or replace) all zeroes after the first-occurring zero, but it would be difficult to do so for 7,000 sets of observations, and I believe that SAS is capable of executing this task. 

 

I had previously used an ARRAY statement to replace all instances of zero with "." to mark them as missing. However, the analysis requires that the first-occurring zero be included, and the array code I was using also removed the first-occurring zero.

 

Any advice? Many thanks for your time!


Accepted Solutions
Solution
‎05-24-2016 08:58 PM
Super User
Posts: 9,676

Re: How to Tell PROC CORR to Run Until the First Zero For Each Participant

Which two variables are you going to use to calculate the correlation coefficient? Two participants, or two prices?

And why not just set the zeros that follow the first zero to missing?

 

data have;
infile cards expandtabs truncover;
input Participant	_1-_10;
cards;
1	10	8	6	4	2	0	0	0	0	0
2	25	25	25	20	18	15	10	0	0	0
3	2	0	0	0	0	0	0	0	0	0
;
run;
data want;
 set have;
 array x{*} _1-_10;
 do i=1 to dim(x);
  if found then x{i}=.;
  if x{i}=0 then found=1;
 end;
 drop i found;
run;

proc print;run;
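
To get from this masked wide table to the per-participant correlations asked about in the question, one option is to transpose it and run PROC CORR with a BY statement. This is only a sketch continuing from the WANT dataset above. It assumes that variable _k holds the purchases at price $k; the names LONG, ITEM, NUMBER and PRICE are made up for illustration.

```sas
/* One row per participant and price point (WANT is already in participant order) */
proc transpose data=want out=long(rename=(col1=number)) name=item;
  by participant;
  var _1-_10;
run;

data long;
  set long;
  /* Assumption: _1 is price $1, _2 is price $2, and so on */
  price = input(substr(item, 2), 8.);
  /* Values after the first zero are missing by now; drop those rows */
  if not missing(number);
  drop item;
run;

/* Pearson and Spearman coefficients, computed separately for each participant */
proc corr data=long pearson spearman;
  by participant;
  var price;
  with number;
run;
```

PROC CORR leaves out missing values on its own, so the filter in the second step mainly keeps the long dataset tidy for later analyses.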



All Replies
Respected Advisor
Posts: 4,644

Re: How to Tell PROC CORR to Run Until the First Zero For Each Participant

What are you correlating: price with number of purchases, or number of purchases between participants?

PG
Trusted Advisor
Posts: 1,204

Re: How to Tell PROC CORR to Run Until the First Zero For Each Participant

One way to do this is to stack all the variables' names and values, then use BY-group processing in PROC CORR on the values greater than 0. Something like this:

 

data have;
input Price Participant1 Participant2 Participant3;
datalines;
1 10 25 2
2 8 25 0
3 6 25 0
4 4 20 0
5 2 18 0
6 0 15 0
7 0 10 0
8 0 0 0
9 0 0 0
10 0 0 0
;

 

data want(keep=variable price value);
set have;
array p(*) Participant:;
do i=1 to dim(p);
value=p(i);
variable=vname(p(i));
output;
end;
run;

 

proc sort data=want;
by variable;
run;

 

proc corr data=want(where=(value>0));
by variable;
var price;
with value;
run;

Occasional Contributor
Posts: 6

Re: How to Tell PROC CORR to Run Until the First Zero For Each Participant

Thanks for the reply. I see that I should have been clearer in the original post - I need to include the first instance of zero (for each participant) in the proc corr. Would this syntax maintain the first instance of zero?
Trusted Advisor
Posts: 1,204

Re: How to Tell PROC CORR to Run Until the First Zero For Each Participant

Please try the following syntax that will maintain the first instance of zero for proc corr:

 

data want(keep=variable price value);
set have;
array p(*) Participant:;
do i=1 to dim(p);
value=p(i);
variable=vname(p(i));
output;
end;
run;

 

proc sort data=want;
by variable;
run;

 

data corr(drop=flag);
do until(last.variable);
set want;
by variable;
if not flag then output;
if value=0 then flag = 1;
end;
run;

 

proc corr data=corr;
by variable;
var price;
with value;
run;

Occasional Contributor
Posts: 6

Re: How to Tell PROC CORR to Run Until the First Zero For Each Participant

As an addendum, is there any SAS code that could modify the dataset with which I'm working so as to simply *delete* or mark as missing all values after the first 0, rather than simply teaching the PROC CORR to only read up to the first zero? Having the data sets pruned to this point would help quite a bit with future analyses.
Respected Advisor
Posts: 4,644

Re: How to Tell PROC CORR to Run Until the First Zero For Each Participant

Make your data long instead of wide:

 

data long;
set wide;
array A n1-n10;
do price = 1 to dim(A) until(A{price}=0);
	number = A{price};
	output;
	end;
keep participant price number;
run;
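
With roughly 7,000 participants, it may be easier to save the coefficients to datasets than to read the printed tables. Below is a rough sketch of the long-format approach from start to finish, using the three sample rows from the question. WIDE and n1-n10 follow the names used above; PEARSON_OUT and SPEARMAN_OUT are hypothetical output names.

```sas
/* Sample data from the question, one row per participant */
data wide;
  input participant n1-n10;
  datalines;
1 10 8 6 4 2 0 0 0 0 0
2 25 25 25 20 18 15 10 0 0 0
3 2 0 0 0 0 0 0 0 0 0
;
run;

/* Long form: the UNTIL test runs after OUTPUT, so the first zero is kept */
data long;
  set wide;
  array A n1-n10;
  do price = 1 to dim(A) until(A{price}=0);
    number = A{price};
    output;
  end;
  keep participant price number;
run;

/* NOPRINT suppresses the tables; OUTP= and OUTS= store the statistics instead */
proc corr data=long pearson spearman noprint
          outp=pearson_out outs=spearman_out;
  by participant;
  var price;
  with number;
run;

/* The coefficient rows are the ones with _TYPE_ = 'CORR' */
proc print data=pearson_out;
  where _type_ = 'CORR';
run;
```

The same WHERE filter should work on SPEARMAN_OUT for the rank correlations.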
PG
Occasional Contributor
Posts: 6

Re: How to Tell PROC CORR to Run Until the First Zero For Each Participant


This looks like exactly what I'm looking for! Thanks for sending this. Will this syntax work with datasets in libraries that have already been imported into SAS? I.e., can I just replace the dataset names in the code below with my libname.refname?

 

data want;
 set have;
 array x{*} _1-_10;
 do i=1 to dim(x);
  if found then x{i}=.;
  if x{i}=0 then found=1;
 end;
 drop i found;
run;

proc print;run;

 

Super User
Posts: 9,676

Re: How to Tell PROC CORR to Run Until the First Zero For Each Participant

Yes. You can use it as long as the data has been turned into a SAS dataset.

 

data yourlib.want;
 set yourlib.have;

 

☑ This topic is SOLVED.
