I need some help getting a dataset transformed. The goal is to take dataset HAVE and turn it into dataset WANT. See below for details. I don't know how to accomplish this task. Thanks in advance for your help.
Gene
data have;
input Target $ Tx Ty Station $ k;
datalines;
0/0 0 0 A 1
0/0 0 0 B 0
0/0 0 0 C 0
0/1 0 1 A 1
0/1 0 1 B 1
0/1 0 1 C 0
1/0 1 0 A 1
1/0 1 0 B 1
1/0 1 0 C 1
;
run;
proc sort data=have;
by target;
run;
/*Need code here (Proc Transpose?) that produces the following dataset from dataset Have
where ksum is the sum of k for each unique Target value*/
Data want;
input Target $ Tx Ty ksum S1 $ S2 $ S3 $;
datalines;
0/0 0 0 1 A . .
0/1 0 1 2 A B .
1/0 1 0 3 A B C
;
run;
/* The goal is to create a scatter plot of Tx by Ty grouped by ksum with datatips
that display Target,ksum,S1,S2,S3. See below*/
ods graphics on/imagemap;
Proc SGPlot data=want;
scatter x=tx y=ty/group=ksum tip=(Target ksum S1 S2 S3);
xaxis min=-2 max=2;
yaxis min=-2 max=2;
run;
Can you indicate exactly which rows/values you are summing to get the output data set?
From what little I understand there is more than one way to get several of those "sums".
This seems to work for your example data.
I'm not sure how extensible it might be.
data want; set have; array s(3) $; retain ksum s1-s3 counter; by target; if first.target then do; call missing(ksum, of s(*),counter); end; if k then do; counter+1; ksum+k; s[counter] = station; end; if last.target then output; drop station k counter; run;
You might want to use either a custom format to display something like N/A for the S variables when the value is missing or set Options missing=' '; before generating the graph to make slightly cleaner tip appearance.
Can you indicate exactly which rows/values you are summing to get the output data set?
From what little I understand there is more than one way to get several of those "sums".
This seems to work for your example data.
I'm not sure how extensible it might be.
data want; set have; array s(3) $; retain ksum s1-s3 counter; by target; if first.target then do; call missing(ksum, of s(*),counter); end; if k then do; counter+1; ksum+k; s[counter] = station; end; if last.target then output; drop station k counter; run;
You might want to use either a custom format to display something like N/A for the S variables when the value is missing or set Options missing=' '; before generating the graph to make slightly cleaner tip appearance.
Thanks to all who responded with solutions. They all work but I will mark ballardw's solution as "accepted" since his was first. Thanks again for your help.
Gene
If K is always a 1 or 0, then as ksum increments, then next S variable (S1, S2, S3) is assigned the corresponding station value:
data have;
input Target $ Tx Ty Station $ k;
datalines;
0/0 0 0 A 1
0/0 0 0 B 0
0/0 0 0 C 0
0/1 0 1 A 1
0/1 0 1 B 1
0/1 0 1 C 0
1/0 1 0 A 1
1/0 1 0 B 1
1/0 1 0 C 1
run;
data want (drop=k station);
set have;
by target;
retain ksum 0 s1-s3 ' ';
array s{3};
if first.target then call missing (ksum,s1,s2,s3);
ksum+k;
if k=1 then s{ksum}=station;
if last.target;
run;
No idea how this relates to making a graphic.
But you just need a DO loop.
data have;
input Target $ Tx Ty Station $ k;
datalines;
0/0 0 0 A 1
0/0 0 0 B 0
0/0 0 0 C 0
0/1 0 1 A 1
0/1 0 1 B 1
0/1 0 1 C 0
1/0 1 0 A 1
1/0 1 0 B 1
1/0 1 0 C 1
;
data want;
do until(last.target) ;
set have ;
by target;
array S[3] $8;
ksum=sum(ksum,k);
if k then S[ksum]=station;
end;
run;
Result
Obs Target Tx Ty Station k S1 S2 S3 ksum 1 0/0 0 0 C 0 A 1 2 0/1 0 1 C 0 A B 2 3 1/0 1 0 C 1 A B C 3
Change the dimension of the ARRAY if there might be more than three stations for a target you need to record.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.