BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
genemroz
Pyrite | Level 9

I need some help getting a dataset transformed.  The goal is to take dataset HAVE and turn it into dataset WANT. See below for details.  I don't know how to accomplish this task.  Thanks in advance for your help.

Gene

data have;
input Target $ Tx Ty Station $ k;
datalines;
0/0 0 0 A 1
0/0 0 0 B 0
0/0 0 0 C 0
0/1 0 1 A 1
0/1 0 1 B 1
0/1 0 1 C 0
1/0 1 0 A 1
1/0 1 0 B 1
1/0 1 0 C 1
;
run;
proc sort data=have;
by target;
run;

/*Need code here (Proc Transpose?) that produces the following dataset from dataset Have
where ksum is the sum of k for each unique Target value*/

Data want;
input Target $ Tx Ty ksum S1 $ S2 $ S3 $;
datalines;
0/0 0 0 1 A . .
0/1 0 1 2 A B .
1/0 1 0 3 A B C
;
run;

 

/* The goal is to  create a scatter plot of Tx by Ty grouped by ksum with datatips
that display Target,ksum,S1,S2,S3.  See below*/


ods graphics on/imagemap;
Proc SGPlot data=want;
scatter x=tx y=ty/group=ksum tip=(Target ksum S1 S2 S3);
xaxis min=-2 max=2;
yaxis min=-2 max=2;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Can you indicate exactly which rows/values you are summing to get the output data set?

 

From what little I understand there is more than one way to get several of those "sums".

This seems to work for your example data.

I'm not sure how extensible it might be.

data want;
   set have;
   array s(3) $;
   retain ksum s1-s3 counter;
   by target;
   if first.target then do;
      call missing(ksum, of s(*),counter);

   end;
   if k then do;
      counter+1;
      ksum+k;
      s[counter] = station;
   end;
   if last.target then output;
   drop station k counter;
run;


You might want to use either a custom format to display something like N/A for the S variables when the value is missing or set Options missing=' '; before generating the graph to make slightly cleaner tip appearance.

 

 

View solution in original post

4 REPLIES 4
ballardw
Super User

Can you indicate exactly which rows/values you are summing to get the output data set?

 

From what little I understand there is more than one way to get several of those "sums".

This seems to work for your example data.

I'm not sure how extensible it might be.

data want;
   set have;
   array s(3) $;
   retain ksum s1-s3 counter;
   by target;
   if first.target then do;
      call missing(ksum, of s(*),counter);

   end;
   if k then do;
      counter+1;
      ksum+k;
      s[counter] = station;
   end;
   if last.target then output;
   drop station k counter;
run;


You might want to use either a custom format to display something like N/A for the S variables when the value is missing or set Options missing=' '; before generating the graph to make slightly cleaner tip appearance.

 

 

genemroz
Pyrite | Level 9

Thanks to all who responded with solutions.  They all work but I will mark ballardw's solution as "accepted" since his was first.  Thanks again for your help.

 

Gene

mkeintz
PROC Star

If K is always a 1 or 0, then as ksum increments, then next S variable (S1, S2, S3) is assigned the corresponding station value:

 

data have;
input Target $ Tx Ty Station $ k;
datalines;
0/0 0 0 A 1
0/0 0 0 B 0
0/0 0 0 C 0
0/1 0 1 A 1
0/1 0 1 B 1
0/1 0 1 C 0
1/0 1 0 A 1
1/0 1 0 B 1
1/0 1 0 C 1
run;
data want (drop=k station);
  set have;
  by target;
  retain ksum 0 s1-s3 '  ';
  array s{3};
  if first.target then call missing (ksum,s1,s2,s3);
  ksum+k;
  if k=1 then s{ksum}=station;
  if last.target;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

No idea how this relates to making a graphic.

But you just need a DO loop.

data have;
  input Target $ Tx Ty Station $ k;
datalines;
0/0 0 0 A 1
0/0 0 0 B 0
0/0 0 0 C 0
0/1 0 1 A 1
0/1 0 1 B 1
0/1 0 1 C 0
1/0 1 0 A 1
1/0 1 0 B 1
1/0 1 0 C 1
;

data want;
  do until(last.target) ;
    set have ;
    by target;
    array S[3] $8;
    ksum=sum(ksum,k);
    if k then S[ksum]=station;
  end;
run;

Result

Obs    Target    Tx    Ty    Station    k    S1    S2    S3    ksum

 1      0/0       0     0       C       0    A                   1
 2      0/1       0     1       C       0    A     B             2
 3      1/0       1     0       C       1    A     B     C       3

Change the dimension of the ARRAY if there might be more than three stations for a target you need to record.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1306 views
  • 1 like
  • 4 in conversation