Solved: Re: data transformation

asasha · Posted 02-10-2021 09:52 AM

I have a data like this (22x101):

     col1 col2 col3 ... col100
x    1.12 0.15 0.99 ... 0.38
t1   1          
t2             1        
t3   1                
...
t20            1

Each t row will have exactly one indicator=1. Each column corresponding to an x value may have none, one, or multiple indicator=1.

For example, x=1.12 occurs at t1 and t3. And x=0.99 occurs at t2 and t20.

I need to collect the corresponding x where t has a 1 value and save them to a new data:

t1  1.12
t2  0.99
t3  1.12
...
t20 0.99

Please advise. Thank you.

PaigeMiller · Posted 02-10-2021 12:38 PM

UNTESTED CODE

Since you haven't given the column which contains x t1 t2 a variable name, I refer to it as the variable named firstcolumn.

If you want tested code, you must provide data according to these instructions and not in any other form.

data x;
    set have(where=(upcase(firstcolumn)='X'));
    rename col1-col100=x1-x100;
run;

data want;
    if _n_=1 then set x;
    set have(where=(upcase(firstcolumn)=:'T'));
    array x x1-x100;
    first_occurence=whichn(1,of col:);
    wanted_value=x(first_occurence);
    keep firstcolumn wanted_value;
run;

--
Paige Miller

View solution in original post

LinusH · Posted 02-10-2021 12:27 PM

Is this a one time job?

If yes, do it manually (e.g. in Excel).

If not, ask for source fil(s) with proper structure.

Last resort, I would try the data step.

Put the first row in a retained array.

Then use array processing for the t-rows tro find which value in the x-array to use.

Untested though.

Data never sleeps

asasha · Posted 02-10-2021 01:01 PM

This is not a one time job. I have been doing it manually in excel, but it would really help to automate it. The first data is available as shown, and I just need to extract the necessary information, so I can for example take the median of the 20 x values.

Kurt_Bremser · Posted 02-10-2021 01:09 PM

In which form is it available?

If SAS dataset: post a data step with datalines that creates this dataset in our SAS session(s).

If text file: copy/paste the text into a window opened with this button:

If Excel: attach the Excel file.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PaigeMiller · Posted 02-10-2021 12:38 PM

UNTESTED CODE

Since you haven't given the column which contains x t1 t2 a variable name, I refer to it as the variable named firstcolumn.

If you want tested code, you must provide data according to these instructions and not in any other form.

data x;
    set have(where=(upcase(firstcolumn)='X'));
    rename col1-col100=x1-x100;
run;

data want;
    if _n_=1 then set x;
    set have(where=(upcase(firstcolumn)=:'T'));
    array x x1-x100;
    first_occurence=whichn(1,of col:);
    wanted_value=x(first_occurence);
    keep firstcolumn wanted_value;
run;

--
Paige Miller

FreelanceReinh · Posted 02-10-2021 12:52 PM

Hi @asasha,

@asasha wrote:

I have a data like this (22x101): ...

This sounds like you haven't read the raw data yet? So you may want to create the desired dataset in one step (using LinusH's plan):

%let n=4; /* n=100 with your real data */

data want(keep=id v);
array col[&n];
retain col:;
input id $ @;
if _n_=1 then input col[*];
else do;
  input z1-z&n;
  v=col[whichn(1, of z:)];
  output;
end;
cards;
x    1.12 0.15 0.99 0.38
t1   1     .    .    .
t2    .    .   1     .
t3   1     .    .    .
t20   .    .   1     .
;

asasha · Posted 02-10-2021 09:42 PM

@LinusH @Kurt_Bremser @FreelanceReinh @PaigeMiller

Here's an exact subset example in cvs (22 rows, 17 columns). Alternatively, I just need to extract all 20 of the x's, regardless of which t they correspond to. Like I said, I have 100 of them (not just 15) and they are spread around. Just need the ones that were selected for t.

Parameter,t,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15
x,,0.98,0.882,0.7938,0.71442,0.642978,0.5786802,0.52081218,0.468730962,0.421857866,0.379672079,0.341704871,0.307534384,0.276780946,0.249102851,0.224192566
Opt,t1,,,,,,,1,,,,,,,,
Opt,t2,,,,,,,,,,1,,,,,
Opt,t3,,,,,,,,,,1,,,,,
Opt,t4,,,,,1,,,,,,,,,,
Opt,t5,,,,,,,,,,1,,,,,
Opt,t6,,,,,,,,1,,,,,,,
Opt,t7,,,,,,,,1,,,,,,,
Opt,t8,,,,,,,,,,,,1,,,
Opt,t9,,,,,,,,,,,1,,,,
Opt,t10,,,,,,,,1,,,,,,,
Opt,t11,,,,,,,,,1,,,,,,
Opt,t12,,,,,,,,,1,,,,,,
Opt,t13,,,,,,,,,,,1,,,,
Opt,t14,,,,1,,,,,,,,,,,
Opt,t15,,,,,,,1,,,,,,,,
Opt,t16,,1,,,,,,,,,,,,,
Opt,t17,,,,,,1,,,,,,,,,
Opt,t18,,,,,,,,,,1,,,,,
Opt,t19,,,,,,,,,,,,,1,,
Opt,t20,,,,,,1,,,,,,,,,

Kurt_Bremser · Posted 02-11-2021 02:38 AM

See this:

data want;
infile datalines dlm="," dsd truncover;
input
  Parameter $
  t $
  col1-col15
;
retain t1-t15;
array col {*} col1-col15;
array ta {*} t1-t15;
if parameter = "x"
then do;
  do i = 1 to dim(col);
    ta{i} = col{i};
  end;
end;
else do;
  do i = 1 to dim(col);
    if col{i}
    then do;
      x = ta{i};
      leave;
    end;
  end;
  output;
end;
keep t x;
datalines;
x,,0.98,0.882,0.7938,0.71442,0.642978,0.5786802,0.52081218,0.468730962,0.421857866,0.379672079,0.341704871,0.307534384,0.276780946,0.249102851,0.224192566
Opt,t1,,,,,,,1,,,,,,,,
Opt,t2,,,,,,,,,,1,,,,,
Opt,t3,,,,,,,,,,1,,,,,
Opt,t4,,,,,1,,,,,,,,,,
Opt,t5,,,,,,,,,,1,,,,,
Opt,t6,,,,,,,,1,,,,,,,
Opt,t7,,,,,,,,1,,,,,,,
Opt,t8,,,,,,,,,,,,1,,,
Opt,t9,,,,,,,,,,,1,,,,
Opt,t10,,,,,,,,1,,,,,,,
Opt,t11,,,,,,,,,1,,,,,,
Opt,t12,,,,,,,,,1,,,,,,
Opt,t13,,,,,,,,,,,1,,,,
Opt,t14,,,,1,,,,,,,,,,,
Opt,t15,,,,,,,1,,,,,,,,
Opt,t16,,1,,,,,,,,,,,,,
Opt,t17,,,,,,1,,,,,,,,,
Opt,t18,,,,,,,,,,1,,,,,
Opt,t19,,,,,,,,,,,,,1,,
Opt,t20,,,,,,1,,,,,,,,,
;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

asasha · Posted 02-11-2021 08:50 AM

this didn't work as my t is up to 20, and my col is up to 15 (or 100).

PaigeMiller · Posted 02-11-2021 08:56 AM

@asasha wrote:

this didn't work ...

For your future benefit, if you say "didn't work" and do not provide the LOG or the DATA to show what you did, and why it didn't work, then you don't get an answer. (I don't have an answer for you right now.)

So, show us the LOG (and if there is incorrect output, show us the incorrect output as well)

--
Paige Miller

Kurt_Bremser · Posted 02-11-2021 09:03 AM

The number of t observations is irrelevant with my code, and it should be obvious where to implement a larger number of columns.

Spoiler

Simply replace the 15 in the array definitions with 100

@asasha wrote:

this didn't work as my t is up to 20, and my col is up to 15 (or 100).

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PaigeMiller · Posted 02-11-2021 06:43 AM

This data does not match the originally provided data, as the first column originally provided has a value of x in the first row, but not t1 in the second row.

Nevertheless, with minor changes, the code I provided should still work.

--
Paige Miller

Registration is open

SAS Training: Just a Click Away