transposing IDs

stellapersis7 · Posted 06-12-2024 03:11 PM

Hi all,

I need some help with coding. I want the IDs in a single column and flag the study ids as 1 and control ids as 0.

Please see below

data I have

study id	control id	study id duration	controlid duration
x	a	10	25
x	b	10	45
x	c	10	103
x	d	10	150
y	m	45	91
y	n	45	123
y	o	45	321
y	p	45	147

Data I want

ID	flag	duration
x	1	10
a	0	25
b	0	45
c	0	103
d	0	150
y	1	45
m	0	91
n	0	123
o	0	321
p	0	147

The dataset is as follows:

data WORK.CC;
infile datalines dsd truncover;
input study_id:32. control_id:32. study_duration:32. control_duration:32.;
label study_id="Enrollee ID" control_id="Enrollee ID";
datalines;
1096503 737477001 141 1038
1096503 1350295201 141 1069
1096503 2661039302 141 640
1096503 33596554301 141 415
15101702 2266821602 107 949
15101702 2844664901 107 1078
15101702 4272672001 107 810
15101702 33155637601 107 712
27264303 1324920304 36 40
27264303 3409397803 274 448
27264303 3955249504 274 340
27264303 4859531601 36 123
;;;;

Please help.

thank you in advance.

ballardw · Posted 06-12-2024 03:29 PM

You don't provide an explicit rule for what the Flag means.

Where did all of the X and Y go? If you have 4 input X values then you need to explicitly state a rule for why there is only one.

The general "transpose" would go like this:

data work.want;
   set work.cc;
   array A (*) study_id        control_id;
   array d (*) study_duration  control_duration;
   do i=1 to dim(A);
      id= A[i];
      Duration = d[i];
      flag=(i=1);
      output;
   end;
   keep id duration flag;
run;

If you want a specific order in the output that should also be stated because that example needs work if there is an expected order.

stellapersis7 · Posted 06-12-2024 04:01 PM

Hi Ballardw,

Thanks for the response.

I matched cases and controls in my study (1:4 matching) and each case and control has a drug utilization duration. So now I need all the ids in one column, however I need all the cases (study_id) to be coded as 1 and all the controls (control_id) to be coded as 0 for differentiation.

I used the code you gave, but it gives me repetitive study ids for each control id, like in 1:1 matching.

Can you help me fix it.

Thank you.

ballardw · Posted 06-12-2024 05:38 PM

I guessed at the rule for the flag because your data set does not have any of the values that matched your output.

I still do not see a clear definition of the rule for assigning the value of the flag.

You have at least one other requirement (i suspect actually more) that you have not provided a programming rule for. Namely, WHY is there only 1 X in the output? Or what are the rules for removing the duplicates?

I can make a guess as to a possible rule for removing duplicates but you really don't what to rely on guessing. Provide the rule(s).

I may be going on a bit on this topic but the clearer you state requirements the better a solution will be.

I know and easy way to do that but it involves changing the order of the data. You did not address the question is there an expected order requirement.

Now lets ask another question: After you use this combined data will you need to rematch the same "controls" to the "same cases"?

Tom · Posted 06-12-2024 03:59 PM

Does NOT look like a TRANSPOSE at all.

data have;
  infile datalines truncover;
  input study_id control_id study_duration control_duration;
datalines;
1096503 737477001 141 1038
1096503 1350295201 141 1069
1096503 2661039302 141 640
1096503 33596554301 141 415
15101702 2266821602 107 949
15101702 2844664901 107 1078
15101702 4272672001 107 810
15101702 33155637601 107 712
27264303 1324920304 36 40
27264303 3409397803 274 448
27264303 3955249504 274 340
27264303 4859531601 36 123
;;;;

data want;
  set have(in=in1) have(in=in2);
  by study_id ;
  if in1 then do;
    if first.study_id;
    id=study_id;
    flag=1;
    duration=study_duration;
  end;
  else do;
    id=control_id;
    flag=0;
    duration=control_duration;
  end;
  keep id flag duration;
run;

Or you can do the something reading the dataset only once by adding some OUTPUT statements.

data want;
  set have;
  by study_id;
  flag = first.study_id;
  if flag then output;
  flag=0;
  study_id=control_id;
  study_duration=control_duration;
  output;
  keep study_id flag study_duration;
  rename study_id=id study_duration=duration;
run;

Result

Obs             id    flag    duration

  1        1096503      1        141
  2      737477001      0       1038
  3     1350295201      0       1069
  4     2661039302      0        640
  5    33596554301      0        415
  6       15101702      1        107
  7     2266821602      0        949
  8     2844664901      0       1078
  9     4272672001      0        810
 10    33155637601      0        712
 11       27264303      1         36
 12     1324920304      0         40
 13     3409397803      0        448
 14     3955249504      0        340
 15     4859531601      0        123

transposing IDs

Re: transposing IDs

Re: transposing IDs

Re: transposing IDs

Re: transposing IDs

transposing IDs

Re: transposing IDs

Re: transposing IDs

Re: transposing IDs

Re: transposing IDs

Registration is open

SAS Training: Just a Click Away