BookmarkSubscribeRSS Feed
BlueNose
Quartz | Level 8

Hello all,

 

I have a taks to perform, and the only thing I managed to get working, is long and unefficient.

 

I have a datase with clinical laboratory results, along with reference values, the data has the following variables:

 

  • Subject ID
  • Visit (ranging from 1-4, i.e. each subject has up to 4 rows in this data)
  • Treatment group (1/0)
  • Test 1 (Numeric)
  • Test 2 (Numeric)
  • Test 3 (Numeric)
  • Test 1 Low reference value (numeric)
  • Test 1 High reference value (numeric)
  • Test 2 Low reference value (numeric)
  • Test 2 High reference value (numeric)
  • Test 3 Low reference value (numeric)
  • Test 3 High reference value (numeric)

My final goal is to present descriptive statistics for the numeric tests and frequency distributions of the tests within range.

 

I want to do the analysis with the "by" statment, to save code (write it once).

 

My problem is to bring the data to the shape I need. I do not want to calculate the within normal range variable 3 times, I want it only once, but I can't seem to manage that. So I did it "widthly" which means copy+paste. The if statment is long (4 conditions) and copying it 3 times (what if I'll have 13?) is prone to bugs. How can I create a data, with each row corresponding to a subject*visit*treatment*test, with the normality status matching the right test?

 

Thank you in advance !

 

Comment: Clearly I can't add data, but feel free to type any numbers you wish, the principle matters, not the numbers.

5 REPLIES 5
RW9
Diamond | Level 26 RW9
Diamond | Level 26

If you search here, or on the internet for "reshape data" or "long to wide" you will find many posts on the subject, with various solutions from macro code, arrays etc.  Without seeing an example of what your data looks like, or what you want the output to look like I can't provide code though.

BlueNose
Quartz | Level 8

the data looks like this:

 

data A;

input ID Visit Group Test1 Test2 Test3 Test1_L Test1_H Test2_L Test2_H Test3_L Test3_H;

datalines;

1 1 0 5 3 6.7 1 10 2 7 3 9

1 2 0 5.5 3.8 8.7 1 10 2 7 3 9

1 3 0 4.5 2.8 5.7 1 10 2 7 3 9

2 1 1 .......

2 2 1 ......

2 3 1 ......

;

 

What I want at the end is that each row will represent one subject, in one visit, for one single test, I want the row to include the value of the test, and a character variable: "Normal" or "Not normal" according to the reference values (low and high) of the test. And, i want to keep the group of course. Test1 is normal if it is within Test1_L and Test1_H, and so on.

 

I tried looking, couldn't find what I need.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Assuming the prefix is constant:

data A;
  input ID Visit Group Test1 Test2 Test3 Test1_L Test1_H Test2_L Test2_H Test3_L Test3_H;
datalines;
1 1 0 5 3 6.7 1 10 2 7 3 9
1 2 0 5.5 3.8 8.7 1 10 2 7 3 9
1 3 0 4.5 2.8 5.7 1 10 2 7 3 9
;
run;

proc transpose data=a out=want;
  by id visit group;
  var test:;
run;

Will transpose the data down.  

 

For your other question, regarding the ranges, you have really made a rod for your own back there with that layout.  TestX is fine, but because you have not put the incrementor at the end of test_hX for instance, its harder to reference them.

data A;
  input ID Visit Group Test1 Test2 Test3 Test1_L Test1_H Test2_L Test2_H Test3_L Test3_H;
datalines;
1 1 0 5 3 6.7 1 10 2 7 3 9
1 2 0 5.5 3.8 8.7 1 10 2 7 3 9
1 3 0 4.5 2.8 5.7 1 10 2 7 3 9
;
run;

data b;
  set a;
  array test{3};
  array test_h{3} test1_h test2_h test3_h;
  array test_l{3} test1_l test2_l test3_l;
  array test_result{3} $50;
  do i=1 to 3;
    if test_l{i} <= test{i} <= test_h{i} then test_result{i}="Normal";
    else test_result{i}="Not Normal";
  end;
run;


proc transpose data=b out=want;
  by id visit group;
  var test:;
run;
data_null__
Jade | Level 19

This method transposes the LAB test and makes no assumption about the name (i.e. TEST1) only that the high and low range values are identified by adding _H or _L to the root name.  I will leave it to you to calculated the range flag.  

 

data A;
   input ID Visit Group Test1 Test2 Test3 Test1_L Test1_H Test2_L Test2_H Test3_L Test3_H;
   datalines;
1 1 0 5 3 6.7 1 10 2 7 3 9
1 2 0 5.5 3.8 8.7 1 10 2 7 3 9
1 3 0 4.5 2.8 5.7 1 10 2 7 3 9
2 1 1 5 3 6.7 1 10 2 7 3 9
2 2 1 5.5 3.8 8.7 1 10 2 7 3 9
2 3 1 4.5 2.8 5.7 1 10 2 7 3 9
;;;;
   run;
proc print;
   run;
proc transpose data=a out=b;
   by id visit group;
   run;
data b;
   set b;
   length paramcd $8 namecd $1;
   call scan(_name_,1,p,l,'_');
   paramcd = substrn(_name_,p,l);
   namecd  = coalesceC(substrn(_name_,p+l+1),'V');
   drop p l _name_;
   run;
proc sort data=b;
   by id visit group paramcd;
   run;
proc format;
   value $namecd 'V'='Value' 'H'='High' 'L'='Low';
   run;
proc transpose data=b out=c(drop=_name_);
   by id visit group paramcd;
   id namecd;
   format namecd $namecd.;
   var col1;
   run;
proc print;
   run;

Capture.PNG

rogerjdeangelis
Barite | Level 11

see

 

https://goo.gl/05M2EO

 

you can have multiple variables on the id statement

 

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 3761 views
  • 0 likes
  • 4 in conversation