BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Pechan_Uvas
Calcite | Level 5

Hey there, spending way too much time trying to find a workaround to this issue but is time to ask for support.

I created a macro that transposed data from a wide format (variables names in columns, values in rows) to a long one (variable name in one column, values as columns per year and sequence). It works very well when values are unique, but failed for some datasets as there are multiple observations for same ID and YEAR_SEQUENCE. It does not really failed but got unwanted results. This is due to the LET function in PROC TRANSPOSE that allows the macro to continue (with warnings) but the desired output is not what I need as only the last row per ID is outputted. 

 

The main goal is to flag when there was a change in data values between the sequence numbers (e.g. 1 if there was a change between sequence 1 and sequence 2, or sequence 2 and sequence 3).

 

This is a dummy example:

As you can see, for ID 2, Var_A has values STEP_A and STEP_B for same period (year and sequence), likewise VAR_B has two values for same period.  

Any help will be pretty much appreciated. 

 

data table; 
input ID $ ACC_DATE SEQUENCE VAR_A $ VAR_B  ; 
datalines; 
1 202201 1 STEP_B 350 
1 202201 2 STEP_B 500 
1 202201 3 STEP_B 700 
2 202201 1 STEP_A 20 
2 202201 1 STEP_B 5 
2 202201 2 STEP_A 20 
2 202201 2 STEP_B 5 
2 202201 3 STEP_A 20 
2 202201 3 STEP_B 5 ; run; 

PROC SORT DATA= TABLE FORCE; BY ID ACC_DATE SEQUENCE  ; RUN; 

proc transpose data = work.Table out = work.Table_T1 name= VARIABLE  PREFIX=_ DELIM=WD let ; 
by ID; var var_a var_b; 
id ACC_DATE SEQUENCE; run;

 

 

want

IDVariable_202201_1_202201_2_202201_3
1VAR_ASTEP_BSTEP_BSTEP_B
1VAR_B350500700
2VAR_ASTEP_ASTEP_ASTEP_A
2VAR_ASTEP_BSTEP_BSTEP_B
2VAR_B204080
2VAR_B51010
1 ACCEPTED SOLUTION

Accepted Solutions
DerylHollick
SAS Employee

Should the values of VAR_B in the last 4 rows of the initial data table match those in your "want" table?

 

If so, then the only way I could think to do this (other than transposing with the DATA step) is to add an additional SEQ_CNT variable for proc transpose to use as a BY variable.

data table; 
input ID $ ACC_DATE SEQUENCE VAR_A $ VAR_B  ; 
datalines; 
1 202201 1 STEP_B 350
1 202201 2 STEP_B 500
1 202201 3 STEP_B 700 
2 202201 1 STEP_A 20
2 202201 1 STEP_B 5
2 202201 2 STEP_A 40
2 202201 2 STEP_B 10
2 202201 3 STEP_A 80 
2 202201 3 STEP_B 10
; 
run; 

PROC SORT DATA= TABLE FORCE; BY ID ACC_DATE SEQUENCE; RUN;

* create SEQ_CNT variable;
data table;
   set table;
   by ID ACC_DATE SEQUENCE;
   if first.SEQUENCE then SEQ_CNT=0;
   SEQ_CNT+1;
run;

PROC SORT DATA= TABLE FORCE; BY ID SEQ_CNT; RUN;

proc transpose data=work.Table out=work.Table_T1 (drop=SEQ_CNT) name=VARIABLE  PREFIX=_ DELIM=_ let; 
   by ID SEQ_CNT; 
   var var_a var_b; 
   id ACC_DATE SEQUENCE; 
run;

proc sort data=Table_T1;
   by ID VARIABLE;
run;

want.JPG

View solution in original post

6 REPLIES 6
DerylHollick
SAS Employee

Should the values of VAR_B in the last 4 rows of the initial data table match those in your "want" table?

 

If so, then the only way I could think to do this (other than transposing with the DATA step) is to add an additional SEQ_CNT variable for proc transpose to use as a BY variable.

data table; 
input ID $ ACC_DATE SEQUENCE VAR_A $ VAR_B  ; 
datalines; 
1 202201 1 STEP_B 350
1 202201 2 STEP_B 500
1 202201 3 STEP_B 700 
2 202201 1 STEP_A 20
2 202201 1 STEP_B 5
2 202201 2 STEP_A 40
2 202201 2 STEP_B 10
2 202201 3 STEP_A 80 
2 202201 3 STEP_B 10
; 
run; 

PROC SORT DATA= TABLE FORCE; BY ID ACC_DATE SEQUENCE; RUN;

* create SEQ_CNT variable;
data table;
   set table;
   by ID ACC_DATE SEQUENCE;
   if first.SEQUENCE then SEQ_CNT=0;
   SEQ_CNT+1;
run;

PROC SORT DATA= TABLE FORCE; BY ID SEQ_CNT; RUN;

proc transpose data=work.Table out=work.Table_T1 (drop=SEQ_CNT) name=VARIABLE  PREFIX=_ DELIM=_ let; 
   by ID SEQ_CNT; 
   var var_a var_b; 
   id ACC_DATE SEQUENCE; 
run;

proc sort data=Table_T1;
   by ID VARIABLE;
run;

want.JPG

ballardw
Super User

@Pechan_Uvas You should be aware that when you provide data as data step with Datalines or Cards that if you place the semicolon that ends the data block on a line with data that the line with the semicolon is not in the resulting dataset.

So please make sure that your data steps look like:

data table; 
input ID $ ACC_DATE SEQUENCE VAR_A $ VAR_B  ; 
datalines; 
1 202201 1 STEP_B 350 
1 202201 2 STEP_B 500 
1 202201 3 STEP_B 700 
2 202201 1 STEP_A 20 
2 202201 1 STEP_B 5 
2 202201 2 STEP_A 20 
2 202201 2 STEP_B 5 
2 202201 3 STEP_A 20 
2 202201 3 STEP_B 5 
;  

Note that the Run is not needed.

Tom
Super User Tom
Super User

If you want to generate multiple observations per ID then you need some other variable that can distinguish the observations that go together.

 

data for_transpose ;
  set table;
  by id acc_date sequence ;
  row+1;
  if first.sequence then row=1;
run;

proc sort;
  by id row;
run;

proc transpose data = for_transpose out = Table_T1 name= VARIABLE  DELIM=_  prefix=X_; 
  by ID row;
  id ACC_DATE SEQUENCE; 
  var var_a var_b; 
run;

proc sort;
  by id variable row;
run;

proc print data=table_t1;
run;

Tom_0-1679620019668.png

 

 

 

Pechan_Uvas
Calcite | Level 5

This solution also works very well. Never seen the use of row+1 in a BY statement. Thanks for proposing a different approach.

Tom
Super User Tom
Super User

@Pechan_Uvas wrote:

This solution also works very well. Never seen the use of row+1 in a BY statement. Thanks for proposing a different approach.


ROW+1;

Is a SUM statement.  It has nothing directly to do with the BY statement.  It is the IF statement that is using the BY statement because it is referencing the FIRST. variable created by the BY statement.

PaigeMiller
Diamond | Level 26

@Pechan_Uvas wrote:

 

The main goal is to flag when there was a change in data values between the sequence numbers (e.g. 1 if there was a change between sequence 1 and sequence 2, or sequence 2 and sequence 3).

 

data table; 
input ID $ ACC_DATE SEQUENCE VAR_A $ VAR_B  ; 
datalines; 
1 202201 1 STEP_B 350 
1 202201 2 STEP_B 500 
1 202201 3 STEP_B 700 
2 202201 1 STEP_A 20 
2 202201 1 STEP_B 5 
2 202201 2 STEP_A 20 
2 202201 2 STEP_B 5 
2 202201 3 STEP_A 20 
2 202201 3 STEP_B 5 ; run; 

PROC SORT DATA= TABLE FORCE; BY ID ACC_DATE SEQUENCE  ; RUN; 

proc transpose data = work.Table out = work.Table_T1 name= VARIABLE  PREFIX=_ DELIM=WD let ; 
by ID; var var_a var_b; 
id ACC_DATE SEQUENCE; run;

 

 

want

ID Variable _202201_1 _202201_2 _202201_3
1 VAR_A STEP_B STEP_B STEP_B
1 VAR_B 350 500 700
2 VAR_A STEP_A STEP_A STEP_A
2 VAR_A STEP_B STEP_B STEP_B
2 VAR_B 20 40 80
2 VAR_B 5 10 10

If your goal is to flag data values being different, transposing is not needed (and results in a horrible and difficult to use formatting of the table). You can use the LAG function in data step, without transposing, to obtain flags when data values are different.

--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Autotuning Deep Learning Models Using SAS

Follow along as SAS’ Robert Blanchard explains three aspects of autotuning in a deep learning context: globalized search, localized search and an in parallel method using SAS.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1725 views
  • 5 likes
  • 5 in conversation