Solved: Transpose Data from long to wide; caveat: some inconsistent row data

UniversitySas · Posted 11-14-2018 09:44 PM

So my long data looks like this:

data source;
INFILE DATALINES dlm='#';
 length ID $ 3
 VAR $ 32
 CLASSIFICATION $ 32;
 input ID VAR CLASSIFICATION;
 DATALINES;
201 # PepsiCo # Name
201 # 111 main street # AddressLine1
201 # Philadelphia # City
201 # PA # State
201 # 21491 # Zipcode
201 # None # Relationship
201 # Charity # Status
201 # Help # Purpose
201 # 9001 # DollarAmount
201 # CocaCola # Name
201 # 245 Cork street # AddressLine1
201 # Floor 43 # AddressLine2
201 # Philadelphia # City
201 # PA # State
201 # 21492 # Zipcode
201 # None # Relationship
201 # Charity # Status
201 # Build Factory # Purpose
201 # 4100 # CurrentContribution
201 # 13101 # TotalAmount
331 # Adidas # Name
331 # 115 walnut avenue # AddressLine1
331 # New York # City
331 # NY # State
331 # 255191 # Zipcode
331 # Charity # Status
331 # Help # Purpose
331 # 143 # FutureContribution
331 #143 # TotalAmount
334 # Nike # Name
334 # 123 Stevens Road # ForeignAddressLine1
334 # apartment 4D # ForeignAddressLine2
334 # Denpasar # City
334 # Bali # State
334 # 2512 # Zipcode
334 # N/A # Relation
334 # Help # Purpose
334 # 2000 # CurrentContribution
334 # 2000 # TotalAmount
;
RUN;

The output I am after is this:

The basic structure of the data is this:

IDs appear multiple times, and they have names associated to them. The last time the ID appears, there is an additional entry called "total" which sums up the entire amount for "contribution" for that ID. Then the New ID starts.

Every "Name" has:

an "Address line 1". Sometimes it contains the string "Foreign", in which case I want the column that says "Foreign" to say "Y" and "N" otherwise.

There is sometimes also an "Address line 2", but NOT always. Regardless, I want to force an "Address line 2" into the data.

There is always a "city", "state" and "postcode".

Sometimes "Relation" is not there, so I want to force it there.

It should always comes 1 row after "postcode", and 1 row before "status"

Sometimes "status" is not there. It should always come two rows after "postcode" in the long data file. Needs to be forced too.

Sometimes "Purpose" is not there. It always comes three rows after "postcode" in the long data file. Needs to be forced too.

"Contribution Amount" is always there - but it either contains the string "current" or "future" in it. if it says "Future" I want the "Future" column to say "Y", if not then "N".

For rows that are sometimes not there, and have to be forced in; here I want the corresponding value to be a ".".

I actually made a similar post earlier - but the data at the time was IMPOSSIBLE to work with. By some stroke of luck, the raw data I received is singificantly improved in that it actually has the row names (even though they're not always listed). So I don't need to guess the structure of the data anymore.

Any help is seriously appreciated.

Thanks!

Kurt_Bremser · Posted 11-15-2018 03:37 AM

/* step 1: create a unique id for ever name */
/* also do some manipulations as needed */
data int1;
set source;
retain newid 0;
if classification = 'Name' then newid + 1;
if substr(classification,1,7) = 'Foreign'
then do;
  classification = substr(classification,8);
  output;
  if index(classification,'1')
  then do;
    classification = 'Foreign';
    var = 'Y';
    output;
  end;
end;
else output;
run;

/* step 2: transpose along the unique id */
proc transpose
  data=int1
  out=int2 (drop=_name_)
;
by newid;
var var;
id classification;
run;

/* step 3: get a reference table for unique id to id */
proc sort
  data=int1 (keep=id newid)
  out=int3
  nodupkey
;
by newid;
run;

/* step 4: merge the original id back in */
data want (drop=newid);
merge
  int3
  int2
;
by newid;
run;

This code handles the transpose. You can add additional code in step 4 to force the horizontal order of variables, or to create derived columns (eg the "future" column), and to set default values for missing columns.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 11-15-2018 03:37 AM

/* step 1: create a unique id for ever name */
/* also do some manipulations as needed */
data int1;
set source;
retain newid 0;
if classification = 'Name' then newid + 1;
if substr(classification,1,7) = 'Foreign'
then do;
  classification = substr(classification,8);
  output;
  if index(classification,'1')
  then do;
    classification = 'Foreign';
    var = 'Y';
    output;
  end;
end;
else output;
run;

/* step 2: transpose along the unique id */
proc transpose
  data=int1
  out=int2 (drop=_name_)
;
by newid;
var var;
id classification;
run;

/* step 3: get a reference table for unique id to id */
proc sort
  data=int1 (keep=id newid)
  out=int3
  nodupkey
;
by newid;
run;

/* step 4: merge the original id back in */
data want (drop=newid);
merge
  int3
  int2
;
by newid;
run;

This code handles the transpose. You can add additional code in step 4 to force the horizontal order of variables, or to create derived columns (eg the "future" column), and to set default values for missing columns.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

UniversitySas · Posted 11-20-2018 02:02 PM

You are a legend! This worked BEAUTIFULLY.

Just one question, I had some issues in step 2 when trying to do the transpose. In my data, the values in classification are actually much longer, upto 60 characters, and the first 40 or so characters are the same.

The error message I get is:

NOTE: The above message was for the following BY group:
newid=12814
ERROR: The ID value "grantorcontriapprovedforfuture__" occurs twice in the same BY group.
ERROR: The ID value "grantorcontriapprovedforfuture__" occurs twice in the same BY group.
ERROR: The ID value "grantorcontriapprovedforfuture__" occurs twice in the same BY group.

My work-around was to hardcode a shorter string, but I had to do this for 20 different values.

Is there anything else I can do, so that it reads the string in it's entirety instead of the first 32 characters?

Thanks again so much for your help.

Kurt_Bremser · Posted 11-20-2018 05:43 PM

SAS variable names have a maximum length of 32, so you need to accomodate that with the values in the classification variable.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

UniversitySas · Posted 11-21-2018 06:34 AM

In other words, are you saying that renaming them to shorter character lengths is the only work around?

Kurt_Bremser · Posted 11-21-2018 06:50 AM

@UniversitySas wrote:
In other words, are you saying that renaming them to shorter character lengths is the only work around?

Absolutely. Anything that exceeds the maximum variable name length is truncated, and when that leads to ambiguities, you get those ERROR messages.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Transpose Data from long to wide; caveat: some inconsistent row data

Re: Transpose Data from long to wide; caveat: some inconsistent row data

Re: Transpose Data from long to wide; caveat: some inconsistent row data

Re: Transpose Data from long to wide; caveat: some inconsistent row data

Re: Transpose Data from long to wide; caveat: some inconsistent row data

Re: Transpose Data from long to wide; caveat: some inconsistent row data

Re: Transpose Data from long to wide; caveat: some inconsistent row data

SAS Innovate 2025: Call for Content

Classroom Training Available!