BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
UniversitySas
Quartz | Level 8

So, what I'm asking is not exactly a transpose (from my knowledge) and I'm finding it very tricky! Any help will be immensely appreciated.

 

I have some sample data:

DATA Unclean ;
 
INFILE DATALINES ;
 input ID VAR & $19. CLASSIFICATION $ ;
 DATALINES;
201  PepsiCo  XYZ  
201  111 main street  XYZ  
201  Charity  XYZ  
331  CocaCola  XYZ  
331  1823 unicorn street  XYZ  
331  Charity  XYZ  
331  Nike  XYZ  
331  123 brock avenue  XYZ  
331  Charity  XYZ  
331  Adidas  XYZ  
331  14 Market street  XYZ  
331  Charity  XYZ  
354  McDonalds  XYZ  
354  111 Obesity road  XYZ  
354  Charity  XYZ  
;
RUN;

 And my goal would be to get an output like this:

 

DATA CLEAN;
INFILE DATALINES ;
 input ID Name & $10. Address & $19. Classification $ ;
 DATALINES;
201  PepsiCo  111 main street	Charity 
331  CocaCola  1823 unicorn street  Charity   
331  Nike  123 brock avenue  Charity  
331  Adidas  14 Market street  Charity  
354  McDonalds  111 Obesity road  Charity  
;
RUN;

Specifically, there are a few challenges I am having.

1) The IDs have uneven companies attached to them

2) The value "XYZ" needs to be changed to reflect things like Name, Address, and Classification.
      Regarding 2) My fix has been that whenever we see an observation "Charity", the next observation is always a "Name".

Is there another way, more flexible way to approach this?

3) What would be the best function or command to achieve this?

 

Any help is very much appreciated!

1 ACCEPTED SOLUTION

Accepted Solutions
KachiM
Rhodochrosite | Level 12

You should have placed a SAMPLE Data Set that represent your real Data SET. I have given a solution based on your sample Data Set. 

 

Now you say there will be varying number of records - 2 records instead of 3. I hope you don't come with newer issues with your original Data Set. 

 

The solution is based on first creating triplets with adding "NONE" when "Charity" is missing.  Next the same program without using LAST.ID. We can simply check for either "NONE" or "Charity" to output a record at a time.

 

data source;
INFILE DATALINES ;
 input ID VAR & $19. CLASSIFICATION $ ;
 DATALINES;
201  PepsiCo  XYZ  
201  111 main street  XYZ  
201  Charity  XYZ  
331  CocaCola  XYZ  
331  1823 unicorn street  XYZ  
331  Charity  XYZ  
331  Nike  XYZ  
331  123 brock avenue  XYZ  
331  Charity  XYZ  
331  Adidas  XYZ  
331  14 Market street  XYZ  
331  Charity  XYZ  
354  McDonalds  XYZ  
354  111 Obesity road  XYZ  
354  Charity  XYZ  
354  KFC  XYZ  
354 111 Fat Avenue  XYZ  
441  Burger King  XYZ  
441  112 Skinny ROAD  XYZ
441  Adidas  XYZ
441  Socks Blvd  XYZ
441  Tiege Henley  XYZ
441  Skincare Road  XYZ
441  Charity  XYZ
;
RUN;
proc print data = source;
run;

data have;
   set source;
   if Var = "Charity" then i = 0;
   else i+1;
   if i = 3 and Var ^= "Charity" then do; 
      temp = Var; 
      Var = "NONE";  
      output;
      Var = temp;
      output;
      temp =' ';
      i = 1;
   end;
   else output;
drop temp i;
run;

proc print data = have;
run;

The Data Set, HAVE, will have triplets either with "NONE" or "Charity". 

 

The next the old program without LAST.ID checks.

 

data clean;
length ID 8 Name $10 Address $19 Classification $8 Var $19; 
   do i = 1 by 1 until(VAR in ("NONE","Charity"));
      set have;
      select(mod(i,3));
         when(1) Name=Var;
         when(2) Address = Var;
         when(0) Classification = Var;
         otherwise;
      end;
   end;
drop i Var;
run;
proc print data = clean;
run;

 

 

View solution in original post

20 REPLIES 20
Reeza
Super User
Was the original format JSON or XML?
And didn’t that have the tags to create the classification variables?
UniversitySas
Quartz | Level 8
The data I received was a sas file actually. It's extremely granular data, and from what I've been told the original data file wouldn't have what you're asking
Reeza
Super User
If it’s always this style with three lines it’s actually trivial. If you can have varying lines per company then it’s problematic.

Use MOD() to count lines.

Data have;
Set source;
X=mod(_n_, 3);
If x=1 then class=‘name’;
Else if class =2 then class = ‘address’;
Else if class=0 then class = ‘type’;

Retain record;
If x=1 record+1;
Run;

Proc transpose data=have out=want;
By Id record;
Var var;
Id class;
Run;

Untested.
VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

this link should assist you with understanding how to transpose data:

 

https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

 

It looks like the x = row 1, y = row 2, z = row 3

output your record.

start you count over for x,y.z;

 

basically looks like 3 records per final record.  

 

UniversitySas
Quartz | Level 8
I've had a look at that page, I couldn't really get anything to work.

Are you insinuating that I need to use a LOOP here?
VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

yes you may need to hold record 1 and record 2 and output after obtaining record 3

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13
proc transpose data=have out=want(drop=_name_);
   by id;
   var var;
run;

you will need to pass the var-names to the columns

UniversitySas
Quartz | Level 8

For now the adding the correct var names are not too important, so I can address that later. 

 

But using a PROC TRANSPOSE gives me something like this:

 

Capture.JPG

 

It fails to re-iterate values for ID 331, instead just lumping them into the same row.

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

you need to clean your data as @Reeza has noted above first.  Then use the proc transpose.

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

with cleaned up data based on @Reeza datastep

 

data source;
INFILE DATALINES ;
 input ID VAR & $19. CLASSIFICATION $ ;
 DATALINES;
201  PepsiCo  XYZ  
201  111 main street  XYZ  
201  Charity  XYZ  
331  CocaCola  XYZ  
331  1823 unicorn street  XYZ  
331  Charity  XYZ  
331  Nike  XYZ  
331  123 brock avenue  XYZ  
331  Charity  XYZ  
331  Adidas  XYZ  
331  14 Market street  XYZ  
331  Charity  XYZ  
354  McDonalds  XYZ  
354  111 Obesity road  XYZ  
354  Charity  XYZ  
;
RUN;


Data have;
length class $7;
Set source;
X=mod(_n_, 3);
If x=1 then class="name";
Else if x =2 then class = "address";
Else if x=0 then class = "type";

Retain record;
If x=1 then record+1;
Run;

Proc transpose data=have(drop=record _name_) out=want;
By Id record;
Var var;
Id class;
Run;

  

UniversitySas
Quartz | Level 8
This is perfect, thank you so much!
KachiM
Rhodochrosite | Level 12

I suppose you don't insist on PROC TRANSPOSE. In one Data Step you will get the required output. Assumed that the Data Set is sorted by ID.

The sequential records provide one of the three Variables  where LASTID can be used(except for ID = 331). For ID = 331, the alternate check with "Charity" breaks them into several records. The use of MOD(i,3) picks the relevant Value for Variables.

 

data clean;
length ID 8 Name $10 Address $19 Classification $8 Var $19; 
   do i = 1 by 1 until(last.id | Var = 'Charity');
      set unclean;
      by id notsorted;
  
      select(mod(i,3));
         when(1) Name=Var;
         when(2) Address = Var;
         when(0) Classification = Var;
         otherwise;
      end;
   end;
drop i Var;
run;
UniversitySas
Quartz | Level 8
This looks fantastic - still very new to SAS so i'm very keen on working through this code you posted - it looks extremely compact!
UniversitySas
Quartz | Level 8

Hey datasp,

 

So one thing I just realised was that the code is not quite flexible enough; I was wondering if you had a fix.

 

Some of my data is organised in a way that the first 1/4 might have the recurring pattern for 9 rows, the 2nd quarter for 8 rows, the third quarter for 9 rows again and the final quarter for 7 rows.


I understand it is a little vague since I don't have a sample of the dataset here (which I can provide if it's needed), but do you have an alternate strategy for the case above?

 

Would it my best bet be to break the data into separate pieces and then merge it again later, or is there another way?

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 1800 views
  • 0 likes
  • 5 in conversation