BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ADouglas
Obsidian | Level 7

Hi All,

 

I have a contrived data set that has 9 fields of responses to examination questions. The questions have unique ids of 1, 2, 3, 4, 5, 6, 7, 8 & 9.

 

The 9 questions are grouped into blocks that are smaller modules of 3 questions. There are 3 blocks/modules of questions. The block/module ids are A, B, & C.

 

The first record of the data set includes the question ids, and the second record of the data set includes the block ids.

 

The information based on the first two records shows the specific questions grouped into each block/module.

 

The remaining records include examinees' responses to the questions, with 1 being answered correctly and 0 being answered incorrectly. 

 

This is how the data set looks (Data Have):

 

1  2 3  4 5 6  7 8  9

A A A B B B C C C

1  0 1  0 1 0  1 0  1 

0  1 1  1 0 0  1 1  0

1  1 1  1 0 0  0 0  1

Etc.

 

What I want to do is to create three separate data sets.

 

I want a data set that has only the questions in block/module A. 

 

I want a data set that has only the questions in block/module B. 

 

I want a data set that has only the questions in block/module C. 

 

This is what I want (Data Want):

 

Output data set 1.

 

1  2 3  

A A A  

1  0 1  

0  1 1  

1  1 1  

Etc.

 

Output data set 2.

 

4 5 6

B B B   

0 1 0  

1 0 0  

1 0 0  

Etc.

 

Output data set 3.

 

7 8  9

C C C

1 0  1 

1 1  0

0 0  1

Etc.

 

I think I could transpose the original data set, and work with it to select records for each block/module to create three output files, and then transpose those output files back so that the block/module id is in the first record for each of the three files. But, that would require a number of transpose steps.

 

I am wondering if there is code that would be more concise & elegant that I can use in one DATA step that can be used together with a SAS macro. I need the code to generalize to a more realistic situation of starting with a data set that has 3000 fields of responses to exam questions grouped into 90 blocks/modules.

 

All I am hoping is that someone from the community would have a pretty decent idea of how to program the code for the single DATA step. I can then finish the code for the macro so that different block/module ids are specified and the DATA step can be executed for each block/module. 

 

Any ideas?

 

Thanks!  

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @ADouglas  Have fun with hashes

 

data have;
input A1 A2 A3 B1 B2 B3 C1 C2 C3;
cards;
1  2 3  4 5 6  7 8  9
1  0 1  0 1 0  1 0  1 
0  1 1  1 0 0  1 1  0
1  1 1  1 0 0  0 0  1
;
dm log 'clear';

data _null_ ;
  if _n_ = 1 then do ;
    declare hash h ;   
    declare hash hoh() ;  
    hoh.defineKey ("j") ;
    hoh.defineData ("h", "j") ;
	   hoh.defineDone () ;
	end ;
set have end = lr ;  
array t(*) a1--c3;
do i= 1 to dim(t);
k=mod(i,3);
if k=1 then 
	do;
		j=first(vname(t(i)));
		if hoh.find() ne 0 then 
		do ;  
   			 h = _new_ hash (multidata:"Y",ordered:'y') ;  
			 h.defineKey ('_n_') ; 
		     do  k1=i to i+2;
	    		h.defineData (vname(t(k1))) ;
			end;
			h.defineDone () ;
		    hoh.add() ; 
	   end; 
	  h.add(key:_n_, data:t(i),data:t(i+1),data:t(i+2)) ;  
   end ;
end;
if lr ;
  declare hiter ihoh ("hoh") ;  
  do while (ihoh.next() = 0) ;
     h.output (dataset:j) ;
  end ;
run ;

NOTE: If memory is an issue, i can't help. A college student's help can go thus far only with the limited resources I have.

View solution in original post

22 REPLIES 22
PeterClemmensen
Tourmaline | Level 20

How is the data stored? A SAS data set can not have 3 variables with the name A.

ADouglas
Obsidian | Level 7

Thanks for asking the question, and clarifying a detail I overlooked. 

 

Let's say record 1 has the question ids, and record 2 has the block/module ids. 

 

Data I have:

 

1  2 3  4 5 6  7 8  9

A A A B B B C C C

1  0 1  0 1 0  1 0  1 

0  1 1  1 0 0  1 1  0

1  1 1  1 0 0  0 0  1

Etc.

 

Data I want. 

 

Output data set 1.

 

1  2 3 

A A A 

1  0 1  

0  1 1  

1  1 1  

Etc.

 

Output data set 2.

 

4 5 6  

B B B 

0 1 0  

1 0 0  

1 0 0  

Etc.

 

Output data set 3.

 

7 8  9

C C C

1 0  1 

1 1  0

0 0  1

Etc.

Reeza
Super User
Is your data in SAS data sets already or in text files to start with?
ADouglas
Obsidian | Level 7

The data will come to me as a .csv file. 

 

As per Draycut's question, I can have the IT professional store the block/module ids as the 2nd record, and the item ids as the first record, if need be?

 

Thanks

Reeza
Super User

Will you always have that many columns or is that an example? Is this a more than one time process or one time?

 

I'd probably build a dynamic process, first read the first two records, determine how many you need and then read the rest. But it depends on how generic it needs to be. Not really sure why you need to split it either. 

 


@ADouglas wrote:

The data will come to me as a .csv file. 

 

As per Draycut's question, I can have the IT professional store the block/module ids as the 2nd record, and the item ids as the first record, if need be?

 

Thanks


 

ADouglas
Obsidian | Level 7

I receive semi-annually a data set with over 3000 variables. The code needs to be executed each time. 

 

The blocks/modules of questions need to be put into separate output files, and I have to conduct analysis for each file.  

 

Thanks,

Aaron

novinosrin
Tourmaline | Level 20

Hi @ADouglas  Have fun with hashes

 

data have;
input A1 A2 A3 B1 B2 B3 C1 C2 C3;
cards;
1  2 3  4 5 6  7 8  9
1  0 1  0 1 0  1 0  1 
0  1 1  1 0 0  1 1  0
1  1 1  1 0 0  0 0  1
;
dm log 'clear';

data _null_ ;
  if _n_ = 1 then do ;
    declare hash h ;   
    declare hash hoh() ;  
    hoh.defineKey ("j") ;
    hoh.defineData ("h", "j") ;
	   hoh.defineDone () ;
	end ;
set have end = lr ;  
array t(*) a1--c3;
do i= 1 to dim(t);
k=mod(i,3);
if k=1 then 
	do;
		j=first(vname(t(i)));
		if hoh.find() ne 0 then 
		do ;  
   			 h = _new_ hash (multidata:"Y",ordered:'y') ;  
			 h.defineKey ('_n_') ; 
		     do  k1=i to i+2;
	    		h.defineData (vname(t(k1))) ;
			end;
			h.defineDone () ;
		    hoh.add() ; 
	   end; 
	  h.add(key:_n_, data:t(i),data:t(i+1),data:t(i+2)) ;  
   end ;
end;
if lr ;
  declare hiter ihoh ("hoh") ;  
  do while (ihoh.next() = 0) ;
     h.output (dataset:j) ;
  end ;
run ;

NOTE: If memory is an issue, i can't help. A college student's help can go thus far only with the limited resources I have.

ADouglas
Obsidian | Level 7

This works, bravo bravo.

 

However, I will need to study the code to understand it, and I'm not sure how fast it will be when there are 3000 plus variables and 90 plus blocks/modules. 

 

But, nonetheless, I'm sure it will come in handy.

 

The solution does require a change for inputting the data, but nonetheless, it get's the job done.  

 

 

Thanks!!

novinosrin
Tourmaline | Level 20

Hello @ADouglas  If you could bear with me until tomorrow. I can possibly effiiciently test at my college lab. Today it's a holiday here in Chicago and I would only go back to Depaul lab tomorrow morning.

 

So, if it's not urgent, I can efficiently test with rich samples and let you know or even make changes. Just  a one caveat, I do have a presentation on a case study tomorrow that I hope to pass as I haven't even started yet lol, so the timing of my response may not be consistent. 

 

Nonetheless, We the community(you included) are a SAS family. Feel free to reach out to us. 

 

PS I will explain how the code works tomorrow. Today I am little too busy preparing for the case study. Bear with me sir!

ADouglas
Obsidian | Level 7

Please finish your case study; and only if you have time, followup with me here. 

 

There is only one issue with your code, I think. 

 

The number of modules can very year to year; and for a given year, the number of questions between modules varies. 

 

My example didn't show these complexities. But, what you've done will work for a set of modules that have a fixed number of questions. I could very well be presented with such a scenario for another project. 

 

I applaud your interest and motivation to respond while being a college student. 

 

Take Care

novinosrin
Tourmaline | Level 20

Pleasure will be mine. In the mean time, can you please post a "comprehensive sample" explaining your need, so that we can avoid going back and forth. I could modify the code in one shot. 

ADouglas
Obsidian | Level 7

Hi Novinosrin,

 

As you requested, below is a modification to the example that shows how the number of questions between modules vary.

 

Data I have:

 

1  2 3  4 5 6  7 8  9

A A A A B B C C C

1  0 1  0 1 0  1 0  1 

0  1 1  1 0 0  1 1  0 

1  1 1  1 0 0  0 0  1 

Etc.

 

Data I want. 

 

Output data set 1.

 

1  2 3  4 

A A A A 

1  0 1  0 

0  1 1  1 

1  1 1  1  

Etc.

 

Output data set 2.

 

 

 

1  2 

B B

1 0  

0 0  

0 0  

Etc.

 

Output data set 3.

 

 7 8  9

 C C C

 1 0  1 

 1 1  0 

 0 0  1 

Etc.

 

But do your homework first !!

 

Thanks

 

Reeza
Super User
Do you actually have the 123456789 in your data set or are you adding that for clarity?

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 1682 views
  • 10 likes
  • 4 in conversation