BookmarkSubscribeRSS Feed
cdunkleychan
Calcite | Level 5

The data I am working with is a very large dataset: the MIT Atlas data recording product trade by origin and destination country by year by product. I have combined it with CEPII data to determine how far apart each two countries are. My issue is this: I can successfully sum up exports by country to a given country, but my program does not do this for every country. The state of my current code gives me the sum of each country's exports to and imports from Zimbabwe, and stops there.

 

What I need: the exports to and imports from each country A and country B.

 

My current code:

data sasuser.want;
    set sasuser.have;
    by origin destination year;
    if First.destination then TotalExports = 0;
        TotalExports +exports;
         if Last.destination;
    if First.origin then TotalImports = 0;
        TotalImports + imports;
        if Last.origin;
    TotalTrade = TotalExports +TotalImports;
run;

What's going wrong here? Any help would be greatly appreciated.

 

I should note that I have successfully grabbed the data I want creating 2 separate datasets and then merging them. I am curious at this point about the feasability of doing this in a more concise manner.

6 REPLIES 6
Reeza
Super User

This is much easier to solve if you can provide sample data - fake is fine. If you make the example data and show the expected output you’ll definitely get a solution quickly.  

 

How big is your dataset? SAS is designed to handle large data - no possibly no issues there. 

And any particular reason you aren’t using PROC MEANS or FREQ to calculate summary stats. Seems like it would be easier. 

 

The middle section of this presentation goes over PROC MEANS

 

https://www.sas.com/content/dam/SAS/en_ca/User%20Group%20Presentations/Vancouver-User-Group/FareezaK...

 


@cdunkleychan wrote:

The data I am working with is a very large dataset: the MIT Atlas data recording product trade by origin and destination country by year by product. I have combined it with CEPII data to determine how far apart each two countries are. My issue is this: I can successfully sum up exports by country to a given country, but my program does not do this for every country. The state of my current code gives me the sum of each country's exports to and imports from Zimbabwe, and stops there.

 

What I need: the exports to and imports from each country A and country B.

 

My current code:

data sasuser.want;
    set sasuser.have;
    by origin destination year;
    if First.destination then TotalExports = 0;
        TotalExports +exports;
         if Last.destination;
    if First.origin then TotalImports = 0;
        TotalImports + imports;
        if Last.origin;
    TotalTrade = TotalExports +TotalImports;
run;

What's going wrong here? Any help would be greatly appreciated.

 

I should note that I have successfully grabbed the data I want creating 2 separate datasets and then merging them. I am curious at this point about the feasability of doing this in a more concise manner.


 

cdunkleychan
Calcite | Level 5

This slide deck doesn't really help me understand how to use indexing, but I see how it could be useful. It would be really hard to include my data here (143m rows). Below is a screenshot from SAS of what it looks like. I tried toying around with the other suggestion; I made no headway/

Trade dataset.png

Tom
Super User Tom
Super User

Pictures of your data are not really helpful.

But there is no need to post your REAL data. 

 

Just post something that can be use to describe your question.  Post example input and the desired output for that input. Include just enough complexity to demonstrate the issue.  You should post it in the form of a data step that others can run to create the problem.  

 

Or ask the question using one the sample datasets from SASHELP libref. Like SASHELP.CLASS or SASHELP.CARS.

 

You will probably figure out the answer yourself just from going through the steps of fully explaining what you are trying to do.

Reeza
Super User

@cdunkleychan wrote:

This slide deck doesn't really help me understand how to use indexing, but I see how it could be useful.

The middle section of this presentation goes over PROC MEANS

 

 

Astounding
PROC Star
The reason you are running into trouble is the sub setting IF statements:

if last.origin;
if last.destination;

When these conditions are false, the program stops processing the current observation. For example, when last.destination is false, all the subsequent statements get skipped. Rather than try to fix the problem, I would recommend using several steps. First, summarize the data:

proc summary data=have nway;
class origin destination;
var exports imports;
output out=totals (drop=_type_ _freq_) sum=:
run;

This gives you a much smaller data set that you can process many times over, without running up a bill.
Tom
Super User Tom
Super User

Sounds like your question is this:

What I need: the exports to and imports from each country A and country B.

 

So you want one dataset with ORIGIN, YEAR and TOTAL_EXPORTS and other with DESTINATION,YEAR and TOTAL_IMPORTS?

 

Does your input data have MPORT and EXPORT as two different fields?  

Does your input data have both exchange directions listed ( A->B and B->A) as separate observations?  or do you need to impute that exports from A to B are also imports to B from A?

 

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1640 views
  • 0 likes
  • 4 in conversation