
I am NOT for Sale: Analyzing Human Trafficking Data

Started ‎01-31-2020 by
Modified ‎01-31-2020 by

The International Labour Organization estimates that forced labor and human trafficking is a $150 billion industry worldwide. The Counter-Trafficking Data Collaborative (CTDC) has identified 172 countries where exploitation occurs, and its data set currently holds 91,416 individual cases. This multi-billion-dollar industry denies freedom to 24.9 million people around the world. That’s one human too many.

There are 40.3 million victims of human trafficking, globally, at any time. 

For me, human trafficking is a topic that hits close to home, so I decided to put my love of programming to use and take a creative approach to find and analyze data.

 

The CTDC has a global data set that you can download directly from their website. Use the HTTP procedure to send a GET request to the CTDC website and download the CSV file to a temporary fileref. 

 

filename ctdc temp;

proc http
url="https://www.ctdatacollaborative.org/sites/default/file/The%20Global%20Dataset%203%20Sept%202018.csv"
method="GET"
out=ctdc;
run;
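
Before importing, it can help to confirm that the request actually succeeded. The following is a minimal sketch, assuming SAS 9.4M5 or later, where PROC HTTP sets the SYS_PROCHTTP_STATUS_CODE and SYS_PROCHTTP_STATUS_PHRASE macro variables. The macro name check_download is made up for illustration and is not part of the original program.

```sas
/* Sketch: report whether the most recent PROC HTTP call returned 200 OK. */
/* Assumes SAS 9.4M5+; check_download is a hypothetical helper name.      */
%macro check_download;
   %if &SYS_PROCHTTP_STATUS_CODE ne 200 %then
      %put WARNING: Download failed with status &SYS_PROCHTTP_STATUS_CODE (&SYS_PROCHTTP_STATUS_PHRASE).;
   %else
      %put NOTE: Download succeeded.;
%mend check_download;

%check_download
```

Run it immediately after the PROC HTTP step so that the status variables refer to that request.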


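Then, I used the IMPORT procedure to read the downloaded CSV file (the ctdc fileref) into a SAS data set, Work.HumanTrafficking. GETNAMES=YES takes the variable names from the first row of the file, and GUESSINGROWS=MAX scans every row before deciding each variable's type and length. 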

 

proc import file=ctdc
out=work.humantrafficking
dbms=csv
replace;
getnames=YES;
guessingrows=MAX;
run;

With the CSV file imported and the data set available, it's time to clean up the data. In this example, I used IF-THEN statements to reassign the placeholder value -99 to Unknown. The IMPORT procedure also assigns formats and informats to each variable when it reads the CSV file. A FORMAT or INFORMAT statement that lists variables without naming a format removes what was assigned, so FORMAT _ALL_ and INFORMAT _ALL_ clear every format and informat. 

 

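data work.trafficking;
   set work.humantrafficking;
   /* Recode the -99 placeholder to a readable label */
   if gender = "-99" then gender = 'Unknown';
   if ageBroad = "-99" then ageBroad = 'Unknown';
   /* Remove the formats and informats that PROC IMPORT assigned */
   format _all_;
   informat _all_;
run;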

 

TIP: You can use the CONTENTS procedure to verify that your informats and formats have been removed. 
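
As a minimal sketch of that check, the step below lists the attributes of every variable in Work.Trafficking. The VARNUM option is my addition and only changes the listing to creation order.

```sas
/* Sketch: list variable attributes for the cleaned data set. */
/* After the cleanup step, no formats or informats should be listed. */
proc contents data=work.trafficking varnum;
run;
```

The same listing shows each variable's length. If Gender or ageBroad turns out to be shorter than 7 characters, the value Unknown would be stored truncated, so the length column is worth a glance too.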

 

Now I can begin analyzing the data. I wanted to find out how many males and females are trafficked globally, based on the CTDC data set. Keep in mind that the available data sets are often incomplete because many of these crimes go unreported.

Use the FREQ procedure to determine the frequency count for Male, Female, and Unknown. 

proc freq data=work.trafficking;
   tables Gender / norow nocum nocol;
run;

 

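The NOCUM option suppresses the cumulative frequencies and cumulative percentages. The NOROW and NOCOL options suppress row and column percentages, which appear only in crosstabulation tables, so they have no visible effect on this one-way table.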
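
As a hedged variation on the step above, the sketch below keeps only the option that matters for a one-way table and adds ORDER=FREQ, which lists the values by descending frequency. ORDER=FREQ is my addition and is not in the original program.

```sas
/* Sketch: one-way table with the most frequent value listed first. */
proc freq data=work.trafficking order=freq;
   tables Gender / nocum;
run;
```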

 

Output: FREQ Procedure

freq-1-output.png

My analysis shows that globally, approximately 73% of trafficked victims are females, and 26% are males. 


Next, I wanted to see how gender is distributed across age ranges. For example, how many male victims are identified in the 18-20 age range? Joining two variables with an asterisk in the TABLES statement of the FREQ procedure creates a two-way table. 

 

proc freq data=work.trafficking;
   tables Gender*ageBroad / norow nocum nocol;
run;

 

Output: FREQ Procedure

 

freq-2-output.png

Based on my analysis, the most trafficked age range for females is 9-17 while for males it is 30-38. 
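
One way to read the largest cells off the table without scanning it by eye is to save the counts and sort them. This is a sketch rather than part of the original analysis; the output data set name work.gender_age is hypothetical.

```sas
/* Sketch: save the Gender-by-age counts, then rank age ranges within each gender. */
proc freq data=work.trafficking noprint;
   tables Gender*ageBroad / out=work.gender_age;   /* hypothetical output name */
run;

/* COUNT is created by the OUT= option; sort so the largest cell comes first per gender. */
proc sort data=work.gender_age;
   by Gender descending count;
run;

proc print data=work.gender_age noobs;
   var Gender ageBroad count;
run;
```

The first row printed for each gender is its most frequent age range. Rows recoded to Unknown are included in the ranking.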

 

Use the SGPLOT procedure to visualize the data. 

 

title 'Trafficking Victims By Age Range';

proc sgplot data=work.trafficking;
   vbar ageBroad / stat=freq group=Gender ;
run;
quit;

 

Output: Trafficking Victims by Age Range (SGPLOT)

human-trafficking-blog-chart-1.png

 

 

Seeing how global the human trafficking issue truly is, I was curious which types of exploitation occur most often in which parts of the world. To find out, I first used a DATA step to prepare the data. The RENAME statement renames the variable CountryofExploitation to ID, and an IF-THEN statement reassigns the -99 value of the variable typeofExploitConcatenated to Other. 

 

data work.maptrafficking;
   set work.trafficking;
   rename CountryofExploitation=ID;
   if typeofExploitConcatenated='-99' then typeofExploitConcatenated='Other';
run;
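
A country appears on the map only if its ID value matches an ID in the map data set. The sketch below lists any country codes in Work.MapTrafficking with no match. It assumes that MAPSGFK.WORLD identifies countries with a character variable named ID, which is worth confirming with PROC CONTENTS first.

```sas
/* Sketch: list country codes in the response data that have no match in the map data. */
/* Assumes MAPSGFK.WORLD has a character variable named ID (verify before relying on it). */
proc sql;
   select distinct ID
      from work.maptrafficking
      where ID not in (select ID from mapsgfk.world);
quit;
```

An empty result means every country code in the data can be drawn on the map.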

Now, use the SGMAP procedure to create a world map, with the typeofExploitConcatenated variable as the response variable that colors each country. 

 

title1 "Trafficking by Country";

proc sgmap maprespdata=maptrafficking mapdata=mapsgfk.world;
   choromap typeOfExploitConcatenated / id=ID;
run;
quit;

Output: Trafficking by Country (SGMAP) 

 

human-trafficking-by-country-1.png

 

Based on this visualization, most of the exploitation recorded for North America in the CTDC data is sexual exploitation.


Human trafficking is not just something that happens in faraway countries. It is in our own backyards. In my home state of North Carolina, one of the biggest hubs for human trafficking is Charlotte. That’s only 3 hours away from where I live.

 

I’m writing this post to draw attention to how technologies such as machine learning and AI can help foster a change in the world. We can grow towards a society that protects everyone and works to eradicate such awful industries where we exploit people for personal gain.

 

If you are eager to learn more about how we can use data analytics to solve a global problem such as human trafficking, check out the following blogs:

Recognizing potential red flags and knowing the indicators of human trafficking is a key step in identifying more victims and helping them find the assistance they need. To request help or report suspected human trafficking, call the National Human Trafficking Hotline at 1-888-373-7888 or text "help" to BeFree (233733).

 

References:

 

Counter-Trafficking Data Collaborative (CTDC). (n.d.). Retrieved January 29, 2020, from https://www.ctdatacollaborative.org/download-global-dataset.

 

Polaris. (n.d.). Retrieved January 29, 2020, from https://polarisproject.org

Comments

Great article and great topic. Human trafficking is growing.

The graph Trafficking Victims by Age Range (SGPLOT) is not properly sorted though, which I find awful from a reporting viewpoint.

Also, I would have left the countries without data in grey rather than erasing them from the map.

