Data Science Syllabus for Lawyers 

   

Lesson 1: Getting Started with R  

 

Basic Calculations in R  

Data types and functions  

Accessing and manipulating data  

Plotting data  

Basic string operations  

Loops  

  

Lesson 2: Web Scraping and Data Upload  

 

Setting a Working Directory  
Installing Packages  
Loading and saving CSV files  
Uploading text files  
Reading and uploading PDFs  
Web scraping  
Working with XML data  

  

Lesson 3: Regular Expressions  

In legal data science, regexes serve two basic purposes.  

 

Document Segmentation: For some applications, we want to work with parts of a document rather than the entire text. It is thus useful to segment contracts or treaties into their constituent articles. Regexes can locate the headings or numbering that mark where each part begins.  

Information Retrieval: In other contexts, we use regexes to identify and extract the information we are interested in. For example, we could extract all the dates, email addresses, or citations in a document.  
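The two purposes can be illustrated with a minimal base R sketch. It is not taken from the lesson materials: the treaty text, the variable names, and the patterns are all invented for illustration, and real documents usually need more forgiving patterns (for example, headings written as "ARTICLE I" or dates in other formats).

```r
# Hypothetical treaty text, invented for this example
treaty <- paste(
  "Article 1 Definitions. In this Treaty, investment means every kind of asset.",
  "Article 2 Scope. This Treaty applies from 1 January 2020.",
  "Article 3 Notices. Send notices to registry@example.org or legal@example.org."
)

# Document segmentation: split on the whitespace that precedes each
# "Article <number>" heading. The lookahead (?=...) leaves the heading
# attached to its article; lookaheads require perl = TRUE.
articles <- strsplit(treaty, "\\s+(?=Article\\s+\\d+)", perl = TRUE)[[1]]

# Information retrieval: extract every email address.
# This pattern is deliberately simple and will not cover every valid address.
email_pattern <- "[[:alnum:]._-]+@[[:alnum:].-]+\\.[a-z]+"
emails <- regmatches(treaty, gregexpr(email_pattern, treaty, perl = TRUE))[[1]]

# Information retrieval: extract dates written as "day Month year".
# month.name is a constant built into R holding the English month names.
date_pattern <- paste0("\\d{1,2} (", paste(month.name, collapse = "|"), ") \\d{4}")
dates <- regmatches(treaty, gregexpr(date_pattern, treaty, perl = TRUE))[[1]]

print(articles)
print(emails)
print(dates)
```

Here `strsplit()` handles segmentation, while `gregexpr()` combined with `regmatches()` handles extraction. The same pair of functions could be pointed at a citation pattern instead of an email or date pattern.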

1. What is a Regex?  
2. Integrating Regexes into R Code  
3. Using Regexes for Text Segmentation  
4. Using Regexes for Information Retrieval  

  

Lesson 4: Citation Networks and Legal Network Analysis  

 

Using regexes to find citations  

Creating a citation list  

Finding the most cited cases  

Visualizing networks  

Network measures  

  

Lesson 5: Dictionary Analysis (text-as-data analysis)  

 

Creation of a Text Corpus and Text Pre-processing  

Creating a Term-document Matrix  

Visualizing Word Frequency  

Working with Bigrams  

Dictionary Approach I: Term mapping  

Dictionary Approach II: Sentiment Analysis  

  

Lesson 6: Similarity  

 

Preparing Metadata  
Creating a Document-Term Matrix  
Creating a Similarity Matrix  
Visualizing Similarity Through Heatmaps  

  

Lesson 7: Automated content analysis through machine learning  

 

Unsupervised Machine Learning  
Supervised Machine Learning  

  

Lesson 8: Prediction  

 

Loading WJ Brennan Voting  

Prediction Using Naive Bayes  

Prediction Using Support Vector Machines  

Prediction Using K-Nearest Neighbour