Fedspeak? Or just another Bag of Words?

TL DR: GitHub Repo and FedTools package.

This article introduces the “FedTools” Python package, providing a practical implementation of a basic Bag of Words algorithm.

So, what is Fedspeak?

“Fedspeak”, otherwise known as “Greenspeak”, was initially termed by Alan Blinder to describe the “turgid dialect of English” used by Federal Reserve Board chairpeople when making vague, noncommittal or ambiguous statements. Over recent years, Federal Reserve policy communications have evolved dramatically, owing to increases in natural language processing (NLP) capabilities of Financial Institutions world over.

Natural Language Processing

Natural Language Processing (NLP) is a field of artificial intelligence enabling machines to interact with, analyse, understand and interpret the meaning of human language. NLP has a number of sub-fields, such as automated summarization, automated translation, named entity recognition, relationship extraction, speech recognition, topic segmentation and sentiment analysis. This article focuses on implementing a basic sentiment analysis through the use of a “Bag of Words” (BoW) algorithm. The BoW algorithm is useful for extracting features from text documents, which can be subsequently incorporated into modelling pipelines.

Bag of Words

The BoW approach is very simple and can be implemented on different document types in order to extract pre-defined features from documents. At a high-level, the Bag of Words is a representation of text which describes the occurrence of a pre-determined set of words within a document. This is characterized by two steps:

1) A vocabulary or ‘dictionary’ of pre-determined words must be chosen. 2) The presence of the known words is measured. This is known as a “bag”, as all information about word order is discarded. The model only considers the number of occurrences of the pre-determined words within the text.

Practical Implementation

Now we have outlined the BoW algorithm, we can implement this in 7 easy steps. The first step is to install packages and modules which we shall use:

# install plotly (chart-studio):
!pip install chart-studio

# for data manipulation:
import numpy as np
import pandas as pd
import datetime as dt

# for textual analysis:
import nltk
import re
nltk.download('punkt')

# silence warnings:
import warnings 
warnings.filterwarnings("ignore")

# for plotting:
import plotly.graph_objects as go

Secondly, we need to obtain historical Federal Open Market Committee (FOMC) statements, which can be found here. However, the new “FedTools” Python Library enables us to extract this information automatically:

# install library:
!pip install FedTools

# import relevant package and collect dataset:
from FedTools import MonetaryPolicyCommittee
dataset = MonetaryPolicyCommittee().find_statements()

Now, we have a Pandas DataFrame, with the ‘FOMC Statements’ in a column, indexed by FOMC Meeting date. The next step is to iterate through each statement and remove paragraph delimiters:

for i in range(len(dataset)):
  dataset.iloc[i,0] = dataset.iloc[i,0].replace('\\n','. ')
  dataset.iloc[i,0] = dataset.iloc[i,0].replace('\n',' ')
  dataset.iloc[i,0] = dataset.iloc[i,0].replace('\r',' ')
  dataset.iloc[i,0] = dataset.iloc[i,0].replace('\xa0',' ')

Now, we have to consider which dictionary of predetermined words we wish to use. For ease, we use Tim Loughran & Bill McDonald’s Sentiment Word Lists. As the list is extensive, it isn’t included within this article, but can be obtained from the consolidated code, held within the Github repo.

Next, we define a function which determines if a word is a ‘negator’. This function checks if the inputted word is held within the pre-determined list of ‘negate’.

def negator(word):
  '''
  Determine if the prior word is a negator.
  '''

  if word.lower() in negate:
    return True
  else:
    return False

Now, we can implement the BoW algorithm, considering potential negators in the three words prior to detected words. The function enables us to count the positive and negative words detected, whilst also saving these words within a separate DataFrame column.

def bag_of_words_using_negator(word_dictionary, article):
  '''
  Count the number of positive and negative words, whilst considering negation for positive words.
  Negation is considered as a negator preceeding a positive word by three words or less.
  '''

  # initialise word counts at 0:
  positive_word_count = 0
  negative_word_count = 0

  # initialise empty lists:
  positive_words = []
  negative_words = []

  # find all words:
  input_words = re.findall(r'\b([a-zA-Z]+n\'t|[a-zA-Z]+\'s|[a-zA-Z]+)\b', article.lower())

  # determine word_count:
  word_count = len(input_words)

  # for each word in the article:
  for i in range(0, word_count):
    # determine if negative. if so, incease negative count and append:
    if input_words[i] in word_dictionary['Negative']:
      negative_word_count += 1
      negative_words.append(input_words[i])
    # if the input word is positive, check the three prior words for negators:
    if input_words[i] in word_dictionary['Positive']:
      # if a negator exists 3 or less words prior, assign negative, otherwise positive:
      if i >= 3:
                if negator(input_words[i - 1]) or negator(input_words[i - 2]) or negator(input_words[i - 3]):
                    negative_word_count += 1
                    negative_words.append(input_words[i] + ' (with negation)')
                else:
                    positive_word_count += 1
                    positive_words.append(input_words[i])
      # if a negator exists 2 or less words prior, assign negative, otherwise positive:           
      elif i == 2:
                if negator(input_words[i - 1]) or negator(input_words[i - 2]):
                    negative_word_count += 1
                    negative_words.append(input_words[i] + ' (with negation)')
                else:
                    positive_word_count += 1
                    positive_words.append(input_words[i])
      # if a negator exists 1 word prior, assign negative, otherwise positive:
      elif i == 1:
                if negator(input_words[i - 1]):
                    negative_word_count += 1
                    negative_words.append(input_words[i] + ' (with negation)')
                else:
                    positive_word_count += 1
                    positive_words.append(input_words[i])
      # otherwise assign positive:
      elif i == 0:
                positive_word_count += 1
                positive_words.append(input_words[i])

  # collect the findings as a list:
  results = [word_count, positive_word_count, negative_word_count, positive_words, negative_words]
  
  return results

The build_dataset function iteratively invokes the bag_of_words_using_negator function for each FOMC statement, using an input argument of the Loghran & McDonald dictionary lmdict.

def build_dataset(dataset):
  '''
  This function constructs the dataset by interatively calling the bag_of_words_using_negator 
  function, using the input argument of 'lmdict'.
  '''

  # call the bag_of_words_using_negator function for each policy statement, place results into DataFrame:
  temporary = [bag_of_words_using_negator(lmdict,x) for x in dataset['FOMC_Statements']]
  temporary = pd.DataFrame(temporary)

  # Transpose the various columns to the initial 'dataset' DataFrame:
  dataset['Total Word Count'] = temporary.iloc[:,0].values
  dataset['Number of Positive Words'] = temporary.iloc[:,1].values
  dataset['Number of Negative Words'] = temporary.iloc[:,2].values
  dataset['Positive Words'] = temporary.iloc[:,3].values
  dataset['Negative Words'] = temporary.iloc[:,4].values

  # Calculate additional useful metrics:
  dataset['Net Sentiment'] = (dataset['Number of Positive Words'] - dataset['Number of Negative Words'])
  dataset['2 Year Sentiment MA'] = dataset['Net Sentiment'].rolling(window=16).mean()
  dataset['Sentiment Change'] = (dataset['Net Sentiment'].shift(1) / dataset['Net Sentiment'])
  dataset['Wordcount Normalized Net Sentiment'] = (dataset['Net Sentiment'] / dataset['Total Word Count'])

  return dataset

Finally, the plot_figure function invokes the build_dataset function, subsequently building out an interactive visualisation of the outputs.

def plot_figure():
  '''
  This function constructs a Plotly chart by calling the build_dataset function
  and subsequently plotting the relevant data. 
  '''

  # call the build_dataset function, using the input argument of the pre-defined dataset:
  data = build_dataset(dataset)

  # initialise figure:
  fig = go.Figure()

  # add figure traces:
  fig.add_trace(go.Scatter(x = data.index, y = data['Total Word Count'],
                           mode = 'lines',
                           name = 'Total Word Count',
                           connectgaps=True))
  
  fig.add_trace(go.Scatter(x = data.index, y = data['Number of Positive Words'],
                           mode = 'lines',
                           name = 'Number of Positive Words',
                           connectgaps=True))
  
  fig.add_trace(go.Scatter(x = data.index, y = data['Number of Negative Words'],
                           mode = 'lines',
                           name = 'Number of Negative Words',
                           connectgaps=True))
  
  fig.add_trace(go.Scatter(x = data.index, y = data['Net Sentiment'],
                           mode = 'lines',
                           name = 'Net Sentiment',
                           connectgaps=True))
  
  fig.add_trace(go.Scatter(x = data.index, y = data['2 Year Sentiment MA'],
                           mode = 'lines',
                           name = '2 Year Sentiment MA',
                           connectgaps=True))
  
  fig.add_trace(go.Scatter(x = data.index, y = data['Sentiment Change'],
                           mode = 'lines',
                           name = 'Sentiment Change',
                           connectgaps=True))
  
  fig.add_trace(go.Scatter(x = data.index, y = data['Wordcount Normalized Net Sentiment'],
                           mode = 'lines',
                           name = 'Wordcount Normalized Net Sentiment',
                           connectgaps=True))

  # add a rangeslider and buttons:
  fig.update_xaxes(
    rangeslider_visible=True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=5, label="5 Years", step="year", stepmode="backward"),
            dict(count=10, label="10 Years", step="year", stepmode="backward"),
            dict(count=15, label="15 Years", step="year", stepmode="backward"),
            dict(label="All", step="all")
        ]))) 

  # add a chart title and axis title:
  fig.update_layout(
    title="Federal Reserve Bag of Words",
    xaxis_title="Date",
    yaxis_title="",
    font=dict(
        family="Arial",
        size=11,
        color="#7f7f7f"
    ))
  
  # add toggle buttons for data display:
  fig.update_layout(
      updatemenus=[
          dict(
            buttons=list([
                  dict(
                    label = 'All',
                    method = 'update',
                    args = [{'visible': [True, True, True, True, True, True, True]}]
                  ),

                  dict(
                    label = 'Word Count',
                    method = 'update',
                    args = [{'visible': [True, False, False, False, False, False, False]}]
                  ),

                  dict(
                    label = 'Positive Words',
                    method = 'update',
                    args = [{'visible': [False, True, False, False, False, False, False,]}]
                  ),

                  dict(
                    label = 'Negative Words',
                    method = 'update',
                    args = [{'visible': [False, False, True, False, False, False, False,]}]
                  ),

                  dict(
                    label = 'Net Sentiment',
                    method = 'update',
                    args = [{'visible': [False, False, False, True, False, False, False,]}]
                  ),

                  dict(
                    label = '2 Year Sentiment MA',
                    method = 'update',
                    args = [{'visible': [False, False, False, False, True, False, False,]}]
                  ),

                  dict(
                    label = 'Sentiment Change',
                    method = 'update',
                    args = [{'visible': [False, False, False, False, False, True, False,]}]
                  ),

                  dict(
                    label = 'Wordcount Normalized Net Sentiment',
                    method = 'update',
                    args = [{'visible': [False, False, False, False, False, False, True]}]
                  ),
              ]),
              direction="down",
              pad={"r": 10, "t": 10},
              showactive=True,
              x=1.0,
              xanchor="right",
              y=1.2,
              yanchor="top"
          ),])

  return fig.show()

Call the plot_figure function, and the figure is displayed:

plot_figure()