Press Release

Retrieval Augmented Generation for New Orleans City Council Transparency

By
Eye on Surveillance

Sawt is an open source AI assistant trained with recordings from New Orleans City Council meetings. Today, any resident can ask Sawt a question about what’s going on in the council. Our long-term aim is to develop an ethical, community-controlled large language model (LLM).

If you live in New Orleans, consider attending our annual meeting. You can RSVP here. If you don’t live in New Orleans, we accept crypto donations. Our OpenAI bill is starting to grow and any help would be greatly appreciated.

Figure 1: Screenshot of Sawt response

Results so far

Although we've just publicly launched the beta version of Sawt, a select group of local residents have been testing it since the summer. To date, 319 questions have been asked by New Orleanians. They have also provided feedback on 201 responses generated by the system.

Retention

Despite targeted outreach, we've noticed that community members are not returning to use the tool after their initial interaction. There has been minimal engagement apart from a spike during a mid-November focus group. This indicates that Sawt is not helpful enough yet for people to want to come back.

Figure 2: Questions asked on Sawt, by week

We've approached Sawt's development with caution, understanding that technology alone isn't a catalyst for change. Genuine grassroots engagement is paramount. While we remain critical of the project, we're also excited about its potential. Upcoming features aimed at increasing Sawt's usefulness include instant data uploads for new council meetings, alerts when registered keywords are discussed during meetings, a larger dataset, and more precise citations.

Accuracy & bias

Improving accuracy and reducing bias are critical to making Sawt a valuable tool. Through focus groups with organizers, non-profit workers, and community members from diverse areas, we've confirmed that Sawt's responses are not always accurate and sometimes contain biases. When asked to give feedback on a scale of 1 to 5, people currently rate Sawt around a 3.

Figure 3: Feedback on Sawt across two different releases of the tool (v0.0.1 and v0.0.2)

These feedback sessions also surfaced some counterintuitive patterns about real-world usage. For example, people feel that generated responses are the most accurate when the number of source documents (k) is lowest. In other words, people find responses most accurate when fewer documents are considered.

Figure 4: Detailed feedback on Sawt. Figure 4 contains the same results as Figure 3, but with responses normalized per respondent. Figure 4 shows how people scored responses by accuracy, helpfulness, balance, and overall. The k parameter corresponds to the number of documents used to inform the generated response. k=5 means that five video clips, meeting minutes, and/or articles were used to inform the response. k=15 means that fifteen such documents were used.
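The exact normalization behind Figure 4 is not described here. As a rough sketch of what "normalized per respondent" can mean, the example below assumes each person's scores are converted to z-scores against that person's own mean and spread, then averaged for each value of k. The ratings, respondent labels, and function names are made up for illustration and are not Sawt's data or code.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Made-up ratings for illustration only: (respondent, k, score on the 1-5 scale).
ratings = [
    ("r1", 5, 5), ("r1", 10, 4), ("r1", 15, 3),
    ("r2", 5, 3), ("r2", 10, 2), ("r2", 15, 1),
]


def normalize_per_respondent(rows):
    """Convert each score to a z-score relative to that respondent's own scores."""
    by_person = defaultdict(list)
    for person, _, score in rows:
        by_person[person].append(score)

    normalized = []
    for person, k, score in rows:
        scores = by_person[person]
        spread = pstdev(scores)
        # A respondent who gives the same score every time carries no signal.
        z = 0.0 if spread == 0 else (score - mean(scores)) / spread
        normalized.append((person, k, z))
    return normalized


def mean_by_k(rows):
    """Average the normalized scores for each value of k."""
    by_k = defaultdict(list)
    for _, k, value in rows:
        by_k[k].append(value)
    return {k: mean(values) for k, values in sorted(by_k.items())}


normalized = normalize_per_respondent(ratings)
summary = mean_by_k(normalized)
```

Normalizing this way keeps a generous scorer (r1) and a harsh scorer (r2) from skewing the comparison: both contribute the same relative preference across k values even though their raw scores differ.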

Ultimately, the sample size is small, so these results should be read as preliminary rather than statistically significant. If you are a New Orleans resident, you can help by submitting some feedback yourself.

How it’s made

The initial Sawt prototype was developed over a weekend by @ayyubibrahimi. The core logic has remained consistent since then. We first ingest raw data, create embeddings, and set up FAISS. When someone asks Sawt a question, we identify the most relevant documents using FAISS, combine them with the query, and send them to OpenAI for a response.

Figure 5: How Sawt works
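The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not Sawt's actual code: a toy bag-of-words embedding and a brute-force cosine search stand in for the real embedding model and the FAISS index, the sample documents are invented, and the final call to OpenAI is left out. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents):
    """Ingest step: embed every document once (FAISS would store these vectors)."""
    return [(doc, embed(doc)) for doc in documents]


def retrieve(index, query, k):
    """Query step: return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query, sources):
    """Combine the retrieved sources with the question for the language model."""
    context = "\n".join(f"- {source}" for source in sources)
    return f"Sources:\n{context}\n\nQuestion: {query}"


# Invented sample documents, for illustration only.
documents = [
    "The council discussed the budget for street repairs.",
    "Residents commented on the proposed drone policy for police.",
    "The committee reviewed short term rental permits.",
]
question = "What was said about the drone policy?"

index = build_index(documents)
top_docs = retrieve(index, question, k=1)
prompt = build_prompt(question, top_docs)
```

The `k` argument here plays the same role as the k parameter in the feedback results: it controls how many source documents are passed along with the question.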

In the preprocessing/FAISS phase, we implemented the Hypothetical Document Embeddings (HyDE) methodology from 'Precise Zero-Shot Dense Retrieval without Relevance Labels' to create our embedding space for Retrieval-Augmented Generation (RAG). RAG is a technique that combines the strengths of large language models with external data retrieval, aiming to enhance the relevance and accuracy of the responses generated by the AI model. The hypothetical document for generating these embeddings was produced using a zero-shot prompt. This prompt reads:

As an AI assistant, utilize the New Orleans City Council transcript data from your training to deliver a detailed and impartial response to the following query: "{user_query}".

This approach ensures that the model uses the specific dataset it was trained on, aiming to provide a comprehensive and impartial answer to the question.
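In the HyDE paper, the model first drafts a hypothetical answer to the question, and the embedding of that draft (rather than of the raw question) is used to search the document collection. The sketch below illustrates that idea under stated assumptions: `fake_llm` is a hypothetical stand-in that returns a canned draft instead of calling OpenAI, the bag-of-words embedding and brute-force search replace the real embedding model and FAISS, and the sample documents are invented. Only the prompt text is taken from above.

```python
import math
import re
from collections import Counter

HYDE_PROMPT = (
    "As an AI assistant, utilize the New Orleans City Council transcript data "
    "from your training to deliver a detailed and impartial response to the "
    'following query: "{user_query}".'
)


def fake_llm(prompt):
    """Hypothetical stand-in for a chat-completion call; returns a canned draft."""
    return (
        "The council debated police use of drones and heard public comment "
        "on surveillance."
    )


def embed(text):
    """Toy bag-of-words vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hyde_retrieve(user_query, documents, k, llm=fake_llm):
    """Search with the embedding of a hypothetical answer instead of the query."""
    hypothetical_doc = llm(HYDE_PROMPT.format(user_query=user_query))
    query_vec = embed(hypothetical_doc)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


# Invented sample documents, for illustration only.
documents = [
    "Budget hearing on street repairs and drainage.",
    "Public comment on police drones and surveillance cameras.",
]
results = hyde_retrieve("Is NOPD buying drones?", documents, k=1)
```

The draft answer does not need to be factually correct. Its job is to look like the kind of passage that would answer the question, so that it lands near real passages on the same topic in the embedding space.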

The prompt sent to OpenAI during the final phase is more complex and detailed. It contains several instructions emphasizing the need to format responses clearly. It's designed to balance the breadth of response with conciseness. A significant part of the prompt is dedicated to explicitly investigating and addressing potential biases in the response. Additionally, the prompt instructs the model to define uncommon words, making the responses more accessible and informative for people who may not be familiar with specific terms related to city council activities.

Figure 5: Prompt used for queries

Although Sawt is an Eye on Surveillance project, we worked with Dr Culotta and three undergraduate students at Tulane University. Their work has focused on addressing bias and they built out the current feedback mechanism. Dr Culotta also generated the analysis used in Figure 4.

Understanding our context

In December 2020, after the New Orleans Police Department (NOPD) had been caught lying for years about its use of facial recognition, Eye on Surveillance, a coalition of local organizations united to divest from surveillance and invest in communities, pushed the New Orleans City Council to adopt a data privacy ordinance that included a ban on facial recognition and other surveillance technologies. However, merely a year later, the city repealed the ban on facial recognition and cell-site simulators. Moreover, the council member who sponsored the ordinance is now, as district attorney, piloting a predictive policing program, a technology banned under the ordinance he passed.

Most members of Eye on Surveillance are volunteers with day jobs. It’s hard to keep up with all the regressive steps the council is taking during their meetings. While official council meetings are public and theoretically accessible, they often occur during work hours, last all day, and have agendas that change last-minute. This makes it challenging for us to keep up, especially when the city strategically schedules announcements of controversial decisions or meetings the day before a holiday.

Throughout the year, there have been numerous instances where we only became informed about significant surveillance initiatives months after they were raised in city council. We hope Sawt will be a useful tool for us and other New Orleanians moving forward.

Figure 6: New Orleans took on surveillance throughout 2023 without transparency

  • ShotSpotter: Council considered using it in January and we found out in February.
  • Facial Recognition: Reporting was available as of February, but the city rejected several public records requests until July. Sawt itself was helpful in providing information that led to our public records request finally being accepted.
  • Predictive Policing: The former council member who sponsored the ban on predictive policing is now DA and is piloting the illegal technology. We found out from the newspaper.
  • Drones: NOPD seeks feedback on drones they already purchased.

Next steps

As we move into 2024, we're reflecting on the broader implications of Sawt. We're questioning the extent to which we might be perpetuating AI biases and considering the specific biases of Eye on Surveillance members that could hinder our efforts. Nevertheless, we're excited about the continued progress and the prospect of launching an official version in the summer of 2024. We're committed to creating community-owned and operated AI models. 

If you want to get involved:

  • RSVP to attend our annual meeting
  • Sign up for our mailing list
  • Contribute some code
  • Donate
      • BTC: bc1qkseneu5cv9g6u4gpmnlen3q3at59r6sj6kn07q
      • SOL: CawuhzDxyytazxywF942VsLwi4RKEWqryLYDsv4hndNa
      • ETH: 0xAA37b8a54e49e6c61De9904985e2887dfEABBA20