Constructing powerful search queries in OSINT investigations


  • Boolean operators and search filters are great techniques for narrowing down search results

  • But search results are only as good as the query itself

  • This blog post discusses various tips & tricks for constructing powerful search queries that can be applied to any issue


Boolean operators and search filters

Imagine you had to monitor the 2020 US election to identify mis- and disinformation, or assess the current threat posed by far-right entities in Bavaria. There's quite a lot you'd have to think about in terms of how and where to start. For example, how can you find relevant information? What sources and keywords should you use? And there are many other important questions besides.

In this blog post, I want to focus on data collection during such a task by presenting some useful tips and tricks that can help you construct powerful search queries. As I discussed in my last blog post, it is essential to follow a methodology; otherwise, you can get lost pretty quickly and may not produce any results.

If you’re new to the world of OSINT, I highly recommend the Google Guide to learn fundamental search techniques. Likewise, you should understand how search engines such as Google actually work, because you need to be aware of their limitations.

I also want to highlight that Google is not the only search engine. However, for demo purposes, I’m going to focus on Google in this blog post, even though some of the tips and tricks mentioned here can be applied elsewhere.

It is vital that you use multiple search engines, because you are not searching the web with a Google query - you are actually searching Google’s index of the web.

https://www.google.com/search/howsearchworks | The life span of a Google query is less than 1/2 second, and involves quite a few steps before you see the mos...

Constructing powerful search queries

When it comes to ‘Googling’, there are, broadly speaking, three types of searches:

  1. Basic

  2. Advanced

  3. Complex

Basic searches use no Boolean operators or any other search technique. In other words, you just type words and hit enter.

e.g. London weather today

The next level is advanced. This is when you use Boolean operators, perhaps in combination with other search techniques, but only a few of them.

e.g. “Nashir News” OR “Amaq News” site:facebook.com
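
Queries like this one are easy to save or share as a plain Google search URL, which is handy when you return to the same query regularly. As a minimal sketch, assuming only the standard `q` parameter of https://www.google.com/search, the Python snippet below percent-encodes the query so that quotes, spaces and the colon in site: survive intact. The function name google_search_url is hypothetical, chosen for illustration.

```python
from urllib.parse import urlencode


def google_search_url(query: str) -> str:
    """Return a Google search URL for the given query string (hypothetical helper)."""
    # urlencode() escapes quotes as %22, the colon as %3A, and turns spaces into '+'
    return "https://www.google.com/search?" + urlencode({"q": query})


query = '"Nashir News" OR "Amaq News" site:facebook.com'
print(google_search_url(query))
```

Opening the printed URL in a browser should run the same search as typing the query into the search box.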

The final level is so-called complex searches. These are searches where the full spectrum of advanced search techniques and Boolean operators are employed to construct long and complex search queries, which I’ll set out below.

Complex queries for powerful filtering

Before I go into more detail, let me first clarify when complex search techniques are worth using. When you quickly want to check something, you probably won’t need a complex query; however, there are instances when you want to fully exhaust Google’s capabilities. Real-time monitoring of mal-, mis- and disinformation is one example: as you get bombarded with a ton of information, detecting what is relevant becomes more challenging.

As witnessed during this global pandemic, traditional and social media sources have been used to spread both legitimate and inaccurate, false, and misleading information. In light of this, the WHO described the current information ecosystem as an ‘infodemic’ - a linguistic blend of the words information and epidemic that highlights the rapid and far-reaching spread of both true and false information.

So, how can you possibly identify any relevant leads in a flood of information? Here’s a great training video by First Draft News that explains in less than one minute how you can construct such powerful queries. In essence, it’s a combination of highly relevant terms, Boolean operators, and parentheses to group these - a highly efficient technique. If you’re a journalist or interested in this field in general, I highly recommend checking out First Draft News Training and Research Resources.
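
The video’s exact recipe isn’t reproduced here, but the general pattern of grouping related terms with OR inside parentheses and chaining the groups can be sketched in a few lines of Python. This is a minimal sketch under two assumptions: multi-word terms should be searched as exact phrases (so they are wrapped in quotes), and a plain space between groups acts as AND, which is how Google treats it. The helper names and the voter fraud keywords are hypothetical examples, not a vetted keyword list.

```python
def or_group(terms):
    """Join terms with OR inside parentheses, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*groups):
    """Chain several OR-groups; a space between them acts as AND on Google."""
    return " ".join(or_group(group) for group in groups)


# Hypothetical keyword groups, for illustration only
print(build_query(["voter fraud", "election fraud"],
                  ["ballot", "ballots", "dead voters"]))
```

Generating queries this way keeps the parentheses and quotes consistent, which matters more as the groups get longer.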

How to construct complex queries

Now let’s have a look at how to construct complex search queries. To be clear, this should not be viewed as the perfect or only way of doing it but more as a helpful guide that enables you to create complex queries whilst focusing on your requirements, thus supporting your data collection efforts.

Use relevant keywords - whatever the issue may be, you want to use keywords that actually appear in the content you’re after, whether on a website, in a social media post, or anywhere else. Time spent researching which keywords to use will benefit your data collection. A quick way of finding relevant keywords on a topic is Google Trends. Among other things, it lets you explore where Google users worldwide have searched from and which keyword(s) were used the most.

Here’s a quick example regarding claims around ‘voter fraud’ in the 2020 US presidential election (Fig 1). We can quickly identify related topics and queries from the past 30 days that can help us understand the context and identify relevant topics and keywords which we can explore further.

Fig 1. Screenshot - Google Trends voter fraud, past 30 days

I also recommend spending a lot of time on the platforms you're investigating. Analyse conversations and focus on the language used to find keywords, trends, and other relevant information.

Use the right language and tone - in 2016, Quiztime founder Julia Bayer published an article on how to find breaking news on Twitter, particularly on events relating to human and natural disasters (Fig 2). Her methodology is based on her experience of seeing numerous eyewitness accounts on Twitter during such events. She writes:

To find breaking news on Twitter you have to think like a person who's experiencing something out of the ordinary. Eyewitnesses tend to share what they see.

Fig 2. Screenshot - examples of suggested keywords by Julia Bayer

The same principles can be applied to data collection in general. Taking some time to put ourselves in our target’s shoes can significantly improve the quality of our keywords and aid our collection efforts. When it comes to violent extremist and terrorist activity online, for example, I’ve become familiar over the years with terms and symbols used by Salafi-jihadists and right-wing extremists. This has undoubtedly benefited my work, and to this day this knowledge helps me identify new accounts and platforms used by terrorists and violent extremists online.

Keep track of your keywords - search results are only as good as the query itself. Keeping track of your keywords, the combinations you’ve tried, and notes on how each performed is not only good practice but super helpful. I like to use Google Sheets because it’s easy to use and offers useful features for OSINT (but more on this in another blog post).
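
If you prefer to keep the log in code and import it into a spreadsheet later, a small CSV file is enough. The sketch below is one possible layout, not the structure of my own sheet: the column names (query, platform, reviewed, relevant, note) and the QueryRecord class are hypothetical, and the sample numbers are made up purely for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class QueryRecord:
    """One row of a hypothetical keyword log: a query plus a note on how it performed."""
    query: str
    platform: str
    reviewed: int  # number of results you looked at
    relevant: int  # how many of those were actually useful
    note: str = ""


def to_csv(records):
    """Serialise the log as CSV text, e.g. to import into a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(QueryRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


# Illustrative entries only; the numbers are made up
log = [
    QueryRecord('"Nashir News" OR "Amaq News" site:facebook.com', "Google", 50, 12,
                "try adding synonyms"),
    QueryRecord('("voter fraud" OR "election fraud") ballots', "Twitter", 40, 5),
]
print(to_csv(log))
```

The csv module takes care of escaping the double quotes inside each query, so the queries come back unchanged when the file is opened again.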

When tracking your keywords, I also recommend creating different categories. This becomes especially useful when you collect data on an issue that can be broken down into multiple components. For example, if you’re looking at current far-right activity in the state of Bavaria, you could break that down into multiple search categories. Category 1 could list common far-right terminology; category 2 could list location-related keywords, e.g. cities and towns in Bavaria; category 3 could list known entities both on- and offline; and category 4 could list action-related keywords, such as ‘donate’, ‘support’, ‘join’, etc.

The idea is to have these isolated categories of keywords laid out in front of you, so you can concentrate on generating complex search queries by combining them with Boolean operators, parentheses, and perhaps additional search techniques (Fig 3).

Fig 3. Screenshot - example of how to keep track of keywords
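
Once the categories are laid out, trying different combinations can be partly automated. The Python sketch below pairs every two categories and builds one query per pair, each category becoming an OR-group in parentheses. It assumes that a space between groups acts as AND; the category contents are placeholders (term_a, entity_a and so on are hypothetical stand-ins, not real terminology), with a few Bavarian cities and the action words from the example above.

```python
from itertools import combinations

# Hypothetical categories for the Bavaria example; the term_* and entity_* values are placeholders
categories = {
    "terminology": ["term_a", "term_b"],
    "locations": ["Munich", "Nuremberg", "Augsburg"],
    "entities": ["entity_a"],
    "actions": ["donate", "support", "join"],
}


def or_group(terms):
    """Join terms with OR inside parentheses, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def combine(categories, size=2):
    """Yield (category names, query) for every combination of `size` categories."""
    for names in combinations(categories, size):
        yield names, " ".join(or_group(categories[name]) for name in names)


for names, query in combine(categories):
    print(" + ".join(names), "->", query)
```

With four categories and size=2 this yields six queries; raising size to 3 or 4 gives narrower queries that combine more categories at once.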

Generating keywords is an iterative process - I generate keywords, test and evaluate them, and, depending on the results, add new ones or remove poor performers. So make sure to revise your keywords regularly and adapt accordingly.
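
One rough way to make the "evaluate" step concrete is to note, for each query, how many results you reviewed and how many were relevant, and then flag low performers for revision. The sketch below uses that share (relevant / reviewed) as a crude proxy for quality; the 0.2 cut-off and the sample numbers are arbitrary assumptions for illustration, not recommended values, and any real judgement still comes from reading the results.

```python
def review(log, threshold=0.2):
    """Split (query, reviewed, relevant) entries into keep and revise lists.

    Uses relevant / reviewed as a crude quality proxy; the default
    threshold of 0.2 is an arbitrary assumption.
    """
    keep, revise = [], []
    for query, reviewed, relevant in log:
        if reviewed == 0:
            revise.append(query)  # nothing reviewed yet, so test it before keeping it
            continue
        share = relevant / reviewed
        (keep if share >= threshold else revise).append(query)
    return keep, revise


# Illustrative numbers only
log = [
    ('("voter fraud" OR "election fraud") ballots', 50, 18),
    ('"stolen election"', 40, 4),
]
keep, revise = review(log)
print("keep:", keep)
print("revise:", revise)
```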

Use Tweetdeck to monitor multiple combinations in real-time - Tweetdeck doesn’t need an introduction, but if you’re new to it, I highly recommend Charlotte Godart’s comprehensive guide for Bellingcat on how to use it. As I mentioned in my previous blog post, Twitter, and social media in general, constitutes a rich source of information on many issues worldwide. So make use of it: simply paste your complex queries into your dashboard. Here’s an example on voter-fraud-related issues in the 2020 US presidential election (Fig 4).

Fig 4. Screenshot - example of complex queries to monitor voter fraud related issues on Twitter
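
Long queries are easy to break when pasting them around, and a single missing parenthesis or quote can change how the query is interpreted or make it fail. The Python function below is a hypothetical pre-flight helper (not a Tweetdeck or Twitter feature) that only checks whether parentheses and double quotes are balanced; it does not validate operator names or any platform-specific syntax.

```python
def check_query(query: str) -> list:
    """Return a list of balance problems in a Boolean query (empty if none found)."""
    problems = []
    depth = 0
    in_quote = False
    for i, ch in enumerate(query):
        if ch == '"':
            in_quote = not in_quote
        elif in_quote:
            continue  # parentheses inside a quoted phrase are literal text
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append(f"unmatched ')' at position {i}")
                depth = 0
    if in_quote:
        problems.append("unclosed double quote")
    if depth > 0:
        problems.append(f"{depth} unclosed '('")
    return problems


print(check_query('("voter fraud" OR "election fraud") (ballot OR ballots'))
```

An empty list means the brackets and quotes line up; it says nothing about whether the query will return useful results.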

Concluding thoughts

Depending on your OSINT tasks, you probably don’t need to use complex queries all the time; but when you do, make sure to keep track of them and try different combinations by utilising parentheses, Boolean operators, and other search techniques. This will help you to maximise your data collection and find relevant information in a structured way.


If you have any other suggestions or recommendations, please share them with me on Twitter (@LorandBodo) or comment below!

Thanks for reading and see you next time!
