INTERVIEW. With digital privacy becoming one of the most discussed subjects in recent years, the market for private search engines is constantly growing. Yet not all private search engines are the same. Meet findx.com, the Danish crawler-based search engine that does not have users, but rather people who use it to search the web privately. I got the chance to talk with one of the guys behind the project about its philosophy and business model, its approach to GDPR and the future prospects of the company behind it. Curious? Read on.
Two main types stand out in the current search engine market: crawler-based search engines and meta search engines. A crawler-based search engine finds a webpage, downloads it, analyses it and then adds it to its database, so it builds its own index of webpages. A meta search engine, on the other hand, takes the results from other search engines and combines them into one large listing.
Findx.com belongs to the first category: it builds its own index by crawling the web. The search engine is a project of Privacore, a Danish company founded in 2015 with the mission of enabling people to protect their privacy online. Besides findx, Privacore also offers Privafox, a privacy-enhanced browser due for release this summer. A couple of weeks ago I traveled to Privacore’s headquarters in Holbæk to meet Brian Schildt Laursen, Privacore’s Chief Relationship Officer. Here’s what came out of our chat.
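For readers who want the distinction in concrete terms, here is a minimal Python sketch of the two approaches. It is not how findx (or Gigablast, which findx builds on) actually works: page downloads are replaced by an in-memory dict, the "analysis" is plain word splitting, and `build_index`, `crawler_search` and `meta_search` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Crawler-based approach: analyse each downloaded page and add it to our own index."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)  # inverted index: word -> pages containing it
    return index


def crawler_search(index: dict[str, set[str]], query: str) -> list[str]:
    """Answer a query from our own index: pages containing every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


def meta_search(result_lists: list[list[str]]) -> list[str]:
    """Meta approach: no index of its own, just interleave other engines' ranked lists."""
    merged: list[str] = []
    seen: set[str] = set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged


pages = {
    "a.example": "tennis shoes on sale",
    "b.example": "audi cars",
    "c.example": "tennis rackets",
}
index = build_index(pages)
print(crawler_search(index, "tennis"))        # ['a.example', 'c.example']
print(meta_search([["x", "y"], ["y", "z"]]))  # ['x', 'y', 'z']
```

The crawler-based side owns its index and can answer queries on its own; the meta side has nothing to query except other engines' ranked lists, which it merely interleaves.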
What is the difference between findx and other crawler-based search engines?
Brian: When we approached the search engine market, we wanted to be independent: to build a true search engine with our own index, rather than another meta search engine. Independence is what actually differentiates us from the rest. On the technical side we are independent because we have our own index, and on the business side we are independent because we are privately funded and have no investors influencing the decisions we make.
When we started, there weren’t any other open-source, privacy-focused crawler-based search engines that could scale. Scaling takes a lot of computing power. If you want to build a search engine that is big enough, fast enough and has enough results in its index, you need software, computing power and terabytes of storage that can scale. We started out small, but our setup for scaling is now a lot better than it was at the beginning. This is another difference compared with other open-source private search engines. In all honesty, getting our independent search engine to work took a far bigger effort than we initially imagined… but now we are ready.
You have a community-driven approach in developing findx. What does this mean?
Brian: Getting people’s feedback is of great value to us, and one of the things we want to do is involve people in improving the results. Since we collect a lot of information about websites, we automatically try to determine which ones are spam, outdated or simply not relevant to a subject. In the end, though, human feedback is really important for getting better results. People can give a search result a ‘quality rating’ with a sad or a smiley face. We integrate that feedback into our algorithm: if many people report a website as a good search result, it can rank higher in the results list. We are currently working on making it easier to rate results, and one step is being clearer about the feedback values we use. There is also an option to suggest a better title or description for a website on findx. In the longer term, the website owner might see the proposed revision and decide whether it fits better than the original description.
We built all our products as open-source solutions. This goes for Privafox, our browser built on Firefox, for the tracking controller for browser cookies and also for findx, the search engine, which is built on Gigablast. If you want to and have the skills, you can look at the source code for findx, which means even the ranking algorithm is open source. We knew it would be tough to attract contributors to findx, so that was never the goal of open-sourcing it; we opened it up mainly for transparency reasons. Along the way, though, some individuals did want to contribute. The difficulty starts as soon as they discover the huge complexity behind such a system. Also, because it is not a distributed solution, it is difficult for one person somewhere in the world to take the code, put it on a server and run it: running it at scale takes more capacity than a home computer. To some extent a search engine is centralized by nature, even though an initiative like YaCy has created a more decentralized solution.
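The interview does not say how findx turns smiley and sad-face ratings into ranking changes. As a hypothetical illustration only, the sketch below adds a small, vote-count-damped bonus to a base relevance score, so a handful of ratings cannot override the algorithm but many consistent ratings can lift a result. `Result`, `feedback_adjustment`, `prior` and `weight` are all assumed names and values.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    base_score: float  # relevance from the automatic ranking (spam, freshness, topic...)
    smiles: int = 0    # "smiley face" ratings
    sads: int = 0      # "sad face" ratings


def feedback_adjustment(smiles: int, sads: int, prior: float = 10.0, weight: float = 0.2) -> float:
    """Net approval scaled by `weight`; `prior` damps the effect of just a few votes.

    With prior > 0 the result always lies strictly between -weight and +weight.
    """
    return weight * (smiles - sads) / (smiles + sads + prior)


def rank(results: list[Result]) -> list[Result]:
    """Order results by base score plus the human-feedback adjustment."""
    return sorted(
        results,
        key=lambda r: r.base_score + feedback_adjustment(r.smiles, r.sads),
        reverse=True,
    )


top = Result("top.example", base_score=1.0)
liked = Result("liked.example", base_score=0.95, smiles=40)
barely_rated = Result("barely.example", base_score=0.95, smiles=1)
print([r.url for r in rank([top, liked])])         # ['liked.example', 'top.example']
print([r.url for r in rank([top, barely_rated])])  # ['top.example', 'barely.example']
```

Damping by the total vote count is one common design choice here: it keeps a single rating from reordering results while still letting broad agreement matter.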
What about the two basic principles in your Privacy Promise: ‘Voluntarily shared information only’ and ‘Less is more’?
Brian: The less information we collect, the better. We have set some boundaries for ourselves, especially on the search engine, where we don’t collect any information on individuals, such as IP addresses. By default we don’t set any cookies, and we cannot see what you searched for previously. This makes every search a completely new one. The less information, the better.
Our newsletter is another example. When we send it out, we don’t collect information on how our subscribers react to it. Do they read the newsletter? Which links do they click? We don’t know. Do we need that information? It would be nice to have, but we don’t need it to send out the newsletter. It’s simply a decision we made. This is sometimes a challenge for us, and it limits our options a bit in a data-driven economy, whose logic says we should collect all the information we can from whoever uses findx. But our perspective is different. Findx doesn’t have users. It has people using it to search the web in private. That is the principle behind all our products.
We do, however, differentiate between the search engine, the marketing websites, the newsletter and the forum as a communication platform. To avoid spam on the forum, for example, we do collect some information, such as e-mail and IP addresses. On findx itself we do not collect any information whatsoever. Unfortunately, we cannot technically prove this; you have to trust us. Similar search engines such as DuckDuckGo have the same issue.
What data do you collect about people then?
Brian: None! We don’t collect information about people.
On the technical side, we use the IP address and browser language to make an initial guess at your location and to set a default language for you, but the IP address itself is thrown away and cannot be correlated with your search. That’s all.
As for security, we have done a lot to protect you. One example is securing the connection with certificates: traffic between your browser and findx is encrypted, so as long as you’re on findx, your searches cannot be sniffed or intercepted. We have the highest rating for our SSL certificate.
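Brian mentions using the browser language to set a default language. A common source for that is the standard HTTP `Accept-Language` request header; assuming findx reads something similar (the interview does not say), a minimal sketch of picking a default language could look like this. IP-based location lookup is left out because it would need a GeoIP database, and the function deliberately takes only the header value, so nothing about the requester is stored. `default_language` and its parameters are hypothetical.

```python
def default_language(accept_language: str, supported: set[str], fallback: str = "en") -> str:
    """Pick a default UI language from an Accept-Language value like 'da-DK,da;q=0.9,en;q=0.8'.

    `supported` is assumed to hold lowercase tags such as {"da", "en"}.
    Only the header value is passed in; no IP address or other identifier is involved.
    """
    candidates = []
    for position, part in enumerate(accept_language.split(",")):
        tag, *params = [piece.strip() for piece in part.split(";")]
        if not tag or tag == "*":
            continue
        quality = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:  # q=0 means the browser does not want this language
            candidates.append((-quality, position, tag.lower()))

    # Highest quality first; ties keep the order in which the browser listed them.
    for _, _, tag in sorted(candidates):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "da-dk" -> "da"
        if primary in supported:
            return primary
    return fallback


print(default_language("da-DK,da;q=0.9,en;q=0.8", {"da", "en"}))  # da
print(default_language("fr-CH, fr;q=0.9", {"da", "en"}))          # en
```

The second call falls back to the default because neither Swiss French nor French is in the assumed `supported` set.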
What is your business model?
Brian: It is quite simple: advertisements based on the search term. For example, if you’re searching for “Tennis shoes”, we might show you an advertisement for Adidas or Nike shoes from an online store. Similarly, if you search for “Audi”, we might show you an ad for an Audi-related product. If people click an ad or an affiliate link, we earn a few cents.
But it is important to stress that we don’t profile you, meaning we don’t show you advertisements based on both tennis shoes and cars. You are not segmented in any way. Normally, if you search for “Audi” and “iPhone”, online marketing solutions will place you in a high-income segment. We don’t do that because we can’t: we don’t collect that information.
A context-based ad shows you advertising related to the one search you made, rather than to all your searches.
If you and I searched for the same term, would we get the same results?
Brian: Yes. Because results on findx are neutral, everyone gets the same results. This means that if two people, one in Copenhagen and one in London, both search for “Hotels in Berlin”, they will get the same list of hotels.
The ads, however, would be different. They are provided by a third party and shown according to that third party’s list of ads. Usually ten search results are displayed, one of which is an ad. The maximum number of ads that can be displayed on findx is three.
Take tennis shoes as an example. Both Adidas and Nike could have bought ads. For your search, Adidas ads might be displayed; when I search, Nike’s ads are next in line, so I will see Nike ads for tennis shoes. There is no behavioral tracking or profiling of any kind: every search is private and anonymous. This is contextual advertising, and therefore we cannot correlate results and ads with search history.
We are very different from Google in this respect, because Google does profiling. They collect data from Google Search, YouTube, Google+ and your Gmail, and if you’re using Google Chrome, most likely your browsing history as well. All of it creates a profile of an individual, and that profile is used to serve ads and to sell ads to advertisers. We don’t have an advertising network; we don’t sell ads. We show ads from a third party. The difference is that there is no segmentation, no analytics, no Google AdWords and, when you use Privafox, no annoying ads based on your profile following you around the internet.
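The “next in line” behaviour Brian describes can be sketched as a rotation that keys only on the search term. In this hypothetical Python sketch, the `inventory` dict stands in for the third party’s list of ads, and `ContextualAdRotator` is an illustrative name, not findx’s or the ad provider’s actual code. The only state kept is a per-keyword counter, so which ad you see depends on the query and on how many searches for it came before, never on who is searching. The cap of three mirrors the maximum mentioned in the interview.

```python
from collections import defaultdict


class ContextualAdRotator:
    """Chooses ads from the current search term alone; nothing about the searcher is stored."""

    def __init__(self, inventory: dict[str, list[str]], max_ads: int = 3):
        self.inventory = inventory  # stand-in for the third party's list; keys are lowercase terms
        self.max_ads = max_ads      # cap on ads per results page
        self._next: dict[str, int] = defaultdict(int)  # rotation position per keyword, not per person

    def ads_for(self, query: str, slots: int = 1) -> list[str]:
        key = query.lower().strip()
        ads = self.inventory.get(key, [])
        if not ads:
            return []
        slots = min(slots, self.max_ads, len(ads))
        start = self._next[key]
        chosen = [ads[(start + i) % len(ads)] for i in range(slots)]
        self._next[key] = (start + slots) % len(ads)  # whoever searches next sees the next ad
        return chosen


rotator = ContextualAdRotator({"tennis shoes": ["Adidas", "Nike"]})
print(rotator.ads_for("Tennis shoes"))  # ['Adidas']  (your search)
print(rotator.ads_for("tennis shoes"))  # ['Nike']    (my search, next in line)
```

Because the counter is shared by everyone who types the same term, two people see different ads without either of them being identified or profiled.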
Privacy by Design (PbD) is embraced by findx and is also one of the core principles of the upcoming GDPR. What are the biggest challenges in implementing PbD?
Brian: We have been fortunate to do Privacy by Design from the beginning. We have no existing solutions that we have to retrofit PbD into; PbD is the core of our business. We adopted the Canadian PbD model as the guidelines for developing our business, and for this reason the new EU General Data Protection Regulation (GDPR) is easier for us. We still have to document our processes and how we handle personal data, not for findx of course, but e.g. for our forum. Right now we are preparing for GDPR by working on “the right to be forgotten”. If we index a result from a given website and someone requests to have it deleted, we need a process for that. GDPR imposes a 30-day period within which we have to respond to such a request. Looking at Google’s more than 300,000 annual requests, we want to be prepared. Even if we only have to deal with 1% of Google’s numbers, that would be a huge task for us.
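To make the workload concrete: 1% of roughly 300,000 requests a year is about 3,000 requests, or around eight per day. The sketch below is a hypothetical Python illustration of tracking such delisting requests against a deadline; the 30-day window is the figure quoted in the interview (GDPR Article 12(3) itself says “within one month of receipt”, extendable in some cases), and `DelistingRequest`, `overdue` and `expected_requests` are names invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Window quoted in the interview; GDPR Art. 12(3) phrases it as "within one month".
RESPONSE_WINDOW = timedelta(days=30)


@dataclass
class DelistingRequest:
    url: str
    received: date
    resolved: bool = False

    @property
    def due(self) -> date:
        return self.received + RESPONSE_WINDOW


def overdue(requests: list[DelistingRequest], today: date) -> list[DelistingRequest]:
    """Unresolved requests whose response deadline has already passed."""
    return [r for r in requests if not r.resolved and today > r.due]


def expected_requests(google_annual: int = 300_000, share_percent: int = 1) -> int:
    """Rough yearly volume if findx received a given percentage of Google's requests."""
    return google_annual * share_percent // 100


request = DelistingRequest("https://example.com/old-article", received=date(2017, 5, 1))
print(request.due)          # 2017-05-31
print(expected_requests())  # 3000
```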
Two years ago, we didn’t look into legislation around online data. We simply decided that privacy is an important topic and that private search engines are on the rise. That foundation has made things easier for us now, in light of GDPR. We believe that helping people control their privacy online has a market in the future.
There are limitations in terms of convenience for people. We see findx as an alternative to Google search, and we don’t want to compare ourselves with them. It is almost like comparing a bike and a bus: they both get you from point A to point B, but in completely different ways, and the convenience is really not comparable. People have got used to getting exactly the search result they were looking for. On findx, they will not necessarily get the same result as on Google. This is a challenge for us: getting people to value privacy over convenience.
What’s next for findx?
Brian: At the moment we are collecting information for our index, and we can see that the more websites we have, the better the search results become. Relevance is the most crucial thing for us right now, as the live AMA we held on Reddit showed. We are therefore working on making private results more relevant and more personal, while creating more transparency around these results. One early-stage project is a setting for ranking factors. We want to give people the option to change these ranking factors themselves. Imagine deciding how important it is that the language is relevant, or that the search word appears in the link text, and so on. It gets a bit search-engine nerdy, and we are still testing the relevance.
The end goal is to get relevant results while being transparent about how they are ranked, and to prove that the ranking algorithm isn’t a black box. It would be lovely if we could say: “Here are the default parameters we use to display the results. Change them as you wish!” We are not judges. We don’t want to control a black-box “information filter” for people or limit the information they can find by hiding how search works. We index as much information as possible, but throw out obvious spam sites, link farms and the like.
We are also working on making the feedback more relevant, iterating on what people value when they think of result quality. After the summer, we are also looking to release our findx browser extension for Safari, Firefox and Chrome. I also have to mention the findx mobile app, which will be available in the near future as well.
Of course, there is also our browser, Privafox, coming out in beta in late summer 2017. It will have a built-in privacy controller, and findx will be its default search engine. You can turn the controller on and off and see which scripts are blocked by the embedded ad blockers. Another feature is the cookie controller, with which people using Privafox can allow and block different cookies. All of this aims to let people control their online privacy through the browser they use. On top of these built-in tools, all settings are set to the highest privacy standards, following privacy-by-default principles. Even though we built our browser on Firefox, we disabled the automatic sending of information back to Mozilla. It is not that we don’t trust that team, but rather that it does not fit Privacore’s story. We made a commitment, so we let individuals decide for themselves whether they want to share their personal information.
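The idea of publishing default ranking parameters that people can change is easy to picture as a weighted sum. In the hypothetical sketch below, the factor names echo the examples Brian gives (language relevance, the search word in the link text) plus an assumed title factor, and the weights are invented; findx’s real ranking factors live in its open-source code and are not reproduced here. Returning the per-factor contributions alongside the score is one way to make the ranking inspectable rather than a black box.

```python
from typing import Optional

# Hypothetical published defaults; a person could override any of them.
DEFAULT_WEIGHTS: dict[str, float] = {
    "language_match": 1.0,     # page language matches the searcher's language
    "term_in_link_text": 0.5,  # search word appears in the link text
    "term_in_title": 0.8,      # search word appears in the page title (assumed factor)
}


def score(features: dict[str, float],
          overrides: Optional[dict[str, float]] = None) -> tuple[float, dict[str, float]]:
    """Weighted sum of factor values, returned together with each factor's contribution."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown ranking factors: {sorted(unknown)}")
    weights = {**DEFAULT_WEIGHTS, **overrides}
    contributions = {name: weight * features.get(name, 0.0) for name, weight in weights.items()}
    return sum(contributions.values()), contributions


def rank(pages: dict[str, dict[str, float]],
         overrides: Optional[dict[str, float]] = None) -> list[str]:
    """URLs ordered by score, highest first (ties broken alphabetically)."""
    return sorted(pages, key=lambda url: (-score(pages[url], overrides)[0], url))


pages = {
    "local.example": {"language_match": 1.0},
    "linked.example": {"term_in_link_text": 1.0, "term_in_title": 1.0},
}
print(rank(pages))                          # ['linked.example', 'local.example']
print(rank(pages, {"term_in_title": 0.0}))  # ['local.example', 'linked.example']
```

With the defaults, the page matching on link text and title scores about 1.3 against 1.0; switching the title factor off drops it to 0.5, so the language-matched page moves to the top.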