Big Data, Google, Facebook – data privacy still en vogue?

4. November 2014 Posted by Andre Kres

Technologies for business analytics such as data mining, big data and real-time data look, at the end of the day, like a mind-reading exercise to the customer. Knowing your key customers well enough is the magic that creates market leaders!

At the top level this is not too bad for the customers themselves. The shop that has the product they like on the shelf, at a price they are prepared to pay, will very soon be the shop of their choice. But once the data is on the company's premises, what happens to it next?

I would like to introduce multiple aspects of privacy in connection with data and information.


Let us construct an example. Imagine your son wants to celebrate his birthday at school by sharing some jelly babies with his friends. This clever boy has already realized that some of his friends are Muslim and are not allowed to eat jelly babies made with pork gelatin. His social competence tells him that he can't hand out jelly babies in class without including his Muslim friends. As a cosmopolitan parent you would look online for jelly babies made with beef gelatin. The keyword you enter is “halal”. Assuming you are not Muslim yourself, you will spend a long time working through a lot of search results. Okay, now let us go one step further:
Your wife works for USAID and travels regularly to countries like Afghanistan...
On top of the search requests for halal food come a round-trip airline ticket to Kabul, a couple of phone or Skype calls and, of course, a lot of internet research about Afghanistan.

You can imagine how this data may be interpreted if it is taken out of context.


Edward Snowden opened the debate. People have started thinking about how they use the internet and about the data of theirs that is available there. Facebook is losing members. Deutsche Telekom is accused of giving the NSA access to its networks. Is the internet still safe to use? Has it ever been safe? Can I still trust companies, the oldest data privacy institutions, now that Deutsche Telekom has come under suspicion of leaking data or allowing the NSA total control?

I personally believe data privacy is more important than ever. In order to protect their business, companies need to be recognized as a trusted harbor for data!


There is an argument that the police need access to the data in order to prevent crime. I believe this is wrong. The police need information, not data. For instance, if they are looking for a stolen green car, they don't need to know the color of the car a suspect drives. They just need to know whether it is green or not. Of course, a suspect who knows the context of the question would most likely always answer “no”.
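The difference between handing over data and answering a question can be sketched in a few lines of Python. This is only an illustration under assumptions: the registry, its records and the function name `drives_car_of_color` are invented here and do not describe any real police or registry system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VehicleRecord:
    """Raw data held by a hypothetical vehicle registry."""
    owner: str
    color: str
    plate: str


# Invented registry contents, for illustration only.
_REGISTRY = {
    "suspect-1": VehicleRecord(owner="suspect-1", color="green", plate="B-XY 123"),
    "suspect-2": VehicleRecord(owner="suspect-2", color="red", plate="M-AB 456"),
}


def drives_car_of_color(owner: str, color: str) -> bool:
    """Answer one yes/no question without handing out the record itself."""
    record = _REGISTRY.get(owner)
    return record is not None and record.color == color.lower()


# The caller learns "green or not", never the actual color or the plate.
print(drives_car_of_color("suspect-1", "green"))
print(drives_car_of_color("suspect-2", "green"))
```

The caller of `drives_car_of_color` never sees a `VehicleRecord`; the answer comes from the registry rather than from the suspect, and the person asking learns nothing beyond the one bit he actually needs.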

This leads to the next topic: trust. How can the policeman trust the suspect's answer? Normally he wouldn't. The policeman would track down the suspect and, if possible, simply check with his own eyes. This is, in a sense, auditability.

The problem is accidentally gained knowledge. Take the example of the policeman, the car and the suspect. While inspecting the car, he might find another man in it together with the policeman's own wife. This information will most likely have consequences for his marriage, but it has nothing to do with his job and he shouldn't have it. Nor is it in the interest of the car's passengers. The privacy of innocent citizens is heavily impacted, and damage to personal wealth and safety is to be feared.

What does this mean for companies and the way they deal with customer data?

Companies need to move from data management to information management. Not every application needs access to the data. Most of the consuming business intelligence applications need the information generated from the data. Although the data may fall under data privacy law, the information may not.



Coming back to the shop example from the beginning: in order to have the products the customer wants to buy on the shelf, the customer's personal shopping habits need to be an input to the system. As the shop has many customers, one customer's shopping habits are collected along with those of many others. The shop's order placement system will then order from the distributor based on information generated from all the shopping habits available. At that point the individual personal data is no longer necessary and should be deleted; at the very least, it should not be possible to draw conclusions about a single person.
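As a rough sketch of the shop example, the following Python snippet turns individual purchases into demand figures that no longer carry customer identifiers. The purchase log, the function name `demand_per_product` and the threshold `MIN_CUSTOMERS` are assumptions made for this illustration; a simple threshold like this reduces the risk of singling out one person but is not a guarantee of anonymity.

```python
from collections import Counter

# Hypothetical purchase log of (customer_id, product) pairs. Invented data.
purchases = [
    ("c1", "beef jelly babies"),
    ("c2", "beef jelly babies"),
    ("c3", "beef jelly babies"),
    ("c1", "green tea"),
    ("c2", "green tea"),
    ("c4", "rare cheese"),
]

# Assumed threshold: report a product only if at least this many
# different customers bought it.
MIN_CUSTOMERS = 2


def demand_per_product(log, min_customers=MIN_CUSTOMERS):
    """Count units per product and drop products with too few distinct buyers."""
    units = Counter(product for _, product in log)
    buyers = {}
    for customer, product in log:
        buyers.setdefault(product, set()).add(customer)
    return {p: n for p, n in units.items() if len(buyers[p]) >= min_customers}


# The ordering system only ever sees these aggregated figures.
order_basis = demand_per_product(purchases)
print(order_basis)
```

The product bought by a single customer is left out of the result, because reporting it would effectively describe one person's shopping habits. Once `order_basis` has been computed, the raw `purchases` list is the part that could be deleted.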

The concepts of information, trust and auditability are universal. It is not only customers at all levels who are uneasy with this new world; the business operations functions inside a company are, too.


Have you ever wondered why your nice and shiny standard reports are not used? The official statements range from “they are wrong” and “they do not fit my needs” to “my business area is different or not covered”. At the end of the day this is again a matter of trust and auditability.

Aren't the business analysts inside the business operations functions like the policeman with the car? To me it looks like they want to collect their own information from the data available, because they do not trust the standard report and cannot audit it. On the other hand, they may also learn things they don't need to know.


Taking all these arguments together, I would conclude that “data privacy” is more en vogue than ever. The initial trust in major companies, based on historic beliefs, is at risk. Companies need to invest in information management technology and avoid giving access to data. The originator of the information, the customer, needs to be able to trust that the data is not shared inappropriately. This trust can only be achieved through end-to-end traceability of the data, called data lineage. This lineage data needs to be auditable, preferably by the customers themselves.

If data lineage information cannot be made available to the end customer, then trust needs to come from external auditing companies.
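To make the idea of auditable data lineage a little more concrete, here is a minimal Python sketch of an append-only lineage log. It is an illustration under assumptions, not a description of any particular product: the class `LineageLog`, its methods and the example entries are invented, and chaining the entries with SHA-256 hashes is just one simple way to make later changes detectable.

```python
import hashlib
import json


def _digest(entry: dict, previous_hash: str) -> str:
    """Hash an entry together with the hash of the entry before it."""
    payload = json.dumps(entry, sort_keys=True) + previous_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class LineageLog:
    """Append-only record of what happened to a customer's data (hypothetical)."""

    def __init__(self):
        self._entries = []  # list of (entry, hash) tuples

    def record(self, customer_id, system, action):
        entry = {"customer": customer_id, "system": system, "action": action}
        previous_hash = self._entries[-1][1] if self._entries else ""
        self._entries.append((entry, _digest(entry, previous_hash)))

    def trace(self, customer_id):
        """Every recorded step that touched this customer's data."""
        return [e for e, _ in self._entries if e["customer"] == customer_id]

    def verify(self):
        """Recompute the hash chain; False means an entry was altered later."""
        previous_hash = ""
        for entry, stored_hash in self._entries:
            if _digest(entry, previous_hash) != stored_hash:
                return False
            previous_hash = stored_hash
        return True


log = LineageLog()
log.record("c1", "web shop", "collected purchase")
log.record("c1", "order planning", "aggregated into demand figures")
log.record("c2", "web shop", "collected purchase")
print(log.trace("c1"))
print(log.verify())
```

In this sketch `trace` is what a customer would be shown, while `verify` is the check that the customer or an external auditor could run to confirm that the recorded history has not been rewritten afterwards.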


Talking about big data also means talking about data privacy. There is plenty of information on the internet that is not protected by any authentication mechanism and is therefore public. This includes many Facebook and Google profiles, internet forums and other privately authored content. Although the media present big data and analytics as the bad guy that gives access to the most private secrets, there is technology available to make it “safe for the public”. In my upcoming blog posts I'm going to write in more detail about the different aspects of data privacy I have just touched on, and introduce technical solutions for handling the challenges that come with the need for information and privacy.