Collecting User Statistics

Jaskaran Singh jvsg1303 at gmail.com
Tue Mar 14 15:17:23 UTC 2017


Hi moggi,

Originally, my vision was for the collected data to help our
marketing team. They would learn which categories of users they
should target, and they could mine the data for whatever is
useful to them.

The l10n team could also use this data to see where LO is gaining popularity
and where they should focus their efforts. The dev team could make use of
it too, although I can't think of a concrete use case at the moment.

And yes, users could be very cautious about the data they share with us.
Maybe a polite dialog box explaining to them how their privacy isn't
compromised would work. Still, quite a few of them would opt out.
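
To make the "roll a dice" idea from my original mail (quoted below) a bit
more concrete, here is a rough Python sketch. It is plain randomized
response, not Rappor's actual algorithm, and the category list, the 80%
figure and the function names are all assumptions made up for illustration.
Each client reports the truth with probability 0.8 and otherwise reports a
uniformly random category; the server then subtracts the expected noise
from the aggregate counts.

```python
import random
from collections import Counter

# Hypothetical category list and truth probability, for illustration only.
CPU_CATEGORIES = ["i3", "i5", "i7", "other"]
P_TRUTH = 0.8


def randomized_report(true_value, categories=CPU_CATEGORIES,
                      p_truth=P_TRUTH, rng=random):
    """Client side: report the truth with probability p_truth,
    otherwise report a uniformly random category."""
    if rng.random() < p_truth:
        return true_value
    return rng.choice(categories)


def estimate_true_shares(reports, categories=CPU_CATEGORIES, p_truth=P_TRUTH):
    """Server side: de-bias the observed shares.

    For each category c: observed = p_truth * true + (1 - p_truth) / k,
    so true = (observed - (1 - p_truth) / k) / p_truth, with k categories.
    """
    n = len(reports)
    counts = Counter(reports)
    noise = (1.0 - p_truth) / len(categories)
    return {c: (counts[c] / n - noise) / p_truth for c in categories}
```

No single report can be trusted, but over many reports the estimated shares
approach the real ones, and the estimate gets noisier as the truth
probability is lowered.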

On Tuesday 14 March 2017 08:21 PM, Markus Mohrhard wrote:
> Hey Jaskaran,
> 
> On Thu, Mar 9, 2017 at 10:10 AM, Jaskaran Singh <jvsg1303 at gmail.com> wrote:
> 
>     Hi,
> 
>     Currently we collect user stats when someone downloads LO from our
>     website. These may not be very useful, since only very limited
>     information is obtained by this method. Also, not everyone gets to
>     participate in this because not everyone downloads LO. Some get it
>     preinstalled with their OS, while others get a copy from friends.
> 
>     I believe it's important for us to know about our users as deeply as
>     possible so as to make informed choices. The information which we should
>     be looking for is:
> 
>     1. Operating System, word size and kernel version
>     2. RAM and Cache amount
>     3. CPU and GPU specs
>     4. OpenCL driver
>     5. Display specs
>     6. Country
>     7. Default Language
>     8. <anything_else?>
> 
>     Now, obviously this is sensitive information and most users would
>     refuse to share it. So we could introduce a way to share this data
>     anonymously. We could let the client use a proxy to share it OR let
>     the data be sent over Tor (The Onion Router). But again, most users
>     wouldn't want that.
> 
>     So I've found another way of doing this. Have a look at Rappor[1]. It
>     introduces some random noise so that we are never sure of the data that
>     a client sends us. The statistics that we would get would be in terms of
>     probability. For example, if a system has an i3 processor, it will roll
>     a die to determine whether it should speak the truth or not. By
>     default we could have an 80% (?) chance of speaking the truth. So if we
>     get a report that a user is running an i3 processor, that single report
>     is truthful with 80% probability and wrong with 20% probability.
>     Aggregate that over a large number of users and we would get a rough
>     trend.
> 
>     We could also share this data in the form of numbers and graphs (and
>     other representations) on our website.
> 
>     It would work this way: whenever someone installs or upgrades LO
>     and starts LO for the first time, a dialog box appears asking for
>     permission to share some data while also explaining how this would not
>     compromise their privacy.
> 
>     I'd like to know your views on this. And I'd like to implement this if
>     none of you want to. I may apply for this as a project in GSoC. So
>     please inform me if you can be a mentor for this project.
> 
> 
> 
> So basically this requires an opt-in scheme instead of the opt-out that
> you have in mind. Users are very sensitive when it comes to collecting
> information that is perceived as personal. Based on that, I think the
> value might not be as big as you hoped. Currently the plan is to collect
> info about the number of active users as part of the automatic update,
> but not much more. Similar to Tor, I'm not so sure I see the value in
> having a huge collection of statistics that we are not planning to use.
> Besides the obvious problem of privacy, the bigger your data set, the
> more work you need to invest in processing the data.
> 
> Based on that, it would help if you could provide some cases where having
> such detailed statistics would help us improve LibreOffice.
> 
> Regards,
> Markus

Regards,
Jaskaran


