<div dir="ltr">Hey Jaskaran,<br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 9, 2017 at 10:10 AM, Jaskaran Singh <span dir="ltr">&lt;<a href="mailto:jvsg1303@gmail.com" target="_blank">jvsg1303@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
Currently we collect user stats when someone downloads LO from our<br>
website. Now these may not be very useful since only very limited<br>
information is obtained by this method. Also, not everyone gets to<br>
participate in this because not everyone downloads LO. Some just get it<br>
preinstalled on their OS, while others get a copy through their friends.<br>
<br>
I believe it's important for us to know about our users as deeply as<br>
possible so as to make informed choices. The information which we should<br>
be looking for is:<br>
<br>
1. Operating System, word size and kernel version<br>
2. RAM and Cache amount<br>
3. CPU and GPU specs<br>
4. OpenCL driver<br>
5. Display specs<br>
6. Country<br>
7. Default Language<br>
8. &lt;anything_else?&gt;<br>
<br>
Now, obviously this is sensitive information and most users would<br>
not agree to share it. So we could introduce a way to share this data<br>
anonymously. We could let the client use a proxy to share it OR let<br>
this data be sent over Tor (Onion Router). But again, most users<br>
wouldn't want that.<br>
<br>
So I've found another way of doing this. Have a look at Rappor[1]. It<br>
introduces some random noise so that we are never sure of the data that<br>
a client sends us. The statistics that we would get would be in terms of<br>
probability. For example, if a system has an i3 processor, it will roll a<br>
die to determine whether it should tell the truth or not. And by<br>
default we could have an 80% (?) chance of telling the truth. So if we get<br>
the data that a user is running an i3 processor, we are 80% sure that he/she<br>
is, with a 20% chance that he/she is reporting wrong info. So aggregate<br>
that for a large number of users and we would get a rough trend.<br>
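<br>
The coin-flip idea above can be sketched roughly as follows. This is a<br>
simplified binary randomized-response model, not RAPPOR's actual algorithm;<br>
the function names, the 0.8 truth probability and the simulated population<br>
are illustrative assumptions. It shows how the share of users with a given<br>
property (for example, an i3 processor) might be estimated from noisy reports.<br>

```python
import random


def randomized_report(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true bit with probability p_truth, otherwise the opposite bit."""
    return truth if p_truth > rng.random() else not truth


def estimate_true_fraction(reports: list[bool], p_truth: float) -> float:
    """Undo the noise in aggregate.

    If pi is the real fraction of users with the property, the expected
    fraction of "yes" reports is p*pi + (1-p)*(1-pi). Solving for pi gives
    (observed - (1-p)) / (2p - 1).
    """
    if p_truth == 0.5:
        raise ValueError("p_truth = 0.5 carries no information")
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


def simulate(n_users: int, true_fraction: float, p_truth: float, seed: int = 0) -> float:
    """Hypothetical population: each user has the property with prob. true_fraction."""
    rng = random.Random(seed)
    reports = []
    for _ in range(n_users):
        truth = true_fraction > rng.random()
        reports.append(randomized_report(truth, p_truth, rng))
    return estimate_true_fraction(reports, p_truth)
```

Under these assumptions, if 62% of reports say "i3" and p_truth is 0.8, the<br>
estimated real share is (0.62 - 0.2) / 0.6 = 0.7. Any single report stays<br>
deniable; only the aggregate is recovered.<br>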
<br>
We could also share this data in the form of numbers and graphs (and<br>
other representations) on our website.<br>
<br>
So this would work this way. Whenever someone installs or upgrades LO<br>
and starts LO for the first time, a dialog box appears asking for<br>
permission to share some data while also explaining how this would not<br>
compromise their privacy.<br>
<br>
I'd like to know your views on this. And I'd like to implement this if<br>
none of you want to. I may apply for this as a project in GSoC. So<br>
please inform me if you can be a mentor for this project.<br>
<br></blockquote></div><br><br></div><div class="gmail_extra">So basically this requires an opt-in scheme instead of the opt-out that you have in mind. Users are very sensitive when it comes to collecting information that is perceived as personal. Based on that, I think the value might not be as big as you hope. Currently the plan is to collect info about the number of active users as part of the automatic update, but not much more. Similar to Tor, I'm not sure I see the value in having a huge collection of statistics that we are not planning to use. Besides the obvious problem of privacy, the bigger your data set, the more work you need to invest in processing the data.<br><br></div><div class="gmail_extra">Based on that, it would help if you could provide some cases where having such detailed statistics would help us improve LibreOffice.<br><br></div><div class="gmail_extra">Regards,<br></div><div class="gmail_extra">Markus<br></div></div>