Google Privacy The Internet

Ask Slashdot: What Does Your Data Mean To Google? (google.com) 88

shanen writes: Due to the recent kerfuffles, I decided to try again to see what Google had on me. This time I succeeded and failed, in contrast to the previous pure failures. Yes, I did find Google's takeout website and downloaded all of "my data," but no, it means nothing to me. Here are a few sub-questions I couldn't answer:

1. Much more data than I ever created, so where did the rest come from?
2. How does the data relate to the characteristic vector that Google uses to characterize me?
3. What tools do Googlers use to make sense of the data?

Lots more questions, but those are the ones that are most bugging me right now. Question 2. is probably heaviest among them, since I've read that the vector has 700 dimensions... So do you have any answers? Or better questions? Or your own takeout experiences to share? Oh yeah, one more thing. Based on my own troubled experience with the download process, it is clear that Google doesn't really want us to download the so-called "our own" data. My Question 4. is now: "What is Google hiding about me from me?"


  • Comment removed based on user account deletion
    • Re:Et tu, Brute? (Score:5, Informative)

      by PolygamousRanchKid ( 1290638 ) on Wednesday April 04, 2018 @06:07PM (#56383911)

      Does Google sell it outright?

      The German postal service, Deutsche Post, was just caught selling data to political parties, which was used in election campaign targeting.

      Deutsche Post responded with the claim that they were not selling the data . . . merely "renting it out" . . .

      Mega giga lame.

    • Data brokers (Score:1, Informative)

      by Anonymous Coward

      Google doesn't sell it outright. They are aggregating from data brokers and other sources.

      You can cut into the data broker model by subscribing to a service like DeleteMe, but it's expensive and not a silver bullet by any means. But doing that + using a privacy-friendly e-mail provider + using a secure messenger + securing your browser with ad/tracker blockers + seriously limiting what you put on social media + using DuckDuckGo or Startpage for search + using a VPN...

      If you do ALL that you'll have pretty st

    • Does Google sell it outright?

      As far as I understand it, Google sells access to your data in the form of targeted ads, not your data itself, because it's so incredibly valuable. And that access is more in the form of "I want to show ads to this demographic", so probably nothing that could personally identify anyone. In some ways, I suppose that's lucky for us, because they have a very big financial incentive to guard against leaking it.

      Then again, Facebook let all their data escape, so...

      • by shanen ( 462549 )

        But because the google doesn't share the usage information or any of the profits with us, then we have no incentive to provide accurate data to the google. Even more seriously, if the data contains flaws and errors that reduce the value of the data when the google is trying to sell it, we can't correct those problems.

        Your topic is actually related to the extended questions I added a few minutes ago, especially the last two.

        • Even more seriously, if the data contains flaws and errors that reduce the value of the data when the google is trying to sell it, we can't correct those problems.

          Is that actually a problem for us, rather than Google? I mean, what they generally sell is targeted advertising. Why would you or I really care if their data is correct or not? I don't really understand that aspect of your question.

          The data that credit-reporting companies have on us impacts our daily lives about 1000x more than what Google collects about me, because they draw conclusions about that data (a credit score) that have very definite real-world effects on me in the form of loan rates or even cred

          • by shanen ( 462549 )

            Hmm... Again I think of it from the perspective of more freedom is better, which goes back to the ideas of competition. If the google had a real competitor and that competitor offered similar or better services, then the value of my data within that system does become a concern for me as well. If bad data lowers the value and income of the service provider, then they have to offer fewer services.

            However I admit that if the google were part of a competitive situation, I'd be shopping on a different tradeoff.

            • I certainly understand that sentiment, but I suppose I've had to force myself to make peace with some of the realities of my data being collected and used online, and the tradeoffs it entails, both positive and negative.

              For instance, Amazon is astoundingly convenient for me. Other than grocery shopping (and I could probably use them for that too, but I like my local QFC), I order almost everything I need online. The downside, obviously, is that a single online entity knows about ALL my shopping and readin

              • by shanen ( 462549 )

                Actually, I stopped using Amazon more than 15 years ago. I used it twice, and it was satisfactory both times, but the ongoing pursuit of additional sales caused me to reject further dealings with the company. I'm not sure if it was the greed or the insult or the threat. Greed should be obvious. Amazon wants more sales. The insult is assuming that I'm so similar to other people that they strongly expect me to buy the same things in the same patterns. Remember at that time they were only fishing with two samp

    • by shanen ( 462549 )

      If the google is selling it, I suspect they only sell aggregated forms. From my perspective as part of the product, I would actually like control over the degree of aggregation. I'm not too concerned if something about me is included as part of the average for all the google users within a state or even a large city, but I'd start getting concerned if they are selling parts of my data as parts of extremely small groups such as the people who live in my neighborhood or even the level of an apartment building

    • Google's panopticon means the Stasi know every detail of my life.

    • Does Google sell it outright?

      Does Coca-cola sell you its recipe?

      Your data is the only thing Google has to derive value from. They sell access to *you* specifically, in a wide variety of ways, through many APIs targeting many delivery mechanisms. But the data is what gives them the market leverage they have.

      To me that makes the submission all the more stupid. It's kind of like saying:
      "Coca-Cola prints the ingredients list on the side of the bottle, but it doesn't taste like coke when I mix it together, does anyone know what recipe they us

  • Google maps, Google Earth, keeping their word (My email account) and the rest they offer us.

    • Re:In exchange for (Score:4, Interesting)

      by shanen ( 462549 ) on Wednesday April 04, 2018 @05:54PM (#56383839) Homepage Journal

      Uh? What question are you trying to answer? And how does that question relate to any of the questions I posed? At first I thought you were trying to say something about derived data, but now I have no idea...

      However, one of the categories of data I was looking for was data about me from other sources. For example, in terms of marketing my data to the advertisers, such external data as my credit history would seem to be highly relevant. Perhaps I can find my credit report somewhere in there?

      In the original questions I left out one of the peculiarities I already discovered. A lot of "my" data that the google sent me was actually links to other places where I had posted things. In other cases the links seemed completely unrelated to me, as with a Google Play app to some game I don't believe I've ever downloaded or played.

      • Uh? What question are you trying to answer?

        Now you're just trolling for Facebook.

        • by shanen ( 462549 )

          Sorry, Trax3001BBS, but I have to conclude that you are a terrible writer. Perhaps indifferent to communicating? If so, why write at all?

          I'm really trying to strain my imagination for some meaning in any of your comments. Perhaps your last comment is supposed to mean that you think I'm advocating on behalf of Facebook in some sense of its superiority to the google? If so, I would say that I basically have the same questions (and concerns) about the Facebook data, even though there was so much less of it. At

          • Sorry, Trax3001BBS, but I have to conclude that you are a terrible writer. Perhaps indifferent to communicating? If so, why write at all?

            You're right, and I apologize; I did indeed misread the summary.

            • by shanen ( 462549 )

              I guess I'm supposed to accept that apology and hope you are more careful in your reading in the future (even though I still don't understand the nature of your misunderstanding).

              However I remain more interested in prevention than cure, so I'm wondering how to map that exchange into EPR (Earned Public Reputation) terms. What dimension of your EPR ought to be diminished by that exchange? Perhaps some dimension of my own EPR should be lowered, too? Certainly not a productive branch of the discussion, and the

  • whatever they didn't get from you.
    • by shanen ( 462549 )

      Uh? Are you saying that they are hiding it by sending it to me? If so, then what I am seeking could be rephrased along those lines. Right now it looks like I have a gigantic pile of data that's even messier than my actual life, which is saying something.

  • by Rick Schumann ( 4662797 ) on Wednesday April 04, 2018 @05:54PM (#56383841) Journal
    Seriously, do you really think that with anything short of a court order or an order from Congress (or maybe a gun pointed at their heads) they're really going to show you how much actual data they have collected on you? When you signed up for their 'services' using your real name, you handed them the Keys to the Kingdom, regardless of any agreement (that you likely never read in the first place). The only way to win this game was to have not played in the first place.
    • Comment removed based on user account deletion
      • I never log in to Google services. I turn off as much tracking as possible using things like Pi-hole. I have my own email server and locally host much of the infrastructure that the normals rely on Google for. I use VPNs running in the router (pfSense). I use fake names for most things. I use a dedicated GPS in my vehicles. I stopped using Google for everything a few years ago. There's virtually nothing that Google offers that you can't provide for yourself.

        • Re: (Score:3, Interesting)

          by shanen ( 462549 )

          Highly principled stand, and I congratulate you [HermMonster] for your energy and enthusiasm and even for your efforts, but I think you are deluding yourself. One reason is that by attempting to hide yourself you would actually be attracting attention to yourself. Quite possibly, you are even rendering yourself a marked man and the FBI is following you around trying to figure out what you are trying to hide.

          More seriously, some of the services cannot be used without leakage. Let me take an innocent example,

          • by rot16 ( 4603585 )
            Most of my e-mails come from or go to @gmail.com, so running my own e-mail server is almost useless.
      • by Bert64 ( 520050 )

        Exactly this...
        When I first used the internet, it was commonly accepted that you don't give out your real information online... Now people post all kinds of information about themselves; you can usually build up a very detailed profile of someone just from what they make available publicly.

        Now, people think I'm weird for signing up to websites using aliases and not having a Facebook page.

    • by Anonymous Coward

      They don't need your name. Your friends have you in their contacts under your clear name. That's more than enough.

  • by Visarga ( 1071662 ) on Wednesday April 04, 2018 @05:57PM (#56383855)
    The 700 dimensions vector (if it's true) is not something you can make sense of. It's an embedding vector that represents your characteristics in relation to all the other people. Each individual dimension doesn't have a meaning.
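A minimal sketch of the point above, assuming made-up 3-dimensional vectors in place of the rumored 700 dimensions (the names `alice` and `bob` and the helper functions are purely illustrative, not anything Google is known to use). Rotating the whole space changes every individual coordinate, yet the similarity between two people stays the same, which is why a single dimension carries no meaning on its own.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def rotate_xy(v, angle):
    """Rotate a 3-d vector about the z axis (an arbitrary change of basis)."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

# Hypothetical 3-d "user embeddings"; real ones would have far more dimensions.
alice = (0.9, 0.1, 0.3)
bob = (0.8, 0.2, 0.4)

angle = math.radians(40)
alice_rot, bob_rot = rotate_xy(alice, angle), rotate_xy(bob, angle)

# The first coordinate changes under rotation...
print(alice[0], alice_rot[0])
# ...but the similarity between the two people does not.
print(cosine(alice, bob), cosine(alice_rot, bob_rot))
```

Only relative quantities (similarities, distances, nearest neighbors) survive such a change of basis, so those are the things worth asking about.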
    • by shanen ( 462549 ) on Wednesday April 04, 2018 @06:04PM (#56383887) Homepage Journal

      I think I agree with you as far as you went, but in that case part of the information I am asking about is the context to interpret the shape of the categorization space and where I am within it. That is also in terms of the relationships to the parts of my data that contributed to my location and to the accuracy of that location. The google can reveal a lot about the space without exposing any of the individuals within it.

      Perhaps a more concrete example will help? For example, can the google look at the vectors of spouses to assess how well their marriages are liable to work? Just asking for a friend, since I'm pretty sure my wife would NOT let me look at her data. She'll barely tell me when breakfast is ready.

      • Re: (Score:2, Funny)

        by Lanthanide ( 4982283 )

        You've just, in this forum response, revealed enough information for anyone reading it to have a good idea of just how well your marriage is likely to work.

        • by shanen ( 462549 ) on Wednesday April 04, 2018 @06:43PM (#56384069) Homepage Journal

          Methinks you [Lanthanide] are projecting, but I will confess that I never did understand how my own parents stayed together. My condolences to your much better half. Or perhaps better to respond with some variation of the old grading joke: "I was one of the students who made the dean's list possible!"

          That was just minor tit for even more minor tat. The most appropriate response would probably be to ask "Don't you have anything to say on any aspect of the actual topic at hand?" If you know nothing and have nothing to say, then you can always say nothing.

          I actually did consider raising the issue of using personality characterization for marriage guidance and counseling. I would not be at all surprised to find out that some branch of the google is exploring related business opportunities. However my own interests these days are probably much more mundane. I'm just trying to figure out who's treading on my freedom.

          By the way, I don't think the google is the worst abuser of our personal information. In a sense, the google's motives are pure insofar as they are focused on the money. Almost every question about what the google is doing with our information comes back to the answer "... because they think it will increase their profits."

          • When you give out "red flag" type of relationship data, (that your wife doesn't trust you with personal information, even though marriage is the most personal form of familial relationship) and then accuse people who notice of projecting, I just have to assume you should also stop gaslighting her.

            That's vector 86, if you're keeping score.

            • by shanen ( 462549 )

              That's sophistic BS. If there is any projection there, it's that I would respect her privacy as much as I would hope she respects mine.

              As matters stand now, you sound like a child who was probably in diapers when I was wandering through my first flame wars. I knew flame warriors who actually enjoyed themselves, but I've always regarded ad hominem argumentation as a waste of time, but apparently unavoidable when hominems are involved.

              I didn't introduce the gaslighting topic, and I would even argue that I made

              • You're not comprehending.

                I can't even tell if you're a non-native speaker who doesn't understand the word "projection" in this context, or if you just don't understand who said what.

                Either way, weak sauce. Do better. Shouting "sophistry" when you lack understanding doesn't even convince me you know what that word means. Maybe use smaller words, so that you can arrange them in a way that makes sense?

                And you're right, you didn't introduce the subject of gaslighting; I did. Are you trying to gaslight me, or di

          • This is what I was referring to: "She'll barely tell me when breakfast is ready."

  • I used the provided link to "download all your data [google.com]" and had it save a "takeout" ZIP file on my Google Drive. I then tried adding a few files to Drive and removing them, then "really" removing them. In both cases a "removed" file (in the Trashcan but not "really" removed) did not appear in the Takeout archive. I then created a new Takeout archive and had it send it as an email to my Gmail account. In both cases it's everything from my Drive, calendar, all emails, contacts, bookmarks, photos, etc.

    In the expanded ZIP under the root "Takeout" dir there's an "index.html" with details on all the files. The 2nd archive I created even contained the first archive in its entirety from the "Takeout" folder on my Drive.

    Are you seeing something other than this?
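For anyone who wants a quick overview before digging through the folders by hand, here is a rough sketch that tallies files and bytes per top-level folder inside the ZIP. It assumes the layout described above (a root "Takeout" directory with an "index.html" and one folder per product); the folder names in the demo archive ("Drive", "Mail") and the function names are hypothetical.

```python
import io
import zipfile
from collections import defaultdict

def summarize_takeout(zip_source):
    """Return {top-level folder under 'Takeout/': (file_count, total_bytes)}."""
    totals = defaultdict(lambda: [0, 0])
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            parts = info.filename.split("/")
            # Assumed layout: Takeout/<product>/...; strip the root folder.
            if len(parts) > 1 and parts[0] == "Takeout":
                parts = parts[1:]
            # Files directly under the root (e.g. index.html) go to "(root)".
            product = parts[0] if len(parts) > 1 else "(root)"
            totals[product][0] += 1
            totals[product][1] += info.file_size
    return {name: tuple(v) for name, v in totals.items()}

def demo_archive():
    """Build a tiny in-memory ZIP that mimics the assumed layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("Takeout/index.html", "<html></html>")
        archive.writestr("Takeout/Drive/notes.txt", "hello")
        archive.writestr("Takeout/Drive/todo.txt", "abc")
        archive.writestr("Takeout/Mail/all.mbox", "From x")
    buf.seek(0)
    return buf

for product, (count, size) in sorted(summarize_takeout(demo_archive()).items()):
    print(f"{product}: {count} files, {size} bytes")
```

To try it on a real download, pass the path of the ZIP file to `summarize_takeout` instead of the demo archive.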
    • by rot16 ( 4603585 )
      This list is missing your tracked browsing history. For Android users there is GPS tracking history and call and SMS history.
      • So far in my explorations of the data I haven't seen any browser history data, though I strongly suspect the google is collecting it. Are you saying that it isn't anywhere in the archive? Is the google claiming that this is some sort of derived information that belongs to the google, not me?

        My hypothesis is that it's in there somewhere in some form, but I just don't know how to look for it. I certainly can't prove it isn't there.

        • by swillden ( 191260 ) <shawn-ds@willden.org> on Thursday April 05, 2018 @08:57AM (#56386035) Journal

          So far in my explorations of the data I haven't seen any browser history data, though I strongly suspect the google is collecting it

          Unless you have web history enabled (check the settings in myactivity.google.com), I'm quite certain Google is not storing your browser history. I think this is a distinct question from tracking your web browsing through Google Analytics, assuming you haven't opted out of that. In the latter case, Google gets information about the sites you visit from those sites and uses it to update your interest profile, but doesn't store the actual visit history.

          Note that there is almost certainly data Google has about you which it cannot show you, because it can't be 100% certain that you are you. Data derived from logged-out interactions can be tentatively correlated with you, but since there's no way to be completely certain you're the same person, it would be a violation of the privacy of whoever actually had that logged-out interaction (which might be you) to show it to you. In the case of logged-in interactions, of course, it's reasonable to presume that anything done while logged into account A can be safely shown to account A.

          • by shanen ( 462549 )

            Mostly I can only address one part of it, which is the "can't be 100% certain that you are you". The takeout website checks your password at several times in the process. I was actually surprised that there didn't seem to be any option to encrypt the file.

            • Mostly I can only address one part of it, which is the "can't be 100% certain that you are you". The takeout website checks your password at several times in the process. I was actually surprised that there didn't seem to be any option to encrypt the file.

              Checking your password only helps ensure that the person getting the data that is known to be associated with the account actually is. It doesn't help with data that is only thought to be associated with the account. The uncertainty is in the connection between the data and the account, not between the account and the account owner.

              • by shanen ( 462549 )

                Thanks for the clarification. I should have also been more clear that I don't regard passwords as secure.

                Actually, I think the notion of identity is key, and I actually advocate for the use of EPR (Earned Public Reputation) as a way to manage time and filter out such annoyances as ACs. Again, that's part of my framing around this entire topic... In theory the google data should include both private and public parts, but right now it's just an obscure mishmash and this story/discussion was in general not ill

      • Re:I got a ZIP file (Score:4, Informative)

        by swillden ( 191260 ) <shawn-ds@willden.org> on Thursday April 05, 2018 @08:49AM (#56386013) Journal

        This list is missing your tracked browsing history. For Android users there is GPS tracking history and call and SMS history.

        If location history is turned on, it should be there in the download. Mine is.

        SMS messages are not uploaded to Google, unless you're using Hangouts for SMS (which you can't do anymore unless you're using Project Fi as your carrier). Many people wish SMS were backed up, so that it could be restored onto a new device. As it is, when you get a new phone your SMS history is lost unless you copy it across to the new device (which recent Android versions will automate for you).

        FWIW, Android P is enabling Android backups to be encrypted in a way that ensures that Google cannot read them. That will in turn enable more data (like SMS, I'd expect) to be backed up and restored since it won't raise privacy concerns.

    • by shanen ( 462549 )

      I downloaded my data in 2-GB pieces. One of the mysteries is that the last two pieces were in total less than 2 GB. Each of the pieces contains a number of folders, many of which have the same names. There is only one index.html file in the last piece, but it does not work the same way as the Facebook archive you can download. I've been poking at the data in various ways, but so far haven't been able to make hide nor hair of it.

      Perhaps it will be helpful to consider another version? This one is from IBM and

      • I dunno.

        I copied the text of DJT's speech to the Conservative Political Action Conference (CPAC)

        https://www.vox.com/policy-and-politics/2018/2/23/17044760/transcript-trump-cpac-speech-snake-mccain [vox.com]

        and got the bizarre result quoted below. Which is very similar to the result I got when I typed in some of my own text. And similar also to their sample text from a snippet of Barack Obama's statements in the 2012 election debate.

        I think that the "personality-insights" site is kinda BS. Or maybe we're all just s

        • You only looked at the first part of the results? However, I think what you saw from Personality Insights was an example of GIGO. You picked problematic input. Not just the effect of multiple authors, but also Trump's YUGELY garbled delivery of whatever he was supposed to say mashed into whatever popped into his head from moment to moment. Largely incoherent input, and yet some parts of the results make sense. Empathetic? Yes, but in a twisted way. I actually think that Trump is strong on the "humanist" dim

    • by shanen ( 462549 )

      I'm still trying to consider the differences between what you received in one gigantic file versus the smaller pieces I received... I feel my earlier response was not helpful.

      Let me say that my original idea about the structure is definitely false. I speculated that the links in the index.html file would include relative references to the component files. That is NOT the case. I was even reduced to searching the google's documentation for such information.

      Now you have me speculating that the redundant files

  • First of all, thanks to all the people who have provided thoughtful or useful ideas. I'm about to make the attempt to read everything (except for the ACs, and I'm even considering looking at them this one time), but right now I want to add a few thoughts from my early reflections on the first comments I saw... I'm going to put them in the form of additional questions I wish I could answer from the voluminous, even overwhelming data that the google sent me:

    (5) Where is the evidence that I'm a good person who

    • "(5) Where is the evidence that I'm a good person who deserves more success?"

      You must look into yourself for this answer. Life is random, but the way you react to it you have some control over. If you can't give yourself an honest opinion on this matter, ask your wife.

      "(6) Where is the proof of what a prick I am?"

      You seem to have confused google with the ghosts of christmas past. I don't think it works that way.

      "(7) Is there anything in there that I should actually be afraid of? "

      Are you ashamed of somethin

      • by shanen ( 462549 )

        Well, I think it's a thoughtful response, but not to any of the questions I was actually asking, and it doesn't seem to be worth the effort required to reinterpret them from your twisted, possibly ad hominem, perspective. Even less so since I'm certainly willing to concede the possibility that it is MY perspective that is the twisted one. Instead let me try to answer you [n3r0.m4dski11z] in terms of freedom, which, per my primary sig, is actually my overriding concern. To do that I had better include the la

  • by Anonymous Coward

    2. Google doesn't have all that data unified. The takeout project is actually the most unified view of your data.
    3. Googlers in general don't have access to your data. Systems do, and use it in an automated fashion. There is break-glass access for some engineers for some types of troubleshooting, but this triggers alarms.

    In general, during my > 5 years at Google, I realized it's a company I'll trust with my data for many years to come. The "Data Liberation Front" who ensures that data takeout is a

    • by Anonymous Coward

      More answers from a current Google engineer.

      You can see what Google has concluded about you for the benefit of advertisers here:

      https://adssettings.google.com

      You can also turn off interest-based advertising and remarketing if you like.

      The definitive way to see what Google does with your data is to open an AdWords account, which just takes a few clicks. You don't have to spend anything to see what's on offer. You can examine all the ways to target ads and judge for yourself if it's okay. I haven't seen anyth

  • by mrwireless ( 1056688 ) on Thursday April 05, 2018 @06:27AM (#56385651)
    The main thing to understand here is that there are two types of data:

    - Your raw data
    - Their 'derived data'

    This 'derived data' (as the data broker industry calls it) is where the real value is. These algorithmically formed 'opinions' about you are the valuable distilled product they sell. In the USA this derived data doesn't belong to you. It's protected as a form of corporate free speech.

    In the EU this is a little different, as these 'opinions' are also considered personal data. The question is to what extent you get access to it. For example, the threshold for personal data is when a piece of data can be traced back to fewer than 11 people. So the trick here is to create opinions about small groups of which you are a part. For example, knowing that someone with cancer lives in one of three adjacent houses is not considered personal data.
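The group-size idea can be sketched as a simple count. This is only an illustration: it takes the figure of 11 from the comment above at face value (it is not verified here), and the records, field names, and `small_groups` helper are invented for the example.

```python
from collections import Counter

MIN_GROUP_SIZE = 11  # figure quoted in the comment above; an assumption, not a verified rule

def small_groups(records, keys, min_size=MIN_GROUP_SIZE):
    """Return the attribute combinations shared by fewer than min_size records."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {combo: n for combo, n in counts.items() if n < min_size}

# Hypothetical records: 12 people on one street, 3 on another.
records = (
    [{"street": "Elm", "condition": "none"} for _ in range(12)]
    + [{"street": "Oak", "condition": "cancer"} for _ in range(3)]
)

# Groups this small would point back to only a handful of people.
print(small_groups(records, ["street"]))
```

Grouping on more attributes at once shrinks the groups, which is how seemingly aggregate data can end up describing very few people.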
    • by shanen ( 462549 )

      If I ever had a mod point to give, you'd get my "insightful" vote. However I feel like I've already responded to the points you raised in the context of imagining that the google had a real competitor in most of the areas where the google makes money.

  • Or maybe there is no answer along the lines I was seeking?

    Anyway, I do want to thank the constructive contributors, even though I didn't learn the kinds of things I was hoping to learn. I did learn a few new things and got a few new ideas, but mostly I feel like I framed the topic incorrectly. Is it evidence of too much Japanese influence to feel like an apology is in order?

    However, Slashdot marches on, and this "story" has pretty much expired already... The google and our private data held by the data is n
