View Diary: Data Storage and the NSA (121 comments)

  •  The one thing that saves us (4+ / 0-)
    Recommended by:
    JDWolverton, Dragon5616, Einsteinia, Cliss

    is that it's nearly impossible to search. They want to find all emails with "Occupy Wall Street" in the body? No problem, except there are 100 bazillion of them, and 99% are totally irrelevant to whatever they're plotting. Who's going to read them all and sift out the intelligence?

    Early to rise and early to bed Makes a man healthy, wealthy, and dead. --Not Benjamin Franklin

    by Boundegar on Mon Jul 01, 2013 at 08:42:18 PM PDT

    [ Parent ]

    •  Not True (8+ / 0-)

      This is just a gross misunderstanding of how far search algorithms, data aggregation, and optimization have evolved to date. Analytics is no longer "black magic". If it were, "Big Data" would be useless and there would be no Google or the like.

      •  If analytics was that good, (1+ / 0-)

        YouTube wouldn't keep auto-killing completely original videos for copyright infringement. At the end of the day, a human still needs to evaluate whatever the algorithm spits out, and with a universe of trillions of voice calls, even a few false positives add up to a mountain of confusion.

        I use Google Voice in my work, and even though the algorithms must be as Awesome as Google Itself, it still can't transcribe more than every third word correctly.

        However, thank you for implying I find technology as confusing as black magic. A little humility never hurt anyone.

        Early to rise and early to bed Makes a man healthy, wealthy, and dead. --Not Benjamin Franklin

        by Boundegar on Wed Jul 03, 2013 at 05:42:21 AM PDT

        [ Parent ]

    •  No, but this "Hoovering" (4+ / 0-)
      Recommended by:
      JDWolverton, Kombema, StrayCat, Cliss

      is not only vacuuming up EVERYTHING, but it also makes it possible for J. Edgar Hoover wannabes to blackmail and/or cherry-pick for prosecution anyone they want to target.

      Not only that, there is the pure intimidation and chill on investigative reporting and protesting.

      Separation of Church and State AND Corporation

      by Einsteinia on Mon Jul 01, 2013 at 11:24:49 PM PDT

      [ Parent ]

      •  Cherry picking the data to target inconvenient (8+ / 0-)

        people is what I see as a natural outcome of this data collection. I know it's used retrospectively. It makes sense that it was used after the Boston bombing to catch the Tsarnaev brothers, but to me this isn't enough.

        I wonder how many people have had this data used against them for, like you say, pure intimidation. I'd add blackmail and political prosecution too.

        If a nation expects to be ignorant and free, in a state of civilization, it expects what never has and never will be. Thomas Jefferson

        by JDWolverton on Tue Jul 02, 2013 at 06:24:10 AM PDT

        [ Parent ]

        •  That's what J.Edgar Hoover certainly used it for. (3+ / 0-)
          Recommended by:
          StrayCat, Cliss, JDWolverton

          Blackmailed the presidents of the United States, and anyone else who might have suggested bringing him down. Absolute power corrupting absolutely.

          "Government by organized money is just as dangerous as Government by organized mob." -- Franklin D. Roosevelt

          by Kombema on Tue Jul 02, 2013 at 11:07:57 AM PDT

          [ Parent ]

        •  Always Degenerates to That (2+ / 0-)
          Recommended by:
          Cliss, JDWolverton

          Eventually aimed at political opponents, never fails.  Our founding fathers were prescient.

          I'm not making a lot of friends both at home and at work because I'm a 4th amendment absolutist.  When the government tells us we shouldn't worry about their surveillance methods, it's all about keeping us safe, guess what... time to worry like hell.

          Why the fuck the entire congressional majority can get away with being 2nd amendment absolutists while at the same time collectively pissing on the 4th is a mystery to me.

          Whatever the NSA is blathering on about, metadata this, metadata that... story of the day.  They're fucking LYING!!!   Fact: If the NSA can lie to Congress without blinking, then they will lie to us all with even more impunity.

    •  Machines read them (6+ / 0-)

      Machines are quite capable of reading words, deriving grammar, and determining the semantic meaning of sentences, paragraphs, or the e-mail or document as a whole.

      This goes back almost 25 years now, but a company I worked for back then had a whole team of linguists and speech experts that devised algorithms to break down "written" data whether structured (from known fields on a form, for example) or unstructured (free form, like these comment blocks or diaries), and be able to derive slang, grammar, roots, and semantic meaning.  This was done to automate translation and be more "intelligent" in understanding search requests, and the like.

      The other thing to keep in mind is that despite the number of words in just the English language, for example, we use only a small fraction of them commonly. This is why you see very good compression results when ZIPing text. Even though there are an order of magnitude more proper nouns than true words, in English for example, those words tend to trend, with the names of businesses, locations, and people often being discussed based on current events, as just one instance. If we know even one other piece of information, such as the sender or recipient or their respective locations, the scope or universe of proper nouns becomes smaller.
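
      To put a rough number on the compression point, here is a minimal sketch, assuming a made-up ten-word "everyday" vocabulary (the word list, sample size, and seed are all hypothetical). It uses Python's zlib, which implements DEFLATE, the same method ZIP normally uses, and compares the text against random bytes of the same length:

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return original size divided by compressed size."""
    return len(data) / len(zlib.compress(data, 9))


# Hypothetical small vocabulary standing in for commonly used words.
common_words = ["the", "store", "milk", "go", "to", "have", "get", "I", "and", "a"]
rng = random.Random(0)  # fixed seed so the run is repeatable

# 5000 words drawn from that tiny vocabulary.
everyday_text = " ".join(rng.choice(common_words) for _ in range(5000)).encode("ascii")

# Same number of uniformly random bytes: a baseline with no repeated structure.
random_bytes = bytes(rng.getrandbits(8) for _ in range(len(everyday_text)))

text_ratio = compression_ratio(everyday_text)
noise_ratio = compression_ratio(random_bytes)
print(f"everyday text: {text_ratio:.1f}x smaller, random bytes: {noise_ratio:.2f}x")
```

      Real e-mail has a far bigger vocabulary than ten words, so the ratio would be lower than this toy shows, but the direction is the same: repeated words compress, noise does not.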

      Dialects, colloquialisms, and slang are known and can be codified for meaning (i.e. translated as if you used "proper" English instead). So again, if you know the identity or location you can improve the quality of understanding.

      Of course, the same can be done with voice processing. The differences between the way people speak and the way they write are known. Voice processing is much more complex, though, due to accents and speech impediments, for example, or, even more likely, regional or local "speak".

      Nothing is perfect, so while most communications can be processed automatically with little misunderstanding, there is likely a significant number of communications that would be kicked to cryptanalysis to try to determine if a code is being used. This process would still be highly automated, looking not only at meanings but also at relationships to known events, past, present, and future, as well as other attributes of those events. It may look at movie or TV references, for example: are you talking about an episode of "24" or something else?

      Naturally if you have any known identities the communication can be combined with other information such as shopper card information or credit/debit purchase information to help identify if you are using a code. For example, if you were to frequently say, "I have to go to the store to get milk" in your communications that might be a code despite that being a very common everyday thing. When was the last time you purchased milk? What quantity did you buy, etc.?
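
      The milk example can be sketched in a few lines. This is only a toy illustration under my own assumptions, not a description of any real system: the function name, the two-day window, and the dates are all hypothetical. It flags the days someone said they were buying milk but no purchase shows up nearby:

```python
from datetime import date, timedelta


def unmatched_mentions(mentions, purchases, window_days=2):
    """Return mention dates that have no purchase within window_days of them."""
    window = timedelta(days=window_days)
    return [
        mentioned
        for mentioned in mentions
        if not any(abs(bought - mentioned) <= window for bought in purchases)
    ]


# Hypothetical data: days "get milk" appeared in messages, and days milk was bought.
mentions = [date(2013, 6, 3), date(2013, 6, 10), date(2013, 6, 17)]
purchases = [date(2013, 6, 4)]

flagged = unmatched_mentions(mentions, purchases)
print(flagged)  # the mentions with no matching purchase
```

      A mismatch like this proves nothing on its own (people pay cash, or forget the milk), which is why it would be one signal among many rather than a verdict.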

      All of this can happen in automated systems before it  gets near a human analyst, if ever.

    •  Who needs a they, when you got algorithms? (3+ / 0-)
      Recommended by:
      JDWolverton, ColoTim, Cliss
