Reality Check: DataCon LA 2018
Lessons extracted from the conference hype and jargon
A few days ago, the University of Southern California swarmed with a mix of curious and complacent tech enthusiasts. A flood of newcomers overwhelmed the poorly organized volunteers. Badge and lunch delays, plus a spontaneous rebranding(?) of “Big Data Day LA” into “DataCon LA” halfway through, left the crowds confused and bonding in the awkwardness.
It was a public-tier event that answered the question we at Future Sin care about:
What are data companies telling the public?
We came ready to read between the lines of the hype.
It was apparent the event was focused on moving people through rather than letting them swim deep. The unexpected scale and volume of new people left many session classrooms filled to the fire code limit. Corporate workers came to the conference as a checkmark on their “education checklist.” Vendors pitched and recruited data scientists and engineers. Talking points slipped off the tongues of marketeers in patient repetition.
How many times were the word “data” and its jargon derivatives used? I lost count. It became meaningless. In fact, I started blocking it out and substituting in “it” like an active translator in a foreign land of limited vocabulary.
As aspiring futurists who have been through a few tech cycles ourselves, we stuck to our intention of sifting through the hype.
I present my field notes and questions from the sessions, speakers, and conversations from the long day of “big data” talk:
Megan Risdal from Kaggle, keynote on Collaborative Data Science.
“No one beginning a new data project should start from a blinking cursor”
Thoughts: Megan was a ball of witty, geeky energy, and by far the most engaging and intense keynote speaker at the conference. Kaggle, her company’s data platform, hosts an enormous number of experiments. What other science allows for this much experimentation? Digital bytes, twins of physical models, and simulations scale faster and deeper than any physical science effort.
- Kaggle has a community of 2MM+ data scientists. It holds competitions and supplies tools to run data science experiments and to share results and code as open source.
- 1.3 MM submissions from their competitions. 184k open source kernels. (aka they move a lot of science and data)
- Through shared packages and support on the forums, leading ML algorithms like XGBoost get developed and tested, then spread as standards through social validation.
- Often, “assumptions” about the data are shared in lieu of the data itself, out of fear and for privacy or proprietary reasons.
- Features: Templatization (of code, Kernels) and Documentation (lab notes)
- Meta-Kaggle is the public repo of Kernels to launch new projects
Ken Weiner from GumGum, a computer vision platform
“The Internet of Eyes will come right after the Internet of Things”
Thoughts: Plenty of cringy and controversial points. The primary aim of GumGum’s AI is analyzing advertising — always a reluctant “value” to businesses. They have to make money, and it seems they’re doing just fine by making sure we can’t ignore ads by blending content and advertising.
A picture of Bo Jackson at the top of an article on his multi-sport skills turned into a Toyota ad right before our eyes as cars and colors invaded the space around him and a pair of sunglasses appeared on his 1990 face.
So, Bo knows baseball and football…and the new 2018 Toyota hybrid? Gee, thanks.
- GumGum’s AI can analyze physical ads during sports events, track their impact, detect brands in public images (social media posts), and compile analytics based on how images are used.
- RGB, among other analysis layers, is becoming a standard starting point for visual analysis.
- Ads and content are becoming integrated, interdependent, and blurred together. (Bo Jackson/Toyota example)
- Goodwill/health tech project: tracking dental x-rays and cross-comparing them against pathologies and tooth changes
Gwen Shapira, Chief Data Scientist at Confluent
“Data is a constant stream of events. It never stops.”
Thoughts: AI data analysis looks for changes. It is rarely used for understanding equilibrium points, only for spotting when a system strays from these seemingly arbitrary reference points.
- Cloud architecture must reach across multiple data centers, stay flexible, and use easily scalable computing resources
- Cloud Native Data Pipelines. Microservices reach down into each data center
- DaaS — “Data as a Stream”
- Build insert, update, and delete events ONLY for the changes in a database
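The last point reads like what is usually called change data capture. As a rough illustration only, and not Confluent's or Kafka's actual API, here is a minimal Python sketch that compares two snapshots of a table and emits events only for the rows that changed. The `change_events` function, the snapshot format, and the sample rows are all hypothetical, made up for this example.

```python
from typing import Any, Dict, Iterator, Tuple

Row = Dict[str, Any]
Table = Dict[int, Row]  # hypothetical snapshot format: primary key -> row


def change_events(before: Table, after: Table) -> Iterator[Tuple[str, int, Row]]:
    """Yield (operation, key, row) only for rows that differ between snapshots."""
    # Keys that exist only in the new snapshot are inserts.
    for key in sorted(after.keys() - before.keys()):
        yield ("insert", key, after[key])
    # Keys in both snapshots are updates only if the row content changed.
    for key in sorted(before.keys() & after.keys()):
        if before[key] != after[key]:
            yield ("update", key, after[key])
    # Keys that exist only in the old snapshot are deletes.
    for key in sorted(before.keys() - after.keys()):
        yield ("delete", key, before[key])


# Made-up sample data: row 1 is unchanged, so it produces no event.
before = {1: {"name": "Ada"}, 2: {"name": "Bo"}, 3: {"name": "Cy"}}
after = {1: {"name": "Ada"}, 2: {"name": "Bob"}, 4: {"name": "Di"}}

events = list(change_events(before, after))
for event in events:
    print(event)
```

Production systems typically read the database's transaction log rather than diffing snapshots, but the idea is the same: the stream carries the changes, not the whole table.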
Justin Herz of Warner Bros — yeah, that Warner Bros
“Data science is a religious conversion process” for big, old companies (like WB)
- Data science validation metrics are not enough to cause human adoption of data science in mature companies
- The most accurate data science techniques are the least explainable, a Black Box, especially to old mindsets. The religious conversion process is best aided by visualization
- Malware example: visualizing the propagation of malware with a geospatial map of threats and remediation as they arise.
- Full corporate slide presentation here
Diego Saenz from Accenture
“Most of the good books on AI architecture have not been written yet”
- AI is not a bolt-on to an existing tech stack. The stack must be redeveloped to be AI-centric.
- What are the “data stories,” the equivalent of user stories?
- DevOps must be redefined to accommodate the range and flexible outputs of ML
Sari Ladin-Sienne, Chief Data Officer for the City of LA
“You pay for it. It’s your data.”
- Priority: build out an open civic data platform with strong APIs
- Data makes services responsive and equitable: strip away the assumptions and show how and where services are going
- Example: notifications for street sweeping, before you get a ticket
- Full App Suite incl. the “Mayor’s Dashboard” — like a CEO’s analytics
- GEOHub = geospatial, locally tracked civic engagement
- OpenBudget = transparent spending, trying to visualize where tax money flows in a city
- Streetwise = what’s happening as resources are spent in street projects
Alyssa Columbus, NASA ‘Datanaut’
“Data Leads but Story Tells”
- Alyssa described how the poor presentation of critical numbers failed to clearly show the O-ring failure signs in tests leading up to the NASA Challenger disaster in 1986 (failure in 30-degree weather). Had the failures and anomalies been clearly visible in the diagram report, the launch would never have been allowed.
- Data Journalism = Data Storytelling. The only way to create change.
- NLP (Natural Language Processing) to tell stories w/ story-like English flow. AI can generate human storytelling for automated data translation
A Data story, like statistical analysis:
- Outlier exposures
- What’s Trending
- Forecasting the Future
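To make those three beats concrete, here is a minimal Python sketch of my own, not something shown in the talk. It assumes made-up readings with one planted spike, flags outliers with a simple z-score, fits a least-squares line to the remaining points for the trend, and extends that line one step for the forecast. The function names and the threshold of 2 standard deviations are illustrative choices.

```python
from statistics import mean, pstdev

# Made-up readings; index 4 is a deliberately planted spike.
readings = [10.0, 11.0, 12.0, 13.0, 40.0, 15.0, 16.0, 17.0]


def find_outliers(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def fit_trend(points):
    """Ordinary least-squares line through (x, y) points: (slope, intercept)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


# 1. Outlier exposure
outliers = find_outliers(readings)
outlier_indexes = {i for i, _ in outliers}

# 2. What's trending: fit a line to the points that are not outliers
clean = [(i, v) for i, v in enumerate(readings) if i not in outlier_indexes]
slope, intercept = fit_trend(clean)

# 3. Forecasting: extend the trend line one step past the last reading
forecast = slope * len(readings) + intercept

print("Outliers:", outliers)
print("Trend per step:", round(slope, 2))
print("Forecast for next step:", round(forecast, 2))
```

The numbers only lead. The story is still the sentence you write around them, something like "one day spiked, otherwise it climbs steadily."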
Random big data milestone: Fortnite had 127MM players in one session during its last tournament. It moved 12 terabytes/sec of data across the world.
That’s my roundup. If I heard “data [insert jargon]” or “data is [simple conclusion]” one more time, the buffet lunch would have come up. There are a few sessions that triggered deeper and more intense thoughts on technology intentions, coming soon in their own dedicated post.