Most of the data collected by urban planners is messy, complex, and difficult to represent. It looks nothing like the smooth graphs and clean charts of city life in urban simulator games like “SimCity.” A new initiative from Sidewalk Labs, the city-building subsidiary of Google’s parent company Alphabet, has set out to change that.
The program, known as Replica, offers planning agencies the ability to model an entire city’s patterns of movement. Like “SimCity,” Replica’s “user-friendly” tool deploys statistical simulations to give a comprehensive view of how, when, and where people travel in urban areas. It’s an appealing prospect for planners making critical decisions about transportation and land use. In recent months, transportation authorities in Kansas City, Portland, and the Chicago area have signed up to glean its insights. The only catch: They’re not completely sure where the data is coming from.
Urban planners typically rely on processes like surveys and trip counters, which are often time-consuming, labor-intensive, and outdated. Replica instead uses real-time mobile location data. As Nick Bowden of Sidewalk Labs has explained, “Replica provides a full set of baseline travel measures that are very difficult to gather and maintain today, including the total number of people on a highway or local street network, what mode they’re using (car, transit, bike, or foot), and their trip purpose (commuting to work, going shopping, heading to school).”
To make these measurements, the program gathers and de-identifies the location of cellphone users, which it obtains from unspecified third-party vendors. It then models this anonymized data in simulations — creating a synthetic population that faithfully replicates a city’s real-world patterns but that “obscures the real-world travel habits of individual people,” as Bowden told The Intercept.
The program comes at a time of growing unease with how tech companies use and share our personal data — and raises new questions about Google’s encroachment on the physical world.
Last month, the New York Times revealed how sensitive location data is harvested by third parties from our smartphones — often with weak or nonexistent consent provisions. A Motherboard investigation in early January further demonstrated how cell companies sell our locations to stalkers and bounty hunters willing to pay the price.
For some, the Google sibling’s plans to gather and commodify real-time location data from millions of cellphones add to these concerns. “The privacy concerns are pretty extreme,” Ben Green, an urban technology expert and author of “The Smart Enough City,” wrote in an email to The Intercept. “Mobile phone location data is extremely sensitive.” These privacy concerns have been far from theoretical. An Associated Press investigation showed that Google’s apps and website track people even after they have disabled the location history on their phones. Quartz found that Google was tracking Android users by collecting the addresses of nearby cellphone towers even if all location services were turned off. The company has also been caught using its Street View vehicles to collect the Wi-Fi location data from phones and computers.
This is why Sidewalk Labs has instituted significant protections to safeguard privacy before it even begins creating a synthetic population. Any location data that Sidewalk Labs receives is already de-identified (using methods such as aggregation, differential privacy techniques, or outright removal of unique behaviors). Bowden explained that the data obtained by Replica does not include a device’s unique identifiers, which can be used to uncover someone’s identity.
However, some urban planners and technologists, while emphasizing the elegance and novelty of the program’s concept, remain skeptical about these privacy protections, asking how Sidewalk Labs defines personally identifiable information. Tamir Israel, a staff lawyer at the Canadian Internet Policy & Public Interest Clinic, warns that re-identification is a rapidly moving target. If Sidewalk Labs has access to people’s unique paths of movement prior to making its synthetic models, wouldn’t it be possible to figure out who they are, based on where they go to sleep or work? “We see a lot of companies erring on the side of collecting it and doing coarse de-identifications, even though, more than any other type of data, location data has been shown to be highly re-identifiable,” he added. “It’s obvious what home people leave and return to every night and what office they stop at every day from 9 to 5 p.m.” A landmark study uncovered the extent to which people could be re-identified from seemingly anonymous data using just four time-stamped data points of where they’ve previously been.
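To see why a handful of time-stamped locations can be so revealing, consider a toy calculation. The sketch below is not Sidewalk Labs’ method and not the study’s code; it is a minimal illustration under stated assumptions. The three traces, the place names, and the helper functions `matching_people` and `unicity` are all hypothetical. It asks how often a sample of k known points matches exactly one person in a dataset that carries no names or device identifiers.

```python
from itertools import combinations

# Hypothetical, made-up traces: each ID maps to a set of (place, hour) points.
# No names or device identifiers are stored, only where and when.
TRACES = {
    "A": {("home_1", 8), ("cafe", 9), ("office_1", 10), ("home_1", 22)},
    "B": {("home_2", 8), ("cafe", 9), ("office_1", 10), ("home_2", 22)},
    "C": {("home_3", 8), ("cafe", 9), ("office_2", 10), ("home_3", 22)},
}


def matching_people(known_points, traces):
    """Return the IDs whose trace contains every known point."""
    known = set(known_points)
    return [pid for pid, points in traces.items() if known <= points]


def unicity(traces, k):
    """Fraction of all k-point samples that match exactly one trace."""
    unique = 0
    total = 0
    for points in traces.values():
        # Enumerate every k-point sample that could be known about this person.
        for sample in combinations(sorted(points), k):
            total += 1
            if len(matching_people(sample, traces)) == 1:
                unique += 1
    return unique / total


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, round(unicity(TRACES, k), 2))
```

In this toy data, a shared cafe stop narrows nothing down, while a single overnight location is enough to single out one trace. Real datasets are far larger and noisier, so the numbers here say nothing about any actual product.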