Event Recap

Event Recap: Big Data, Bigger Questions: Using Data for Good

Oct 28
Alley Team

We’ve grown skeptical of platforms, software, and technology that collect our personal data. Over the years, our trust has been broken many times over, and our data has been used to fuel the fire of misinformation, undermine democracy, and disrupt daily life. What doesn’t make headlines is how data, when used for good, can empower citizens and support communities.

In this panel, we hear from industry experts about the expectations vs reality of big data. We learn more about data activism, where the movement is headed with the arrival of 5G and Massive IoT data collection, and how communities and governments can assemble to protect our personal information, leveraging it for greater good. When harnessed responsibly, data has the power to spur civic engagement and transform communities, but first it needs to become open and accessible to all. Our panelists will cover the role data will play in the upcoming election, and the responsibility that corporations and government must adhere to when collecting it. Tune in to learn more about the future of data collection and usage, how it can empower communities, and find out how you can get involved to support data for good.

Verizon 5G Labs:

Verizon's 5G Labs works with startups, academia and enterprise teams to build a 5G-powered world. We work on 5G trials, hackathons, industry partnerships, prototyping challenges and more.

OUR PANELISTS:
Pascal Corpet
Bayes Impact
Linda Ge
Citi Ventures
Drew Zachary
Census Open Innovation Labs
Sarah Agopian
Alley

TRANSCRIPT:

Sarah Agopian  0:06  
Hi, everyone. Thank you so much for joining us on your Wednesday afternoon. My name is Sarah Agopian, and it is my honor to be your host for today's conversation, Big Data, Bigger Questions: Using Data for Good. I am a program strategist here at Alley, and Alley is a community agency that unites rich and diverse communities around the country with corporate partners to provide the resources and catalysts to drive positive change in technology and the broader world. We're excited to host this event as part of the Verizon 5G Labs series. For those of you who aren't familiar with Verizon 5G Labs, they work with startups, academia and enterprise teams to build a 5G-powered world using the practical application of emerging technologies. Part of that mission includes having conversations like these that address barriers to digital inclusion and create opportunities for communities to thrive and grow. If you're interested in learning more about their work, visit verizon5glabs.com. And without further ado, the reason you're here today, I want to turn it over to our incredible panelists to introduce themselves and share a bit more about what they're working on and why this topic is important to them. Pascal, why don't we start with you.

Pascal Corpet  1:29  
Hi, everyone. Thank you for joining. So I'm Pascal Corpet, the CTO of Bayes Impact. We are a nonprofit organization, and we work with big data and AI to empower people at scale. We're a small team of a dozen people, and we're all driven by social impact. We create tech products to help underserved populations. And so I'm happy to be here because data for good is exactly the field and the reason that pushed us to create Bayes Impact in the first place a few years back.

Sarah Agopian  2:00  
Amazing. Thank you. Linda, what about you?

Linda Ge  2:04  
Hi, everyone. I'm Linda Ge and I'm a product manager at Citi Ventures, which is the venture and innovation arm of Citi. My group, Studio, is an incubator developing new products that drive economic vitality for people and communities. I work on City Builder by Citi, which is a free data-driven platform that aggregates information about communities to support place-based investments. And through data, mostly federal open datasets, we hope to tell the stories of places that may not otherwise receive media or investor attention. I'm excited to join this conversation on how we can use data well to benefit everyone.

Sarah Agopian  2:41  
Thank you, Linda. And Drew, last but not least.

Drew Zachary  2:46  
Hi, everyone. I'm Drew Zachary, managing director of Census Open Innovation Labs and one of the co-founders of The Opportunity Project. Census Open Innovation Labs is an innovation unit at the US Census Bureau, which is part of the Commerce Department, who refer to themselves, or ourselves, as America's data agency. So we are all about creating datasets that are useful to the public, and my team develops approaches for all different kinds of people to come together and collaborate to use open data for good. So, it is also exactly the reason that I founded these programs, much like Pascal, and I have had the opportunity to work with wonderful people like Linda through those. The one that I'll probably talk about most today is The Opportunity Project, which is kind of our flagship innovation program. It operates like a tech accelerator where we source really important public problems like homelessness or investment in community economies. And then we identify tons of really talented people from outside of government to work together and help solve those problems through data-driven technology.

Sarah Agopian  3:57  
Awesome. Well, it sounds like we got the right panelists to be part of this conversation, which is definitely a win on our side. Before we really kick things off, just a reminder for our audience: please make sure to drop your questions in the Q&A feature. We will be spending time at the end answering your questions. So don't be shy. Definitely drop those in. All right, you guys, big data. It's a hot topic right now, especially with documentaries like The Social Dilemma recently coming out, and then obviously the 2020 election. I'd love to kick things off by hearing all your thoughts on the intention versus the reality of big data. When you think about this, what sticks out to you the most? Linda, I would love to start things off with you.

Linda Ge  4:45  
Thank you, Sarah. When I think of big data, personally, I think about predictive analytics models. These models have been well intentioned, meant to make people's lives easier by automating a lot of decision making, as well as curating the content that enters our daily lives. However, between intention and execution, I think there's still much education to be done when collecting and analyzing data, especially in regards to addressing our own internal biases. So for that reason, we've been pretty cautious on the City Builder platform in employing any predictive analytics models. We don't have any of them on the platform, and at this stage we instead seek to present data in an understandable and actionable way to our users.

Sarah Agopian  5:35  
Awesome. Drew, what about you?

Drew Zachary  5:39  
So I think, Linda, you know, thinking about that in terms of curating the content that we consume on a daily basis really resonates, because actually, outside of The Opportunity Project, another program that we stood up was called Census Accelerate. It was kind of based exactly on one of the unintended consequences of that relating to the 2020 census, which hopefully everyone has responded to, but if you haven't, please hurry up and do it today. You can do it online. But yeah, part of the problem was that, you know, there was so much false and inaccurate content being put forward about the 2020 census, and really big data was enabling that to be curated and put in front of people in ways that are so far beyond anyone's control. And so, you know, we set out to develop an open innovation process where we could create grassroots content that was positive and accurate and really resonated with people, and kind of counterbalance, you know, all of the data that's out there that maybe has really harmful unintended consequences and is contributing to mis- and disinformation and being, you know, promulgated through these large platforms. And so our question was, how do we get people to create their own content that addresses these content voids and gives people correct information that really resonates with them? So I think there's an opportunity there, too, if we can really understand how big data is playing out and the way that people are consuming it daily.

Sarah Agopian  7:14  
Yeah, sure.

Pascal Corpet  7:16  
Yeah, I totally second that. I think big data seems difficult to handle when you're far from it and think it's going to do a lot of things by itself. But most of the time, at least on my side, is not spent imagining how to use the data or how to collect more data, but actually understanding how it is behaving by itself, like controlling the data quality. Or, like Drew was saying, maybe you ask people to create their own content. Well, before you send this content to somebody else, you need to make sure it is relevant and it is consistent. And this is a lot of what I've discovered in the reality of big data: half of the job is using it, and half of the job is controlling it. We've also seen that a lot with different scandals recently, where engineers knew how to prepare it and didn't think about controlling it. When you help and serve people who would not be able to control it themselves, you have a double duty. This is what strikes me the most when I think about the reality of big data.

Sarah Agopian  8:20  
Yeah, for sure. I know it's something very topical, especially access to the right kind of data. And so I know that we're going to get into that later in the conversation. But before that, I would love to hear more about, and you guys kind of already touched on this, the product or the program that, you know, you guys have developed, what problem you're trying to solve, and then, more specifically, how it leverages data for good. Drew, I'm going to start with you, and then Linda and Pascal. I know we have some slides for you, so that our audience can actually see some visuals, which is very exciting. So, Drew, we'll start with you.

Drew Zachary  9:03  
Awesome, thank you. And our website, I'll just say it now because I always forget, is opportunity.census.gov. You can go there, read all about this and see 100 examples of products. But I want to talk through those quickly and hand it over to Linda, if that's the next speaker, because she has an awesome product. And really the only reason we're here is so that we can enable teams like Linda's. So, imagine yourself a federal agency, you know, a many-thousand-person bureaucracy, and you identify a problem like, hey, you know, there's a really great innovation economy in New York, Boston and San Francisco, but what about Highland, Indiana or New Orleans? You know, is there a thriving tech innovation ecosystem there? There's probably not and, you know, in government, we are trying to come up with new approaches to combat that. So rather than, you know, taking a traditional approach of trying to build something ourselves or just use policy, my team and The Opportunity Project are here to say, let's open source this problem and say there's millions of people in the country who can come up with better ideas than we ever could, if we point them in the direction of this problem and give them some open data and access to experts and collaborators. And that's really what we do: we source these problems. We curate datasets from the federal open data space that are highly relevant. We cut through all the bureaucracy and red tape of, you know, sorting through those datasets. And we just say, here's 50 that we think are highly relevant. Here's 50 experts who are available to help you with that, you know, data processing. And then we turn it over to the teams and let them drive the solutioning. And so we had the great opportunity to work with Linda's team last year when we did this investing in the workforce in all American communities tech sprint.
There were four really related problems around entrepreneurship and catalyzing an innovation economy, talent discovery, and things like investing in opportunity zones. And really the kind of bottom line of the challenge there is, if we have a policy that's intended to create innovation opportunities in communities that have lacked that in the past, how do we really help people take advantage of that and get investment that's going to last and really benefit the residents of those communities? And so we partnered with HUD and the Council of Economic Advisers on framing that problem, then we opened it up to incredible people like Linda, and out came this really awesome product. And that, you know, that's kind of how it works: we just turn this problem over to people and they build amazing tools with open data.

Sarah Agopian  11:48  
What a concept. So good. Well, Linda, I feel like you got a nice little cue-up. So you're next.

Linda Ge  11:56  
Thanks, Drew. And thanks, Sarah. I hope you can see my slides now.

Next slide, please.

City Builder is a platform for exploring place-based investments with the goal of improving economic vitality for communities. And we focus especially on those without robust city planning resources. We had a great experience last year at The Opportunity Project sprint. And by partnering with both representatives from The Opportunity Project, as well as the user advocates, we were able to build this platform. We use federal open data to showcase investable needs in a consistent and objective way, on behalf of all places in the US. So, our target user persona is the investment decision maker, which we define as people or groups with capital to invest, motivated by a combination of financial returns and social impact. We partnered with municipalities, as they're the main stakeholder in community development. And this example shows Houston, which is a city that we've worked closely with as we developed our minimum viable product. Our filters for census tracts are populated by different open datasets. So, this filter that I'm showing takes the US Department of Agriculture's definition of food deserts, which are areas that don't have fresh grocery stores nearby, and it maps them. In the next slide, we can see demographic information about an area's residents. And we can also click into a project that the Houston government has prioritized. They have an initiative called Complete Communities, which seeks to improve neighborhoods so that all of Houston's residents and business owners can have access to quality services and amenities. From there, the user can choose to contact the listing's city representative for more information. And of course, if the user feels inspired to invest in Texas generally, they can navigate to the qualified opportunity funds directory on our site, learn more about a given fund, and contact them about making an investment.

Sarah Agopian  14:10  
Awesome, thank you so much, Linda. And Pascal, I think we're gonna hear more about Bob.

Pascal Corpet  14:15  
Yep. Happy to tell you about Bob. So the main product I'm working on right now is called Bob, and it is using data to help unemployed people in France. What my product works on in France today is their job search. It's not about finding the perfect job for you, because first, that doesn't really exist, and we try to empower you, not to enslave you by choosing for you. It's about helping each individual understand their own situation. So, next slide: what is the main blocker for them? And what strategies and what tools exist to help them? Behind the scenes, Bob is crunching labor market information to deliver information that most people don't know. They might see job postings online, so they can see which offers are available, but they don't know about the competition. They don't know who else is searching; it's as if it were a secret. And sometimes, especially in underserved communities, they don't truly know how to search for a job. They can see many things online, many tools, but getting help, like what I would do by going to a friend, maybe, who's been through the same process, they might not have the opportunity to have that around them. So Bob is this kind of friend that knows a lot of things and tries to adapt everything to each person.

Sarah Agopian  15:46  
Great, thank you. And Pascal, since we're talking to you, I would love to hear more about how your previous work experience influenced Bob or anything else that you're working on.

Pascal Corpet  15:58  
Sure, yeah. So before creating Bayes Impact, I was working at Google, on Google Maps, in analytics more precisely. The users using Google Maps every day were generating a ton of data, and that was the data I was crunching for Google Maps internally. And what I realized quickly is that you don't have to invent something very complex: just curating and showing this data in a proper way, and asking the right questions of the data, would actually deliver a lot of value. And this is exactly what we've tried to do with Bob. We're not trying to invent something or to find an insight that will be completely mind-blowing, but just to deliver a small piece of information that is just what you wanted at the right time. This is actually what the big companies have done for a long time now. They use big data and AI in more and more complex ways, but we were already blown away by what they did with simple things. And these simple things can already be applied for social impact. And this is great, because it's kind of a field where we haven't used data, big data and AI enough. So there are a lot of simple solutions that can have a great impact quickly. Maybe in 10 years, we will need to have very complex solutions, and we'll need to hire PhDs like the big companies do. But for now, this is something that is super nice for me, because I can try to do simple stuff and directly help people.

Sarah Agopian  17:36  
Yeah, it's really powerful. And I know we'll dive more into machine learning and AI later. Kind of just moving on in our conversation, Drew and Linda, I know that you guys, you know, talked about City Builder being part of one of your sprints. And Drew, you kind of gave more context around that. But I would love to hear if you guys can just share more about this experience. And Drew, I mean, this process that you guys have created, why is it successful? Like, how has it worked? I know that when we talked earlier on, you know, it wasn't something you guys got fully right the first time and it's been a process.

Drew Zachary  18:16  
Yeah, absolutely. So, The Opportunity Project is coming up on its five-year anniversary this December, and we've run 11 sprints and have had, as of last year, 100 products come out of the process, City Builder being one of them, and actually one of our first financial prize winners. So it's been a long ride. For a government innovation program, five years is a long time. And yeah, I think what's made us successful is operationally kind of having a culture of experimentation and iteration. We ourselves very much take that kind of MVP mindset of, you know, do the best version of something that you can quickly, test it and see if it works, and be kind of both scientific about it and kind of ruthless. If something doesn't work, we're the first ones to, you know, suggest changing it. And we have a really open culture on our team. And then in terms of the process itself, I would say, you know, to this day, we get the reaction of 'wait a minute, are you guys really the government?' And I think that, you know, we probably shouldn't take that as a compliment. But I think the spirit of it is, wow, you know, you guys really listen and you're here to help, you know, lift up good ideas and collaborate and, you know, bring people together. And we really try to bring a service mindset to everything. So I think the approach that I've always tried to, you know, espouse on the team is that this is public data, and it belongs to everyone. You know, this is the public's data. It's all of ours. And so if it's just, you know, sitting somewhere and no one's using it, that's a problem, because this has critical information that can really help communities and help people's lives. So we're public servants who are here to do that. And if we can do it in fun and creative ways that actually make people smile and feel engaged, then that's really great, too. So I think all of that, you know, as part of our organizational culture has really helped us to thrive.

Sarah Agopian  20:21  
Yeah, it sounds like a win-win. And, Linda, for you, you know, Drew has talked about this user advocate piece. Like, this is kind of the foundation, this is what you do when you first start going through these sprints. Can you just talk more about that experience, and how it really, I guess, shaped your overall experience with the sprint, but also actually developing the product and things like that?

Linda Ge  20:46  
Yeah, sure. So just to repeat, we really enjoyed participating in the sprint last year. And we actually launched our product at Demo Day last December. So I can give an example of that through the problem statement on catalyzing investments in opportunity zones, which was sponsored by the US Department of Housing and Urban Development. We learned about how investors needed more transparency on just the number and the makeup of eligible opportunity zones. And they also wanted to know how the new investment incentive program worked. So we spent much time talking to user advocates to really understand their needs. And it was also valuable to us to talk to data stewards, who maintained the datasets that we used. What's great about The Opportunity Project is really the emphasis on human-centered design. So we were encouraged to reflect on needs rather than just diving straight into a solution. And that's influenced us so much that we're participating again in this year's sprint. And we're focusing on the topic of sustainable rural economic development, which is a problem statement posed by the Environmental Protection Agency. The Opportunity Project has been a great partner to us as we continue to scale our platform, from just opportunity zones to all census tracts in the United States.

Sarah Agopian  22:00  
Well, it looks like Drew was very excited to have you guys participating again. So I love that, so good. And Pascal, I know that I have another question for you, but you know, I think what's so core to Bayes' project is that it's citizen-led. And so since we're kind of on that topic, why is that so important and core to what you guys are doing? And then we'd love to just hear, you know, about your partnership with the government and why it's actually so vital to what you're doing as well.

Pascal Corpet  22:34  
Yeah, it's actually kind of linked in my approach. When I first left my previous job, I was trying to say, hey, there are things I could fix, and I'm just a citizen. And as Drew said, sometimes the state is not listening. But with new technologies, with big data, I should be able to do something. So we started trying to look at data and get access. And when we started, the French state at least was not at the point where it had a lab like the one Drew is working on. We more or less cut a deal with them where they gave us access to their internal data. We will talk a bit more later about the privacy side of things, but they were okay with us investigating what they had. And we helped them design what would be useful for others to use as open data. We helped them create datasets and decide what to do. And because we were trying to see what would be useful for Bob, they basically took it and put it outside as open data. It was way easier for them to understand because they had a use case, and we were the use case. And now that that is done, we actually don't have any specific channel to get specific data; we just use the open data that we helped create. And I think this is very powerful: the state needs use cases outside of itself to understand what it has to take out. And for that to be done, people outside need some kind of access, because just relying on what's been opened is usually a very small portion of it. I'm not sure how much it is in the US because I know less about the legislation there. I know in France, any data that is collected by the state is supposed to be accessible by citizens. And theoretically, it's great. But in practice, the state doesn't know how to make it available and doesn't know what to take out. So we need to start this trust, and a company like mine is a good way of doing that, where we can start doing that.
And then once it's open, we just make sure everybody can use it the same way.

Sarah Agopian  24:38  
Yeah, I feel like you teed me up perfectly for the next question, which is how do we not only make data more accessible to citizens, but actually useful to them? This is a question for anyone, so whoever wants to jump in, feel free.

Pascal Corpet  24:54  
Yes, this is exactly what my project is about. Basically, we found out and decided what data the state should put out, and it was mostly this idea of competition, like the stress of a market, which is offers versus demand. And these are global statistics. They actually created a website of their own, where you can see all the statistics. But for a user who doesn't know exactly how to navigate it or what it means, too many statistics is useless. Creating a product on top of that, and, as Linda was saying, having the need and the user in mind with a user-centric approach, is totally different. Of course, the state could do it, but they did their job by providing the data. And the next step is exactly making it useful, not for people, but for use cases, because those use cases will then help people. If you try to say, I'm going to help people in general, it's so difficult.
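
The "offers versus demand" idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Bob is actually implemented: the `OccupationStats` type, the function names, the thresholds, and the numbers are all hypothetical. It shows how a raw market statistic could be turned into one plain-language sentence for a job seeker instead of a table of figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OccupationStats:
    """Aggregated labor-market counts for one occupation in one area (hypothetical shape)."""
    occupation: str
    open_offers: int      # job postings currently open
    active_seekers: int   # people currently searching in this occupation


def offers_per_seeker(stats: OccupationStats) -> Optional[float]:
    """Return open offers per active seeker, or None if nobody is searching."""
    if stats.active_seekers == 0:
        return None
    return stats.open_offers / stats.active_seekers


def describe_competition(stats: OccupationStats) -> str:
    """Turn the raw ratio into one plain-language sentence for a job seeker."""
    ratio = offers_per_seeker(stats)
    if ratio is None:
        return f"No one else is recorded as searching for {stats.occupation} jobs here."
    # Thresholds are arbitrary illustration values, not calibrated ones.
    if ratio >= 1.0:
        level = "low"
    elif ratio >= 0.5:
        level = "moderate"
    else:
        level = "high"
    return (
        f"Competition for {stats.occupation} jobs looks {level}: "
        f"about {ratio:.1f} open offers per person searching."
    )


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    example = OccupationStats("baker", open_offers=30, active_seekers=100)
    print(describe_competition(example))
```

The design choice mirrors the point made in the answer: the statistic itself is simple, and the work is in presenting it for one specific use case.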

Sarah Agopian  25:53  
Yeah. Drew, I would love to get your thoughts. And then Linda, I want to talk more about the investment side and kind of what that piece looks like.

Drew Zachary  26:02  
Yeah, so absolutely. Everything that Pascal spoke about resonates. And like Linda said, we really take that human-centered design approach in everything that we do. That's sort of, you know, a strong background that we have on my team. And we've actually created a data curation product that is kind of in very early stages. So to kind of answer your question, I think our sense is that you need to find out, you know, what is the user's workflow around looking for government data? And we found in asking that question that it's a really long and annoying process of figuring out, you know, here's the problem I'm trying to solve. Where would I even find government data on that? Maybe I go over to data.gov, which is an incredibly valuable repo, but it has 300,000 datasets, and am I going to be able to find what I need? And maybe there's 8,000 datasets that might be relevant. You know, it's a really big process. And that's even just to find the data that you're looking for, let alone to actually understand, you know, the formats and the codebook and everything associated with that dataset, and then to actually use it in a product or tool or research or whatever. So, we created a product to help with the discovery phase of that. So just purely find what you're looking for. And like I mentioned earlier, in our sprints, you know, we'll kind of manually curate 50 to 60 datasets for that particular sprint. So per problem, we'll find maybe 10 or 12 datasets that are highly relevant, and we'll provide that human expertise. And I think Linda really highlighted well that ultimately, you know, we can interview 100 developers: some want APIs, some never want APIs, they're like, just give me the raw data and I'll do whatever I want to it myself.
So everybody's different, but what everybody really likes is to have a person that is an expert who's actually accessible to them, that they can call or Slack message and say, you know, here's my quick question, what's the answer, and actually get a response. So we provide that. And I think that through that kind of access to human beings and providing the right kind of descriptions for a technology developer audience, we've been able to really improve the experience of working with and finding government data. And then we really take seriously collecting feedback from our participants and giving that back to federal agencies who manage data to say, this is what your users are telling you. And they often have no doorway into that. So they don't even know who their users are, let alone how to get feedback from them. So we're kind of trying to improve that cycle of communication.

Sarah Agopian  28:48  
Yeah, that totally makes sense. And Linda, you're coming from the investment side. But obviously, people are coming to those sites and wanting answers or information. So how are you leveraging data to inspire investors when they're coming to your site? We got a glimpse of it, but what have you guys found is really great info to highlight, things like that?

Linda Ge  29:13  
Yeah, so our research shows that most investors are very hesitant to invest in places that they're unfamiliar with. It's all about making transactions with people and entities that they know, whether through their network or through their research. So most investors first find us by researching specific qualified opportunity funds, actually, and then they start exploring geographies on the site and they start to get to know the census tracts in the cities. So by providing more information, such as projects prioritized by city governments themselves when it's available, we want to create starting points to drive both conversations and investments. So there's ongoing work on our side to bridge perspectives and to create a shared common language between investors and those leaders so that capital can benefit communities.

Sarah Agopian  30:04  
Yeah, totally. Awesome. Well, our next question, which Pascal, you kind of mentioned earlier, is how do you balance developing tools that empower communities without violating their privacy? Which is, again, I feel like in itself, we could probably have an hour-long conversation.

And whoever wants to kick things off,

Pascal Corpet  30:29  
I'd be glad to talk more about that. This is something that we've been facing a lot, because we go directly in contact with our final users, who are usually not big friends of companies, or even of the state. And they are kind of afraid to enter anything in our tool. And we totally understand that. And so what we've done with our tool is to make sure that it is accessible without entering too many details about who you, the person, are. We don't ask for an email; if you don't want to enter an email, you don't have to. You don't have to enter a birthdate and so on. We try to get enough data to help you, but we don't store it if we can avoid storing it. And this works mostly because the data we get is not data directly from our users, but data that the government has collected somehow. So there is a data loop where the data is ingested and used by the government, which is already collecting it, and they give us the data anonymously. And it's already statistically aggregated, so the product can still work without endangering the privacy of the final user. And this would not work if we were the only party in the room. We need the state, too, to be accountable for making sure they get proper data, clean data. But still, on the other side, they give it back in a way where it's not endangering anybody.

Linda Ge  32:05  
Yeah, to echo what Pascal just said, we're also a consumer of federal data. And we found that, generally, publicly accessible and open federal agency data sets drill down to an appropriate level of granularity. So to give an example, during our involvement in The Opportunity Project last year, we also heard from data stewards that many economic indicators, such as unemployment and development activity, are aggregated at county or zip code level to protect residents' privacy, especially when areas are sparsely populated. For Citi Builder itself, we haven't had any concerns with privacy so far, whether it's by displaying or recording data. We validate all of our new features in a weekly meeting with our legal counsel. We don't have user accounts at this time, and we're subject to the same privacy standards as the rest of the bank.

Pascal Corpet  32:59  
Yeah, that reminds me of something I wanted to say earlier: reaching this granularity, where the state can provide data that is useful but not endangering citizens, is very difficult for them as well. And here's something that we have done with them. At first, they opened up their whole data set to us, just scrambling the names and only giving us a very small sample of what they had. Also, there was a strong legal agreement between us that we would not leak it out. And then on top of that, we analyzed it and decided, yeah, at this level, it would be useful. And from that they could build their publicly accessible data sets. And this is the work where you still have to work, as I said before, both inside government and outside. And still, the danger to privacy and to final users' data is so big that states need to be the ones making sure that it doesn't happen, that data doesn't go out there without being cleaned and aggregated at the right level. And we're happy to hear from Linda that the federal government does that a lot, and that they can still publish so many datasets while not endangering privacy.

Drew Zachary  34:16  
Yeah, and I would just say, I very much agree that we've found that there's so much that you can do with open data. And in our process, we get this question often. We only work with and sort of curate open datasets. In some cases, it's been new, so agencies have opened up brand new open data for the purpose of an Opportunity Project sprint, but it's all publicly available. And I think at the time that we launched this process, that was kind of odd, because things like hackathons or, you know, extended processes like hackathons were kind of working with a development sandbox or some data sharing agreements, and, you know, our approach compared to that is extremely lightweight. It's just access to people, really. You know, we curate datasets and we give access to end users and data experts and subject matter experts and other collaborators. But other than that, everybody is working in their own development environment. And that kind of helps us to protect against privacy issues and data security issues, and we just let everyone handle that on their own. And so the benefit of that is, if there are companies, you know, for instance, who are working on a health problem statement, and they say, you know, I'm building a tool where our users are going to be inputting their own health information, you know, we can say, that's fine, that's your business. And, you know, they're able to work with that in their own environment. And we're just kind of providing access to additional data layers that can enhance that product. And so I think through a really lightweight, kind of radically lightweight approach, we've been able to really enable a lot of different use cases and not have to get into any of those sticky data privacy issues.

Pascal Corpet  36:02  
One of the unintended consequences of that is that sometimes aggregation is either too coarse or too fine-grained, and we've seen that. When we were building Bob, and especially trying to go to other countries, we worked with Belgium, which is smaller than France, and some of the data sets were aggregated to the level of saying: if there are fewer than 50 people searching for a job in that kind of occupation, then we don't show the data. And that means that for people who are trying to understand what they are in, we have no clue. We can just say, okay, there's not enough data. And this "not enough data" is very frustrating, both for us and for the user, because we know the data is behind it. But for privacy reasons, we understand, we cannot access it. And maybe the state could do something different, or find another way of helping them. For now we are just stuck with, okay, we've limited ourselves to this open data set, and it's not enough.
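Editor's note: the rule Pascal describes, where counts below a minimum group size are withheld, can be pictured with a short sketch. This is only an illustration under stated assumptions, not how Bob or any government data set is actually built. The threshold of 50 comes from the discussion; the field names `region` and `occupation` and the function name are hypothetical.

```python
from collections import Counter

# Threshold taken from the discussion: groups with fewer than 50 people are hidden.
SUPPRESSION_THRESHOLD = 50


def aggregate_with_suppression(records, threshold=SUPPRESSION_THRESHOLD):
    """Count job seekers per (region, occupation) and hide small groups.

    `records` is an iterable of dicts with hypothetical keys "region" and
    "occupation". Groups smaller than `threshold` map to None, which a tool
    could display as "not enough data".
    """
    counts = Counter((r["region"], r["occupation"]) for r in records)
    return {
        key: (count if count >= threshold else None)
        for key, count in counts.items()
    }


# Made-up example data: one large group and one small group.
example_records = (
    [{"region": "Brussels", "occupation": "baker"}] * 60
    + [{"region": "Namur", "occupation": "baker"}] * 3
)
example_result = aggregate_with_suppression(example_records)
```

In this sketch the large group keeps its count while the small group comes back as `None`, which is the "not enough data" case that is frustrating in small countries and sparsely populated areas.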

Sarah Agopian  37:01  
Yeah, I can imagine that would be a problem in rural areas and things like that.

Drew Zachary  37:07  
And I was just gonna say, Pascal, I think it's so interesting, just to think about how all of these issues vary, even by country, like, it sounds like some of the things you've pointed out in France. Or like I could imagine how a French consumer, especially when you think about the relationship to government is so different culturally, even compared to the US, let alone other places that are much more different. So even that question like how people think about privacy issues, and you know, government being involved in data, I'm sure is so different. You know, like, it's just not monolithic. When it comes to kind of citizens and governments. It's so interesting to think about.

Sarah Agopian  37:48  
Yeah, that actually, Pascal, did you want to jump in?

Pascal Corpet  37:52  
Yeah, I just want to say that Drew's solution of connecting people to talk about it, instead of making it bots or APIs that you can access, would help people understand what is different. And I'm not sure if you can do it, but if you can even connect developers to the actual federal agencies who are building the data, this is where it's worth the most, when people get, 'okay, this is why we don't have the data here.' And maybe we should even try to gather data differently, to make it useful differently, without endangering people's privacy.

Sarah Agopian  38:26  
Yeah, for sure. That kind of leads me to my next question, which is, you know, I think that trust has been broken when we think about data, government and big tech, especially in the US, what steps need to be taken to rebuild and regain trust? How do we break the stigma of big data?

It's a big question.

Linda Ge  38:56  
I think, first, in my opinion, it's important to provide transparency into the ways that we access data, how we analyze it, and how it's used to make decisions. So with the work that we're doing on Citi Builder, investors have transparency on the needs of the local communities. Community members, in turn, can also track actual development activity against the needs, as shown through our data. And our feedback loop with local leaders can allow us to build features that are helpful. So I think this would qualify as a team effort between different stakeholders.

Sarah Agopian  39:32  
Yeah, that totally makes sense.

Drew Zachary  39:36  
This is kind of a wonky answer. But I would say if you're on this webinar, you probably are nerds just like us, so people should read the Federal Data Strategy if they haven't. I think, you know, there's building the right infrastructure to be responsible with data from government agencies, and then there's actually getting people to connect with that. So I think a lot of it is the human-centered approaches that we've taken: just listening to people, inviting people to the table, embodying open government within your agency. And just like Pascal said, if you're building a developer platform, or if you're thinking about a new data set, open up your doors, or your virtual doors now, and engage the public in that process. I think that goes such a long way. And then, of course, meaningfully listening to people's feedback and rethinking your approach based on that. So that's one side of it. But the other side is just, you know, how we institutionalize that culture into government. And I do think that the Federal Data Strategy, which was launched several years ago and has gone through, you know, a few iterations now, does a really good job of laying that out. I don't think the public really consumes that much at all, because it's, you know, a policy document about data, but it really calls on every federal agency to, you know, make responsible use of their data and address how it's maintained, how it's put forward to the public, and what steps agencies are actively taking to engage with the public around their data, use it for positive good, and be responsive. And so there is all that in place. The groundwork for that kind of activity is in place. And I think that was a really positive step a few years ago.

Pascal Corpet  41:27  
Yeah, I think that there's a lot of work done already by the public sector, and even the private sector, actually, to have some guidelines, some rules, even many more laws around the world, to try to make sure the data is protected and all the good practices are used. I think that the next step needs to come from another place, which is making sure the common folks understand: what is this data? What are we talking about? What do you mean when you say that when I go on Google they are stealing my data? For some people, it means that they are actually looking through my webcam and reading things in my home. This is not the case. And this is why it is so confusing for them when they say, okay, there's this law now that protects me, but I still hear that my privacy is invaded. And so my approach on that is that tools should surface what sort of data they have in a way that makes sense for the final users. And we've tried that with Bob. It's very difficult to show actual data to somebody, data that either the user gave us or that we found from somebody or someplace else. For nerds, it's easy; for scientists, it's easy to see data, to see graphs; they're happy to see charts. But for a user who is looking for a job, and not a very high-level job, they're not trying to understand what your data policy is. It doesn't mean anything, and you can try to break it down, and it still doesn't mean anything. But what we've done in our onboarding, very simply, is that when you start entering something, we directly reply to you and say, "Hey, we're comparing what you just gave us with data, and this is what we show you." So when we're asking you, okay, what salary are you expecting for your job, instead of waiting until the end to say, whoa, your salary is too high or too low, whatever,
we directly show you: here's a piece of data, and the data is showing you that. And, we haven't done that yet, but I realize it could be useful to say, hey, where does this data come from? You could dig and understand that it's not coming from looking into people's homes and looking at how much they earn, but from a public data set that has been aggregated and is protecting people's privacy. But the entry point is making sure that we show what data is, and what the problem with the data is, at the level of the user, where they are, as a citizen directly.

Sarah Agopian  43:54  
Yeah, I can see it's very challenging and complex, just because there are so many different people. Cool. I know that we have about 15 minutes left, and we have one last section with a few questions, and then I want to dive into some of the questions that were submitted. So Pascal, you kind of touched on this, and this is again what Bob was kind of, you know, created from, but how can AI and machine learning have a social impact, maybe in a broader sense? And what do you hope to see more of in the future?

Pascal Corpet  44:29  
Yes, so data can provide stuff like context, but what really makes a difference for me is the bots that you put on top of it, which make it understandable. And for me, AI, we use it more and more. One thing I'm proud of is that I can help people in very different areas, because by using both data and algorithms I can adapt what I thought before, what I decided on a whiteboard, to be useful to so many different people, because the AI will adapt what I thought at first. Of course, it needs to crunch data. For now we're using mostly open datasets coming from public places, but it could be useful to also have feedback loops and machine learning to make sure the tool is getting smarter. But basically, my take on that is that AI and machine learning could help compensate for some of the problems of people who are trying to help others, and empower them to distribute what they are doing way further. In general, AI seems to be used a lot to maximize money or to push people to do things that they don't want to do. But if what you're trying to do is good, and you don't know how to take it very far, you should try to use those tools, and it should be possible to use those tools to go further. Of course, as I said at the beginning, you also need to monitor what you're doing, because when using AI, especially machine learning, it's very hard to understand what the downside will be, where you will actually be hurting people instead of helping them. But still, I believe these tools should be used more and more to empower people trying to do good.

Sarah Agopian  46:27  
Yeah, I am right there with you. Linda, how do you think more immersive technology like AR and VR can be used to drive engagement and overall impact as it relates to products like Citi Builder?

Linda Ge  46:41  
We've heard from our user interviews that investors and developers generally prefer to visit sites and conduct business offline, in person. And as you can imagine, this has been extremely difficult in shelter-in-place times, which can present an opportunity for augmented or virtual reality. However, I would say that exploring this path would depend on further validation of whether such an enhancement, given the investment, can stay relevant after the pandemic. We thought that a recent New York Times interactive feature did it very well, actually, by taking the reader through a virtual walk of Jackson Heights in Queens. And we can see the potential in simulating proposed projects on the app so that socially conscious investors can directly solicit feedback from the community.

Sarah Agopian  47:27  
Yeah, that makes sense. And I could also see how it would be challenging if people don't actually have the hardware, as far as VR goes. So that is just another barrier. Drew, what comes to mind when thinking about the impacts of emerging technology and 5G, not just on your role and what you guys are doing at The Opportunity Project, but even the Census Bureau?

Drew Zachary  47:52  
Great question. I think we are kind of very consciously at a crossroads where, as we go forward, and this kind of goes back to the big data conversation from earlier, people have access to constant streams of information and constant sources of data, whether from the private sector or, you know, just any source. And so for our role at the Census Bureau as a leading statistical agency, that kind of brings into question: how do we continue to be the leading and most trusted source of data for the public when there are so many other sources of data out there? And I actually was at a conference a few years ago and had somebody from a tech company just say, 'oh, well, 10 years from now, we're just going to use drones to do the census.' And, you know, people think that, and of course there's more to it than that, but it kind of raises the question in people's minds: how do I know what data I can trust? What data is the most accurate? And I think it's a really important moment for the Census Bureau to be forward-thinking in how we collaborate with the developers of emerging and advanced technologies to create the best picture of the American public and the American economy that we can. And so of course, we are always going to have our constitutionally mandated mission to count the population every 10 years, and everything else within our mission. But part of what my team does is help to think about how we can be creative. So even in something like what we talked about, rural areas and the challenges of data aggregation and even data accuracy in sparsely populated areas, or getting a really fine-tuned pulse on different economic dynamics. I think we always look to NASA as an excellent example, in how they've partnered with SpaceX and kind of re-envisioned how they collaborate with industry and technology builders outside of government.
And I think that there's a great moment for the statistical community and government to think about that too and say, you know, we have our mission, private sector technology companies have their mission, how do we work together to deliver the best information that we can to Americans? And I don't think we have all the answers yet. But that seems to be the direction where there's a lot of promise.

Sarah Agopian  50:17  
Yeah. Just so much to think about. Yeah, that marriage, I feel like it's a happy marriage between tech and the Bureau. And it would be crazy. I mean, I live in Brooklyn, and I saw the Census Bureau, and actually, my boyfriend, someone knocked on his door. And so it's just very interesting to think, oh, this could be maybe a drone, and maybe not. I definitely trust a human a little bit more. Maybe that's my own personal issues. With just a few minutes left, we do have some questions for Pascal, so I'm going to dive into those. So one: what data sets are you using to perform analytics for Bob? You mentioned market conditions; which source in particular? And then there's a second question after that.

Pascal Corpet  51:07  
Yes, so I'm looking especially at those in France today. One is the government actually collecting job postings from all over France and trying to gather them all in one place, which I could do as well by basically scraping websites, but it's easier if the government does it for me. And this one is semi-public. The other one is actually private. In France, you get an allowance from the government when you're unemployed, and for that, you need to go there and get registered as an unemployed job seeker. This is where you give the data on what kind of job you're looking for. And this data, aggregated, gives us an understanding of how many people are currently out there looking for a job, in addition to the ones who are looking to change jobs, because both are important. And this is one of the main data sets we use, but it's the most powerful as well, because although it could seem pretty obvious, most people don't know about the competition. And it's relevant to what I was saying: sometimes something simple is very, very powerful. And we go further; we also dive into other datasets. Another one is trying to see how people get a job: did they get a job by using their network, or by finding a job offer online? How they did it is something that we also use to orient users. But the main one is what we call the market score and market stress.

Sarah Agopian  52:40  
Great. And then the second question was from the same individual, which is actually interesting, because you guys have chosen to go the nonprofit route as far as business model. So: building for underserved communities or social impact projects also leads to companies which may not be initially profitable. Are there funding opportunities for data for good initiatives? How do you fund this work? And then Linda, Drew, if you have anything to chime in there too, feel free.

Pascal Corpet  53:12  
Yeah, this is a tough decision we made at the beginning, and we are happy we did it, because we're still alive, we're still working. So it was worth it. And it makes things much simpler when we work with government or with other partners, when we're clearly out there not for profit but for helping users. And for users as well, it's very clear that we're not trying to steal data to sell it back. We don't even ask them for money; we're just trying to help them as best we can. We were able to do that because some foundations were able to fund us at different stages of our life as a company. And I believe that it was the case five or six years ago, when data for good was kind of novel, that there were a lot of funding opportunities. Nowadays, it's more difficult. Thankfully, there are way more people in that field, so that's good. But that also means that for each opportunity, there are a lot of companies trying to address it. So I'm not sure it's stable enough to live on for so long. And we're trying to find a mixed model, where we know we're providing something useful. We've seen it with users. And now we're trying to have it distributed, mostly by governments at this point, where we're trying to hand over and say: we created a service that you were not able to create, because you need to be adjusted to it, but now it is created. You can take it back, and maybe help us go on to our new adventure after that.

Sarah Agopian  54:40  
Great, and I know we're a bit short on time, but Drew, I wanted to ask this last question, which is: do you think that the census could be utilized to get more information than just the constitutionally mandated data, such as, "Have you experienced depression since the last census?" I feel like the census could be a great way to gather useful data on the populace.

Drew Zachary  55:04  
Yeah, that's such an interesting question. So the Census Bureau, outside of just that once-every-10-years count, actually does over 130 additional censuses and surveys, and many of them, you know, relate to health topics, including depression. And so there are even things like, you know, the American Housing Survey, or different surveys focused on health and health care, where we actually go out and collect that data. So we do a lot related to that; I think it's just not that public. In terms of the decennial census, since we talked about, you know, different international issues: adding a question to the once-every-10-years census is the most highly scrutinized thing ever, because in the United States, you know, we of course use that information for political purposes, things related to political representation and power. And so it's very highly scrutinized, and there's an incredibly extensive multi-year process for things to be added to it. However, you know, we have recently done things like the economic and household pulse surveys, which were brand new rapid surveys meant to capture information about the public in the midst of the COVID-19 pandemic, and that definitely related to all different kinds of household experiences that people are having during the pandemic. And so, yeah, we absolutely, you know, do things related to that, and are getting into, you know, more innovative ways to capture that information, and would love to hear more ideas from the public. So, as the open innovation team, our door is always open for ideas, and we're super open to collaborating on things like that.

Sarah Agopian  56:54
So good. Yeah, I feel like, Drew, if I was to sum up anything that you guys are doing, it's collaboration, which is so powerful, as, you know, we can all see. Before we end things, our closing remarks. I feel like you guys have each kind of touched on this, but if you were to sum up, what is your greatest hope for the future of how data is being used? What would you say? And then also, on a personal note, if there was one critical issue that you would love to be solved, I would just love to hear what you say about that, too.

Linda Ge  57:30  
So for us, we hope to create a platform where investors can learn about geographies that need investments in such a way that they can see all of the places that suit their profile and their needs, and not only find information on the places that they're familiar with. And we think that many communities are underserved today, because information about them is just lost in the abundance of data available. And we wish to tell stories about the unique strengths and challenges of every place in the United States.

Sarah Agopian  58:01  
Great.

Pascal Corpet  58:02  
As I touched on before, I really believe that data should be used more and more. And my dream is that it's used way more not to make the rich and powerful even more rich, but to try to bridge a gap for people who are already in difficult situations. They can have access to data, or use data, or have services that use data to help bridge this gap and be at least at the same level of information, and maybe later on, make better situations thanks to data.

Sarah Agopian  58:35  
Yeah, and Drew, the last words.

Drew Zachary  58:39  
Um, no pressure.

You know, we've seen so many data tools, not just from our team, but in the world; there's no shortage of data tools. So I think, you know, we need to get to a place where we can really measure the impact of any of these things on the problems that they aim to solve, whether that's related to economic well-being or depression or anything else that we've talked about today. It would be great if we could actually connect data going into tools with actually moving the needle on any of those things. And I think part of that, you know, we talk a lot about data democratization and getting data into people's hands. But I think everywhere globally, and of course here in the US, data literacy, and training people from the earliest education to understand data and not think it's scary, or think technology is something that's not for them, is a really big part of that.

Sarah Agopian  59:42  
Yeah, I could just see curriculum changing, right? Well, thank you guys so much, to all of our speakers and also our audience for joining in. This conversation has been recorded and will be available tomorrow on our website. Also, we've got a whole lineup of conversations around Big Data this month. Next week we're chatting one on one with Suzanne Borders, who is the CEO and co-founder of BadVR. For more information please visit us at verizon5glabs.com or alley@alley.com. Thanks everyone so much. And Pascal, Linda, Drew, thank you for just sharing your insights. It was really an honor. Bye.
