Exploring Stream's Video SDK: Creating a WWDC Watch Party App

If you are an Apple Platforms developer, you know how exciting Apple's annual Worldwide Developers Conference (WWDC) is. For those of us who live and breathe code, WWDC is like Christmas, New Year, and our birthday all rolled into one.

It is the ultimate event for all things Apple, where the company announces new products, features, and updates. That is especially true for WWDC 2024, with big rumours about Apple going all in on generative AI.

But as much as we love WWDC, it is not feasible for most of us to fly all the way to Cupertino, or to win the golden ticket to attend in person. Sure, we can watch the keynote and sessions on our own, but it is so much more fun to share the experience with friends and colleagues. That's where the idea for a WWDC Watch Party App comes in!

To build this app, I turned to Stream's Video SDK. Stream's Video SDK is a powerful tool that gives developers everything they need to build rich video capabilities into their apps. With Stream's Video SDK, you can use out-of-the-box UI components with powerful customisation and theming options, or build your own UI while reusing Stream's lower-level logic.

This post is sponsored by Stream, but all views expressed are my own.

Let's dive in and explore how to build a WWDC Watch Party App with Stream's Video SDK.

You can find the final project here: https://github.com/rudrankriyam/WWDC-Watch-Party

Setting Up the Project

Open Xcode and create a new project. Select "App" under the iOS tab and click "Next". Name your project "WWDCWatchParty" and choose a location to save it. Make sure the "Interface" is set to "SwiftUI". Click "Next" to proceed.

To integrate Stream's Video SDK, we will use Swift Package Manager. In Xcode, navigate to "File" > "Add Package Dependencies…" (on older Xcode versions, "File" > "Swift Packages" > "Add Package Dependency"). In the search bar, enter "https://github.com/GetStream/stream-video-swift" and click "Add Package".

Now, let's select the packages we need. Check the boxes for "StreamVideo" and "StreamVideoSwiftUI" and click "Add Package".
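
If you manage dependencies through a Package.swift manifest instead of the Xcode UI, the equivalent declaration is a short sketch like the following (the version requirement is an assumption; pin it to the latest release of the SDK):

dependencies: [
  .package(url: "https://github.com/GetStream/stream-video-swift", from: "1.0.0")
],
targets: [
  .target(
    name: "WWDCWatchParty",
    dependencies: [
      .product(name: "StreamVideo", package: "stream-video-swift"),
      .product(name: "StreamVideoSwiftUI", package: "stream-video-swift")
    ]
  )
]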

To use the camera and microphone, we need to add the necessary permissions to our app's Info.plist file.

Open the Info.plist file and add the following keys and descriptions:

  • Privacy - Camera Usage Description: "Camera access is required to share your video during the WWDC watch party."
  • Privacy - Microphone Usage Description: "Microphone access is required to share your audio during the WWDC watch party."

Your Info.plist file should look like this:

<key>NSCameraUsageDescription</key>
<string>Camera access is required to share your video during the WWDC watch party.</string>
<key>NSMicrophoneUsageDescription</key>
<string>Microphone access is required to share your audio during the WWDC watch party.</string>

When the watch party app requests these permissions, the specified descriptions are displayed to the user, providing clarity on how the permissions will be used and building trust and transparency between the app and the participants.
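
iOS shows these prompts automatically the first time the app touches the camera or microphone, but you can also request access up front so the prompts do not interrupt the call. A minimal sketch using AVCaptureDevice.requestAccess(for:) from AVFoundation (the function name is an assumption):

import AVFoundation

// Requests camera and microphone access before joining a call.
// Returns true only if the user grants both permissions.
func requestMediaPermissions() async -> Bool {
  let cameraGranted = await AVCaptureDevice.requestAccess(for: .video)
  let microphoneGranted = await AVCaptureDevice.requestAccess(for: .audio)
  return cameraGranted && microphoneGranted
}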

Implementing Basic WWDC Watch Party Functionality

A "session room" is a virtual space where users can join and watch WWDC sessions together in real-time.

Just like how each WWDC session has its own unique ID, you will want to create a distinct ID for the corresponding watch party room.

To model a session in your app, you can create a Session struct that encapsulates these properties. Here is how you define the Session struct:

struct Session: Identifiable {
  let id: Int            // Matches the WWDC session's unique ID.
  let title: String      // Session title shown in the list.
  let videoURL: URL      // HLS stream URL for the session video.
  let thumbnailURL: URL  // Thumbnail image shown in the sessions list.
}

To provide sample data for testing, you can create an extension on [Session] that defines a static property called sampleSessions. This property holds an array of predefined Session instances:

extension [Session] {
  static let sampleSessions: Self = [
    Session(
      id: 101,
      title: "WWDC 2023 Keynote",
      videoURL: URL(string: "https://events-delivery.apple.com/0105cftwpxxsfrpdwklppzjhjocakrsk/m3u8/vod_index-PQsoJoECcKHTYzphNkXohHsQWACugmET.m3u8")!,
      thumbnailURL: URL(string: "https://i.ytimg.com/vi/GYkq9Rgoj8E/hq720.jpg?sqp=-oaymwEXCNAFEJQDSFryq4qpAwkIARUAAIhCGAE=&rs=AOn4CLAI0BvhdevIRJS89JD9YnXlbsh20w")!
    ),
    Session(
      id: 10072,
      title: "Principles of spatial design",
      videoURL: URL(string: "https://devstreaming-cdn.apple.com/videos/wwdc/2023/10072/5/C43DFF91-F057-43E1-891F-41E6D5C01716/cmaf.m3u8")!,
      thumbnailURL: URL(string: "https://i.ytimg.com/vi/Q1aJy1Hwjp8/hqdefault.jpg?sqp=-oaymwEjCNACELwBSFryq4qpAxUIARUAAAAAGAElAADIQj0AgKJDeAE=&rs=AOn4CLBydTO2C6nUmnN4X_zC66dH8QW2FQ")!
    ),
    Session(
      id: 10109,
      title: "Meet SwiftUI for spatial computing",
      videoURL: URL(string: "https://devstreaming-cdn.apple.com/videos/wwdc/2023/10109/4/F4A066BD-28D9-4CF8-AAF3-D35EA776504F/cmaf.m3u8")!,
      thumbnailURL: URL(string: "https://i.ytimg.com/vi/HEE8qjXQYPc/hqdefault.jpg?sqp=-oaymwEjCNACELwBSFryq4qpAxUIARUAAAAAGAElAADIQj0AgKJDeAE=&rs=AOn4CLClaHmeXbjjIGD7Q5qazy1516u4PA")!
    )
  ]
}

In this example, three sample sessions are defined: the WWDC 2023 Keynote, Principles of spatial design, and Meet SwiftUI for spatial computing. Each session is initialized with its corresponding id, title, videoURL, and thumbnailURL.

Now that you have a model for representing sessions and sample data to work with, you can create a SessionsView to display the list of available WWDC sessions:

import SwiftUI 

struct SessionsView: View {
  let sessions: [Session]

  var body: some View {
    NavigationStack {
      ScrollView {
        LazyVStack {
          ForEach(sessions) { session in
            NavigationLink(destination: SessionDetailView(session: session)) {
              VStack {
                AsyncImage(url: session.thumbnailURL) { image in
                  image
                    .resizable()
                    .aspectRatio(contentMode: .fit)
                    .cornerRadius(12)
                } placeholder: {
                  ProgressView()
                }

                Text(session.title)
                  .font(.headline)
                  .multilineTextAlignment(.center)
              }
            }
            .buttonStyle(.plain)
          }
        }
        .padding()
      }
      .navigationTitle("WWDC Sessions")
    }
  }
}

#Preview("SessionsView") {
  SessionsView(sessions: .sampleSessions)
}

This view uses a LazyVStack inside a ScrollView to display the sessions as a vertical list, with each session showing its thumbnail and title. Tapping a session navigates to the SessionDetailView for that specific session.

Playing a WWDC Session Video

You will need to fetch the video URL of the desired WWDC session. You can achieve this by maintaining a predefined list of session URLs within your app.

Once you have the video URL, you can use the AVPlayer class from the AVFoundation framework to handle video playback. Here's the code, followed by a breakdown:

private func setupAVPlayer() {
    let asset = AVAsset(url: session.videoURL)
    let playerItem = AVPlayerItem(asset: asset)
    player = AVPlayer(playerItem: playerItem)
}

  1. Create an instance of AVAsset using the session's video URL. This represents the media asset you want to play.
  2. Create an AVPlayerItem from the AVAsset. The player item encapsulates the asset and provides additional control over the playback.
  3. Initialize the AVPlayer with the AVPlayerItem. The AVPlayer is responsible for managing the playback of the video.
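
For a straightforward case like this, AVPlayer also offers a convenience initializer that collapses these three steps into one; the longer form above is useful when you need finer control over the AVPlayerItem:

// Equivalent shorthand when no extra AVPlayerItem configuration is needed.
player = AVPlayer(url: session.videoURL)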

To ensure a smooth audio experience while the microphone is enabled, it is crucial to configure the audio session properly. The setupAudioSession() function handles this task:

private func setupAudioSession() {
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(.playAndRecord, mode: .default, options: [.defaultToSpeaker, .mixWithOthers])
        try audioSession.setActive(true)
    } catch {
        print("Failed to set up audio session: \(error)")
    }
}

  1. Obtain the shared instance of AVAudioSession.
  2. Set the audio session category to .playAndRecord and the mode to .default. This allows your app to play audio and record audio simultaneously.
  3. Enable the .defaultToSpeaker option to route the audio output to the device's speaker by default, and the .mixWithOthers option so the session video and the call audio can play at the same time.
  4. Activate the audio session to apply the configuration.
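
When the watch party ends, it is good practice to deactivate the audio session so other apps can resume their audio. A minimal sketch (the teardownAudioSession() name and its call site are assumptions):

private func teardownAudioSession() {
    let audioSession = AVAudioSession.sharedInstance()
    do {
        // Notify other apps that they can resume their audio.
        try audioSession.setActive(false, options: [.notifyOthersOnDeactivation])
    } catch {
        print("Failed to deactivate audio session: \(error)")
    }
}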

In the body of your SwiftUI view, you can display the video player using the VideoPlayer view from AVKit (remember to import AVKit), passing the AVPlayer instance as a parameter:

@State private var player: AVPlayer?

var body: some View {
    VStack {
        if let player {
            VideoPlayer(player: player)
                .frame(height: 300)
        }
    }
    .navigationTitle(session.title)
    .onAppear {
        setupAudioSession()
        setupAVPlayer()
    }
}

VideoPlayer is a SwiftUI view provided by AVKit that plays a video using an AVPlayer instance. AVPlayer is a powerful class from the AVFoundation framework that allows you to play, pause, seek, and control the playback of media files.

To use the VideoPlayer, you need to provide it with an AVPlayer instance. In this code, the player state variable holds the AVPlayer instance. By using the @State property wrapper, SwiftUI automatically updates the view whenever the player variable changes.

Inside the body of the view, the VideoPlayer is conditionally displayed based on whether the player variable is not nil. This ensures that the video player is only shown when a valid AVPlayer instance is available.

Getting the Calling Feature Working

With the video playing functionality in place, let us now focus on adding an interactive element to our watch party app: the calling feature.

To implement the calling feature, we use Stream's Video iOS SDK. Stream provides some excellent APIs that simplify the process of building in-app video calling functionality.

The first step is to initialize the StreamVideo client with the necessary configuration. The StreamVideo class serves as the main entry point for interacting with the SDK. To create an instance of StreamVideo, you need to provide a valid API key, user, and token.

Here's the general format for initializing the StreamVideo client:

let client = StreamVideo(
  apiKey: "your_api_key",
  user: .guest("user_id"),
  token: .init(stringLiteral: "your_token")
)

Replace "your_api_key" with your actual API key, "user_id" with a unique identifier for the user, and "your_token" with a valid token for authentication.

For testing purposes, you can use the following sample API key and token:

let client = StreamVideo(
  apiKey: "mmhfdzb5evj2",
  user: .guest("Brakiss"),
  token: .init(stringLiteral: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VyX2lkIjoiQnJha2lzcyIsImlzcyI6Imh0dHBzOi8vcHJvbnRvLmdldHN0cmVhbS5pbyIsInN1YiI6InVzZXIvQnJha2lzcyIsImlhdCI6MTcxMzcwNDgwMCwiZXhwIjoxNzE0MzA5NjA1fQ.tbRfCYcuFh2kTc-ZRFh9GPxEgnoQgfyJ_zheOuzDpIM")
)

After initializing the StreamVideo client, we create a call using the Stream Video SDK. The Call class is an observable object that provides information about the call state and offers methods for updating it.

To create a call, you use the call(callType:callId:) method provided by the StreamVideo client. Here's an example of how to create a call:

let call = client.call(callType: "default", callId: "session_\(session.id)")

You invoke the call(callType:callId:) method on the client instance, which represents our initialized StreamVideo client.

The callType parameter specifies the type of call you want to create. In this example, we set it to "default", indicating a standard video call. You can customize the call type based on your app's requirements and any additional features you want to incorporate.

The callId parameter uniquely identifies the call within your app. It allows you to differentiate between multiple calls and ensures that participants can join the correct call. In the provided code, we generate a unique call ID by concatenating the prefix "session_" with the session.id property.

Joining a Call

Once you have created a Call instance, the next step is to establish a connection between the participants for the session.

To join a call, you will use the join(create:) method provided by the Call class. This method takes a boolean parameter create, which indicates whether the call should be created if it does not already exist.

Here is an example of how to join a call using the join(create:) method:

@State private var callCreated = false

Task {
  guard !callCreated else { return }
  try await call.join(create: true)
  callCreated = true
}

The callCreated flag is used to prevent multiple join attempts. It ensures that the join(create:) method is only called once per call instance. This is important because attempting to join a call multiple times can lead to unexpected behavior and waste resources.
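
The counterpart to joining is leaving. When the user navigates away from the session, the app should leave the call so the other participants see them drop out. A minimal sketch using the Call class's leave() method, attached to the view's onDisappear (the placement is an assumption):

.onDisappear {
  // Disconnect from the call and notify the other participants.
  call.leave()
}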

Displaying Participants

Watching a WWDC session together is more engaging and interactive when participants in the watch party can see each other! We will create views for both the local participant (the current user) and remote participants (other users in the call):

struct ParticipantsView: View {
  var call: Call
  var participants: [CallParticipant]
  var onChangeTrackVisibility: (CallParticipant?, Bool) -> Void
  var size: CGSize = .init(width: 150, height: 150)

  var body: some View {
    VStack {
      Spacer()
      if !participants.isEmpty {
        ScrollView(.horizontal) {
          HStack {
            ForEach(participants) { participant in
              VideoRendererView(id: participant.id, size: size) { videoRenderer in
                videoRenderer.handleViewRendering(for: participant, onTrackSizeUpdate: { _,_ in })
              }
              .frame(width: size.width, height: size.height)
              .clipShape(RoundedRectangle(cornerRadius: 8))
              .onAppear { onChangeTrackVisibility(participant, true) }
              .onDisappear { onChangeTrackVisibility(participant, false) }
            }
          }
        }
      } else {
        Color.black
      }
    }
  }
}

The ParticipantsView takes three main parameters, along with an optional size that defaults to 150 × 150 points:

  1. call: An instance of the Call class representing the current call.
  2. participants: An array of CallParticipant objects representing the participants in the call.
  3. onChangeTrackVisibility: A closure that is called when a participant's track visibility changes, letting you stop rendering off-screen participants to reduce resource usage when there are many views.

For each participant, a VideoRendererView is created. The VideoRendererView is a view provided by the StreamVideoSwiftUI framework that handles the rendering of the participant's video track. Inside the closure, you call videoRenderer.handleViewRendering(for:onTrackSizeUpdate:) to handle the rendering of the participant's video track.

To display the participants in your app, you can create an instance of ParticipantsView and provide the necessary parameters:

if let localParticipant = call.state.localParticipant {
  ParticipantsView(
    call: call,
    participants: [localParticipant] + call.state.remoteParticipants,
    onChangeTrackVisibility: changeTrackVisibility(_:isVisible:)
  )
}

We first check if the local participant is available by accessing call.state.localParticipant. If the local participant exists, we create an instance of ParticipantsView and pass the call, participants (combining the local participant and remote participants), and onChangeTrackVisibility closure.
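
The changeTrackVisibility(_:isVisible:) helper referenced here is not shown above. Here is a sketch of what it can look like, forwarding to the SDK's Call.changeTrackVisibility(for:isVisible:) method:

private func changeTrackVisibility(_ participant: CallParticipant?, isVisible: Bool) {
  guard let participant else { return }
  Task {
    // Tell the SDK whether this participant's video is on screen,
    // so it can skip rendering tracks that are not visible.
    await call.changeTrackVisibility(for: participant, isVisible: isVisible)
  }
}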

Synchronising Playback with Custom Events

To ensure that the audio and video are in sync for all participants in the watch party, we will need to send custom events from the device that started the session. These events will contain the current playback position, which the other participants can use to keep their players in sync.

We send custom events to the other participants in the call using the sendCustomEvent(_:) function of the Call class:

private func sendPlaybackPositionEvent() {
  guard let player = player else { return }
  Task {
    let customEventData: [String: RawJSON] = [
      "type": .string("playbackPosition"),
      "position": .number(player.currentTime().seconds)
    ]
    do {
      let response = try await call.sendCustomEvent(customEventData)
      print("SUCCESS SENT RESPONSE", response)
    } catch {
      print("Error sending custom event: \(error)")
    }
  }
}

We then create a syncTimer that fires every second while the video is playing. In the timer callback, we will send a custom event with the current playback position:

@State private var syncTimer: Timer?

VideoPlayer(player: player)
  .frame(height: 300)
  .onReceive(player.publisher(for: \.timeControlStatus)) { status in
    if status == .playing {
      // Invalidate any existing timer before scheduling a new one,
      // since the status publisher can fire multiple times.
      syncTimer?.invalidate()
      syncTimer = Timer.scheduledTimer(withTimeInterval: 1.0, repeats: true) { _ in
        sendPlaybackPositionEvent()
      }
    } else {
      syncTimer?.invalidate()
      syncTimer = nil
    }
  }
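
Because the timer keeps firing for as long as it exists, it should also be invalidated when the view disappears. A small sketch:

.onDisappear {
  syncTimer?.invalidate()
  syncTimer = nil
}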

Then, we subscribe to custom events to receive updates about the playback position from the main participant and sync our own playback accordingly.

Let us take a closer look at the subscribeToCustomEvents() function that is called after creating a call:

private func subscribeToCustomEvents() {
  Task {
    // subscribe(for:) returns an AsyncStream, so no error handling is needed.
    for await event in call.subscribe(for: CustomVideoEvent.self) {
      handleCustomEvent(event)
    }
  }
}

It uses the subscribe(for:) method provided by the Call class to subscribe specifically to events of type CustomVideoEvent.

Inside a Task block, the function uses a for await loop to asynchronously iterate over the custom events. Each event received is passed to the handleCustomEvent(_:) function for processing. When a custom event is received, it needs to be handled appropriately to sync the playback position:

private func handleCustomEvent(_ event: CustomVideoEvent) {
  if let position = event.custom["position"]?.numberValue {
    syncPlaybackPosition(to: position)
  }
}

It assumes that the custom event contains a "position" key with a numeric value representing the current playback position. If the playback position is successfully extracted, it is passed to the syncPlaybackPosition(to:) function to sync the local playback to the received position:

private func syncPlaybackPosition(to position: Double) {
  let time = CMTime(seconds: position, preferredTimescale: 1000)
  player?.seek(to: time)
}

Using the seek(to:) method of the AVPlayer instance, the local playback is updated to the specified time. This ensures that all participants are in sync and watching the same part of the video.
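
One refinement worth considering: seeking on every received event can make playback stutter, since seek(to:) interrupts the player even when it is already at the right position. A sketch that only seeks when local playback has drifted noticeably (the 1.5-second threshold is an assumption to tune):

private func syncPlaybackPosition(to position: Double) {
  guard let player else { return }
  // Skip the seek when we are already close enough to the target position.
  let drift = abs(player.currentTime().seconds - position)
  guard drift > 1.5 else { return }
  player.seek(to: CMTime(seconds: position, preferredTimescale: 1000))
}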

Conclusion

Stream's SDK makes it easy for developers to create engaging and interactive experiences for their users. With features like live streaming, real-time video playback, and the ability to add custom overlays, Stream's Video SDK provides a solid foundation for building a wide range of video-based apps.

If you are interested in exploring Stream's Video SDK further, I encourage you to check out the official documentation and resources. There is also a blog post by Stream on creating an iOS live-streaming app with SwiftUI.

In the upcoming parts of this blog post series, we will be exploring how to enhance the WWDC watch party experience and integrate chat for discussion during sessions. We will also be diving deeper into Stream's Video SDK and exploring some of its more advanced features.