Tag: Speech

  • Seamless Text Input with Your Voice on iOS

    Most likely, you have faced a situation where you’re enjoying the seamless flow of an application—for instance, while making a train or hotel reservation. Then, suddenly—bam!—a never-ending form appears, disrupting the experience. I’m not saying that filling out such forms is irrelevant for the business—quite the opposite. However, as an app owner, you may notice in your analytics a significant drop in user conversions at this stage.

    In this post, I want to introduce a more seamless and user-friendly text input option to improve the experience of filling out multiple fields in a form.

    Base project

    To help you understand this topic better, we’ll start with a video presentation. Next, we’ll analyze the key parts of the code. You can also download the complete code from the repository linked below.

    To begin entering text, long-press the desired text field. When the bottom line turns orange, speech-to-text mode has been activated. Release your finger once you see the text correctly transcribed. If the transcribed text is valid, the line turns green; otherwise, it turns red.

    Let’s dig into the code…

    The view is built with a language picker, which is a crucial feature. It allows you to select the language you will use later, especially when interacting with a form containing multiple text fields.

    struct VoiceRecorderView: View {
        @StateObject private var localeManager = appSingletons.localeManager
        @State var name: String = ""
        @State var surname: String = ""
        @State var age: String = ""
        @State var email: String = ""

        var body: some View {
            Form {
                Section {
                    Picker("Select language", selection: $localeManager.localeIdentifier) {
                        ForEach(localeManager.locales, id: \.self) { Text($0).tag($0) }
                    }
                    .pickerStyle(SegmentedPickerStyle())
                }

                Section {
                    TextFieldView(textInputValue: $name,
                                  placeholder: "Name:",
                                  invalidFormatMessage: "Text must be longer than 6 characters!") { textInputValue in
                        textInputValue.count > 6
                    }

                    TextFieldView(textInputValue: $surname,
                                  placeholder: "Surname:",
                                  invalidFormatMessage: "Text must be longer than 6 characters!") { textInputValue in
                        textInputValue.count > 6
                    }

                    TextFieldView(textInputValue: $age,
                                  placeholder: "Age:",
                                  invalidFormatMessage: "Age must be between 18 and 65") { textInputValue in
                        if let number = Int(textInputValue) {
                            return number >= 18 && number <= 65
                        }
                        return false
                    }
                }

                Section {
                    TextFieldView(textInputValue: $email,
                                  placeholder: "Email:",
                                  invalidFormatMessage: "Must be a valid email address") { textInputValue in
                        let emailRegex = #"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"#
                        let emailPredicate = NSPredicate(format: "SELF MATCHES %@", emailRegex)
                        return emailPredicate.evaluate(with: textInputValue)
                    }
                }
            }
            .padding()
        }
    }
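
    The appSingletons.localeManager object used above is not shown in this post; it ships with the linked repository. As a rough sketch of what the view assumes (the member names come from the usage above, but the identifiers list and defaults here are hypothetical), it could look like this:

    ```swift
    import Foundation
    import Combine

    // Hypothetical sketch: the real LocaleManager lives in the repository.
    // The view relies on exactly three members: a `locales` list for the
    // picker, a published `localeIdentifier` the Picker binds to, and
    // `getCurrentLocale()` for starting the speech recognizer.
    final class LocaleManager: ObservableObject {
        // Identifiers offered in the segmented picker (assumed values).
        let locales = ["en-US", "es-ES", "fr-FR"]

        // Currently selected identifier; the Picker writes to this.
        @Published var localeIdentifier: String = "en-US"

        // Locale handed to SFSpeechRecognizer when recording starts.
        func getCurrentLocale() -> Locale {
            Locale(identifier: localeIdentifier)
        }
    }
    ```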

    For every text field, we need a binding variable to hold the field’s value, a placeholder for guidance, an error message to display when validation fails, and a closure that implements the validation itself.

    When we examine the TextFieldView, we see that it is essentially a text field enhanced with additional features to improve user-friendliness.

    struct TextFieldView: View {
        
        @State private var isPressed = false
        
        @State private var borderColor = Color.gray
        @StateObject private var localeManager = appSingletons.localeManager
    
        @Binding var textInputValue: String
        let placeholder: String
        let invalidFormatMessage: String?
        var isValid: (String) -> Bool = { _ in true }
        
        var body: some View {
            VStack(alignment: .leading) {
                if !textInputValue.isEmpty {
                    Text(placeholder)
                        .font(.caption)
                }
                TextField(placeholder, text: $textInputValue)
                    .accessibleTextField(text: $textInputValue, isPressed: $isPressed)
                    .overlay(
                        Rectangle()
                            .frame(height: 2)
                            .foregroundColor(borderColor),
                        alignment: .bottom
                    )
                .onChange(of: textInputValue) { _, newValue in
                    borderColor = getColor(text: newValue, isPressed: isPressed)
                }
                .onChange(of: isPressed) {
                    borderColor = getColor(text: textInputValue, isPressed: isPressed)
                }
                if !textInputValue.isEmpty,
                   !isValid(textInputValue),
                    let invalidFormatMessage {
                    Text(invalidFormatMessage)
                        .foregroundColor(Color.red)
                }
            }
        }
        
        func getColor(text: String, isPressed: Bool) -> Color {
            guard !isPressed else { return Color.orange }
            guard !text.isEmpty else { return Color.gray }
            return isValid(text) ? Color.green : Color.red
        }
        
    }

    The key point in the above code is the modifier .accessibleTextField, where all the magic of converting voice to text happens. We have encapsulated all speech-to-text functionality within this modifier.

    extension View {
        func accessibleTextField(text: Binding<String>, isPressed: Binding<Bool>) -> some View {
            self.modifier(AccessibleTextField(text: text, isPressed: isPressed))
        }
    }
    
    struct AccessibleTextField: ViewModifier {
        @StateObject private var viewModel = VoiceRecorderViewModel()
        
        @Binding var text: String
        @Binding var isPressed: Bool
        private let lock = NSLock()
        func body(content: Content) -> some View {
            content
                .onChange(of: viewModel.transcribedText) {
                    guard viewModel.transcribedText != "" else { return }
                    self.text = viewModel.transcribedText
                }
                .simultaneousGesture(
                    DragGesture(minimumDistance: 0)
                        .onChanged { _ in
                            lock.withLock {
                                if !isPressed {
                                    isPressed = true
                                    viewModel.startRecording(locale: appSingletons.localeManager.getCurrentLocale())
                                }
                            }
                        }
                        .onEnded { _ in
                            if isPressed {
                                lock.withLock {
                                    isPressed = false
                                    viewModel.stopRecording()
                                }
                            }
                        }
                )
        }
    }

    The voice-to-text functionality is implemented in the VoiceRecorderViewModel. In the view, it is controlled by detecting a long press from the user to start recording and releasing to stop the recording. The transcribed voice text is then forwarded upward via the text Binding attribute.

    Finally, here is the view model that handles the transcription:

    import Foundation
    import AVFoundation
    import Speech
    
    class VoiceRecorderViewModel: ObservableObject {
        @Published var transcribedText: String = ""
        @Published var isRecording: Bool = false
        
        private var audioRecorder: AVAudioRecorder?
        private let audioSession = AVAudioSession.sharedInstance()
        private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
        private var recognitionTask: SFSpeechRecognitionTask?
        private var audioEngine = AVAudioEngine()

        var speechRecognizer: SFSpeechRecognizer?

        func startRecording(locale: Locale) {
            do {
                self.speechRecognizer = SFSpeechRecognizer(locale: locale)

                recognitionTask?.cancel()
                recognitionTask = nil

                try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
                try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

                guard let recognizer = speechRecognizer, recognizer.isAvailable else {
                    transcribedText = "Speech recognition is not available for the selected language."
                    return
                }

                // A buffer request can only be used once: after endAudio() it no
                // longer accepts audio, so we create a fresh one per recording.
                let request = SFSpeechAudioBufferRecognitionRequest()
                recognitionRequest = request

                let inputNode = audioEngine.inputNode
                let recordingFormat = inputNode.outputFormat(forBus: 0)
                inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
                    request.append(buffer)
                }

                audioEngine.prepare()
                try audioEngine.start()

                recognitionTask = recognizer.recognitionTask(with: request) { result, error in
                    if let result = result {
                        self.transcribedText = result.bestTranscription.formattedString
                    }
                }

                isRecording = true
            } catch {
                transcribedText = "Error starting the recording: \(error.localizedDescription)"
            }
        }

        func stopRecording() {
            audioEngine.stop()
            audioEngine.inputNode.removeTap(onBus: 0)
            recognitionRequest?.endAudio()
            recognitionTask?.cancel()
            isRecording = false
        }
    }

    Key Components

    1. Properties:

      • @Published var transcribedText: Holds the real-time transcribed text, allowing SwiftUI views to bind and update dynamically.
      • @Published var isRecording: Indicates whether the application is currently recording.
      • audioRecorder, audioSession, recognitionRequest, recognitionTask, audioEngine, speechRecognizer: These manage audio recording and speech recognition.
    2. Speech Recognition Workflow:

      • SFSpeechRecognizer: Recognizes and transcribes speech from audio input for a specified locale.
      • SFSpeechAudioBufferRecognitionRequest: Provides an audio buffer for speech recognition tasks.
      • AVAudioEngine: Captures microphone input.
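
    One thing the view model glosses over is authorization: speech recognition and microphone access each require the user’s permission, plus the NSSpeechRecognitionUsageDescription and NSMicrophoneUsageDescription keys in Info.plist. A minimal sketch of requesting both before the first recording (the two authorization APIs are Apple’s; the wrapper function itself is my own and could be called from the view’s .onAppear):

    ```swift
    import Speech
    import AVFoundation

    // Ask for speech-recognition and microphone permission up front.
    // Calls back on the main queue with `true` only when both were granted.
    func requestVoicePermissions(completion: @escaping (Bool) -> Void) {
        SFSpeechRecognizer.requestAuthorization { status in
            guard status == .authorized else {
                DispatchQueue.main.async { completion(false) }
                return
            }
            AVAudioSession.sharedInstance().requestRecordPermission { granted in
                DispatchQueue.main.async { completion(granted) }
            }
        }
    }
    ```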

    Conclusions

    I encourage you to download the project from the GitHub repository linked below and start playing with this great technology.

    References

    • Speech

      Apple Developer Documentation